minhtuev closed this pull request 1 month ago.
Summary by CodeRabbit
The changes add a guardrail to index creation in FiftyOne: `create_index` now checks how many indexes are currently being built on the collection and, with logging, raises an error when a configurable limit is reached. A new configuration parameter (`fo.config.max_indexes_in_progress`) caps the number of concurrent index builds, which limits the load that simultaneous builds place on the database. Users receive a clear message when an index cannot be created for this reason.
| Files | Change Summary |
|---|---|
| fiftyone/core/collections.py | Enhanced create_index method with error handling and logging for concurrent indexing operations. |
| tests/unittests/dataset_tests.py | Introduced a test to validate behavior when the maximum concurrent index limit is reached. |
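The test itself is not reproduced on this page. As a minimal sketch, assuming index information arrives as a dict that maps index names to info dicts carrying an optional `"in_progress"` flag (the shape the traceback excerpt below implies), the guardrail and a test for it could look like this. `check_index_capacity` and `CheckIndexCapacityTest` are hypothetical names, not FiftyOne APIs, and the limit is passed in explicitly rather than read from `fo.config`.

```python
import unittest


def check_index_capacity(index_info, max_in_progress):
    """Hypothetical helper mirroring the check shown in create_index.

    index_info: dict mapping index name -> info dict; an info dict may
    contain "in_progress": True while that index is still being built.
    Returns the number of in-progress builds if another build is allowed.
    """
    num_in_progress = sum(
        1 for info in index_info.values() if info.get("in_progress")
    )
    if num_in_progress >= max_in_progress:
        raise RuntimeError(
            "Too many indexes are currently being built; "
            "please try again later."
        )
    return num_in_progress


class CheckIndexCapacityTest(unittest.TestCase):
    def test_raises_when_limit_reached(self):
        # Two builds in progress with a limit of 2: a third is refused
        index_info = {
            "id": {},
            "a": {"in_progress": True},
            "b": {"in_progress": True},
        }
        with self.assertRaises(RuntimeError):
            check_index_capacity(index_info, max_in_progress=2)

    def test_allows_build_below_limit(self):
        # Finished indexes (no "in_progress" flag) are not counted
        index_info = {"id": {}, "a": {"in_progress": True}, "c": {}}
        self.assertEqual(
            check_index_capacity(index_info, max_in_progress=2), 1
        )
```

Saved to a module, the test case runs under `python -m unittest`.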
Example:
```
In [7]: dataset.create_index("negative_labels.classifications.confidence")
RuntimeError Traceback (most recent call last)
<ipython-input-7-ebe634beef82> in ?()
----> 1 dataset.create_index("negative_labels.classifications.confidence")
~/workspace/fiftyone/fiftyone/core/collections.py in ?(self, field_or_spec, unique, **kwargs)
9172 num_in_progress = len(
9173 [index for index in ii.values() if index.get("in_progress")]
9174 )
9175 if num_in_progress >= fo.config.max_indexes_in_progress:
-> 9176 raise RuntimeError(
9177 "Too many indexes are currently being built; "
9178 "please try again later."
9179 )
RuntimeError: Too many indexes are currently being built; please try again later.

In [8]: dataset.stats(include_indexes=True)
Out[8]:
{'samples_count': 10000000,
 'samples_bytes': 156539929160,
 'samples_size': '145.8GB',
 'indexes_count': 4,
 'indexes_bytes': 231845888,
 'indexes_size': '221.1MB',
 'indexes_in_progress': ['positive_labels.classifications.label',
  'positive_labels.classifications.confidence'],
 'index_bytes': {'id': 130613248,
  'negative_labels.classifications.label': 101224448,
  'positive_labels.classifications.label': 4096,
  'positive_labels.classifications.confidence': 4096},
 'index_sizes': {'id': '124.6MB',
  'negative_labels.classifications.label': '96.5MB',
  'positive_labels.classifications.label': '4.0KB',
  'positive_labels.classifications.confidence': '4.0KB'},
 'total_bytes': 156771775048,
 'total_size': '146.0GB'}
```
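The aggregate fields in this stats output are consistent with the per-index entries: the per-index sizes sum to `indexes_bytes`, samples plus indexes give `total_bytes`, and the human-readable sizes appear to be 1024-based with one decimal place. With two builds listed under `indexes_in_progress`, the check in the traceback implies the configured limit was at most 2. The sketch below verifies that arithmetic on values copied from the output; `human_size` is a hypothetical formatter written for this illustration, not the function FiftyOne uses.

```python
# Values copied from the dataset.stats(include_indexes=True) output above
stats = {
    "samples_bytes": 156539929160,
    "indexes_bytes": 231845888,
    "total_bytes": 156771775048,
    "indexes_in_progress": [
        "positive_labels.classifications.label",
        "positive_labels.classifications.confidence",
    ],
    "index_bytes": {
        "id": 130613248,
        "negative_labels.classifications.label": 101224448,
        "positive_labels.classifications.label": 4096,
        "positive_labels.classifications.confidence": 4096,
    },
}


def human_size(num_bytes):
    """Hypothetical formatter: 1024-based units, one decimal place."""
    size = float(num_bytes)
    units = ["B", "KB", "MB", "GB", "TB"]
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f}{unit}"
        size /= 1024


# Per-index sizes add up to the reported total index size
assert sum(stats["index_bytes"].values()) == stats["indexes_bytes"]

# Total size is samples plus indexes
assert stats["samples_bytes"] + stats["indexes_bytes"] == stats["total_bytes"]

# The in-progress builds already have (small) entries in index_bytes
num_in_progress = len(stats["indexes_in_progress"])
for name in stats["indexes_in_progress"]:
    print(name, human_size(stats["index_bytes"][name]))

print(num_in_progress, human_size(stats["total_bytes"]))  # 2 146.0GB
```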
@brimoor: should we create a separate branch for this change?
I like that users have more freedom in the SDK than in the UI, but index creation is an impactful operation, so this is one kind of guardrail we can add. As we've observed, although create_index blocks during long-running index creation, users can still circumvent that by interrupting the call and re-running it, etc. :)
@brimoor Updated the base branch to feat/index-management, which is a feature branch for the index-related changes; we can test on this branch before merging to develop.
As discussed offline, we will move the guardrail to the Index Management Panel.
What changes are proposed in this pull request?
Added a config variable and a check that limits how many indexes can be built concurrently on a collection.
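The traceback shows the limit being read as `fo.config.max_indexes_in_progress`, but how the variable is declared and what its default is are not shown here. A minimal sketch of such a setting, assuming an environment-variable override that follows FiftyOne's `FIFTYONE_<NAME>` convention (`FIFTYONE_MAX_INDEXES_IN_PROGRESS` and `IndexBuildConfig` are hypothetical names) and a placeholder default of 2:

```python
import os
from dataclasses import dataclass, field


def _env_positive_int(name, default):
    """Read a positive integer from the environment, else return default."""
    raw = os.environ.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


@dataclass
class IndexBuildConfig:
    # Hypothetical stand-in for fo.config.max_indexes_in_progress.
    # The env var name and the default of 2 are assumptions for illustration.
    max_indexes_in_progress: int = field(
        default_factory=lambda: _env_positive_int(
            "FIFTYONE_MAX_INDEXES_IN_PROGRESS", 2
        )
    )
```

Because the default is computed in `default_factory`, the environment is read each time a config object is created, and an explicit argument such as `IndexBuildConfig(max_indexes_in_progress=4)` takes precedence over it.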
How is this patch tested? If it is not, please explain why.
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What areas of FiftyOne does this PR affect?
fiftyone
Python library changes