voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models
https://fiftyone.ai
Apache License 2.0

Support loading annotations for large CVAT tasks with many jobs #4392

Closed ehofesmann closed 2 months ago

ehofesmann commented 2 months ago

What changes are proposed in this pull request?

Optimized loading of annotations from the CVAT backend. Annotations are now loaded from individual jobs instead of entire tasks, which allows importing annotations from much larger tasks. For example, one task in the internal CVAT deployment has 10k samples split into 200 jobs of 50 samples each. Previously, loading this task made a single request to the CVAT server for all of the task's annotations at once, which crashed the CVAT server. Now, annotations are loaded from each job sequentially, which resolves this problem.
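A minimal sketch of the per-job strategy, assuming CVAT's REST layout in which each job exposes its annotations (at `/api/jobs/<id>/annotations`) as a dict with `tags`, `shapes` and `tracks` lists. `load_task_annotations_by_job` and `fetch_job_annotations` are hypothetical names standing in for FiftyOne's internals and for the authenticated HTTP call; this is an illustration, not the code in this PR.

```python
from typing import Callable, Dict, Iterable, List


def load_task_annotations_by_job(
    job_ids: Iterable[int],
    fetch_job_annotations: Callable[[int], Dict[str, list]],
) -> Dict[str, List[dict]]:
    """Hypothetical helper: fetch annotations one job at a time and merge them.

    Each request returns at most one job's worth of annotations (e.g. 50
    samples) rather than the whole task (e.g. 10k samples), which keeps the
    load on the CVAT server bounded.
    """
    merged: Dict[str, List[dict]] = {"tags": [], "shapes": [], "tracks": []}
    # Sequential on purpose: only one request is in flight at a time
    for job_id in job_ids:
        job_annotations = fetch_job_annotations(job_id)
        for key in merged:
            merged[key].extend(job_annotations.get(key, []))
    return merged


# Usage with a fake fetcher in place of the real HTTP request
def fake_fetch(job_id: int) -> Dict[str, list]:
    return {"shapes": [{"job": job_id, "label": "car"}], "tags": [], "tracks": []}


annotations = load_task_annotations_by_job(range(200), fake_fetch)
print(len(annotations["shapes"]))  # 200
```

Injecting the fetch function keeps the merge logic independent of how authentication and retries are handled.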

How is this patch tested? If it is not, please explain why.

Unit tests pass:

```shell
export FIFTYONE_CVAT_URL=...
export FIFTYONE_CVAT_USERNAME=...
export FIFTYONE_CVAT_PASSWORD=...

pytest /path/to/fiftyone/tests/intensive/cvat_tests.py
```

Also, task 159 on the internal CVAT test deployment, which contains bdd100k-validation, now imports properly. Having the BDD100K validation images available locally on disk makes this easier to verify:

```python
import os

import fiftyone as fo
import fiftyone.utils.cvat as fouc

cvat_url = "..."
cvat_username = "..."
cvat_password = "..."

bdd_path = "/path/to/bdd100k-validation/"
filenames = os.listdir(bdd_path)

# Map each image filename to its full path on disk
data_map = {fn: os.path.join(bdd_path, fn) for fn in filenames}

dataset = fo.Dataset()

# WARNING: only run this on this branch; on `develop` it will crash the CVAT deployment
fouc.import_annotations(
    dataset,
    task_ids=[159],
    data_path=data_map,
    url=cvat_url,
    username=cvat_username,
    password=cvat_password,
)
```

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

Optimized loading of annotations from the CVAT backend. Annotations are now loaded from individual jobs instead of entire tasks, which allows importing annotations from much larger tasks.

What areas of FiftyOne does this PR affect?

Summary by CodeRabbit

coderabbitai[bot] commented 2 months ago

Walkthrough

The recent updates enhance the CVAT class by introducing a method that generates URLs for job annotations and by refining the annotation download process. In addition, the `test_detection_labelling` test now sets the `segment_size` parameter, which controls the maximum number of frames CVAT assigns to each job.
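Setting `segment_size` determines how many jobs a task is split into. A back-of-the-envelope sketch, assuming no overlap between segments (the usual case for image tasks); `job_frame_ranges` is a hypothetical helper, not part of FiftyOne or CVAT:

```python
import math
from typing import List, Tuple


def job_frame_ranges(num_frames: int, segment_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: inclusive (start, stop) frame range of each job.

    Assumes zero overlap between consecutive segments.
    """
    num_jobs = math.ceil(num_frames / segment_size)
    return [
        (i * segment_size, min((i + 1) * segment_size, num_frames) - 1)
        for i in range(num_jobs)
    ]


# The 10k-sample task from the PR description, with 50 samples per job
ranges = job_frame_ranges(10_000, 50)
print(len(ranges))  # 200
print(ranges[0], ranges[-1])  # (0, 49) (9950, 9999)
```

A small `segment_size` in the test forces even a small task into several jobs, so the per-job download path gets exercised.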

Changes

| File Path | Change Summary |
|---|---|
| fiftyone/utils/cvat.py | Added `job_annotation_url`, modified `download_annotations`, added `_get_job_ids` in CVAT class. |
| tests/intensive/... | Updated `test_detection_labelling` method by adding a `segment_size` parameter. |
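The changes listed above mention a job annotation URL helper and a helper that collects a task's job IDs. A rough sketch of what such helpers could look like, assuming CVAT's `/api/jobs?task_id=<id>` listing endpoint returns paginated JSON with `results` and `next` keys. The function names mirror the summary but their signatures here are hypothetical, `get_json` is a stand-in for an authenticated GET request, and `cvat.example.com` is a placeholder host.

```python
from typing import Any, Callable, Dict, List, Optional


def job_annotation_url(base_url: str, job_id: int) -> str:
    """Hypothetical: URL of a single job's annotations (assumed CVAT API layout)."""
    return "%s/api/jobs/%d/annotations" % (base_url.rstrip("/"), job_id)


def get_job_ids(
    base_url: str,
    task_id: int,
    get_json: Callable[[str], Dict[str, Any]],
) -> List[int]:
    """Hypothetical: collect every job ID of a task, following pagination links."""
    url: Optional[str] = "%s/api/jobs?task_id=%d" % (base_url.rstrip("/"), task_id)
    job_ids: List[int] = []
    while url:
        page = get_json(url)
        job_ids.extend(job["id"] for job in page["results"])
        url = page.get("next")  # None on the last page
    return job_ids


# Usage with canned pages in place of real HTTP responses
BASE = "https://cvat.example.com"
pages = {
    BASE + "/api/jobs?task_id=159": {
        "results": [{"id": 1}, {"id": 2}],
        "next": BASE + "/api/jobs?task_id=159&page=2",
    },
    BASE + "/api/jobs?task_id=159&page=2": {"results": [{"id": 3}], "next": None},
}
ids = get_job_ids(BASE, 159, pages.__getitem__)
print(ids)  # [1, 2, 3]
print(job_annotation_url(BASE, ids[0]))  # https://cvat.example.com/api/jobs/1/annotations
```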



ehofesmann commented 2 months ago

Implementation LGTM 💪

@ehofesmann if you retarget this at release/v0.24.0 we can include in the release this week 🤓

Thanks @brimoor! Just getting back to this now; I assume I missed the window on this. I still need to get it into Teams too. It's OK if it doesn't make it into v0.24.0.

@benjaminpkane I see you changed the base back to develop. Is it good to merge into there? If so, can I get a re-review?