voxel51 / fiftyone

The open-source tool for building high-quality datasets and computer vision models
https://fiftyone.ai
Apache License 2.0

DocumentTooLarge: 'aggregate' command document too large #3571

Open pawani2v opened 11 months ago

pawani2v commented 11 months ago

System information

Describe the problem

I have a dataset with 215,840 images. When importing annotations from CVAT (fouc.import_annotations), I get the following error:

---------------------------------------------------------------------------
DocumentTooLarge                          Traceback (most recent call last)
1.cvat_2_fiftyone_dataset.ipynb Cell 11 line 1

File ~/.virtualenvs/flash/lib/python3.8/site-packages/fiftyone/utils/cvat.py:266, in import_annotations(sample_collection, project_name, project_id, task_ids, data_path, label_types, insert_new, download_media, num_workers, occluded_attr, group_id_attr, backend, **kwargs)
    259         for task_id in task_ids:
    260             label_schema = api._get_label_schema(
    261                 task_id=task_id,
    262                 occluded_attr=occluded_attr,
    263                 group_id_attr=group_id_attr,
    264             )
--> 266             _download_annotations(
    267                 dataset,
    268                 [task_id],
    269                 cvat_id_map,
    270                 label_schema,
    271                 label_types,
    272                 anno_backend,
    273                 anno_key,
    274                 **kwargs,
    275             )
    276 finally:
    277     anno_backend.delete_run(dataset, anno_key)

File ~/.virtualenvs/flash/lib/python3.8/site-packages/fiftyone/utils/cvat.py:395, in _download_annotations(dataset, task_ids, cvat_id_map, label_schema, label_types, anno_backend, anno_key, **kwargs)
    393 project_ids = []
    394 job_ids = []
--> 395 frame_id_map = {
    396     task_id: _build_sparse_frame_id_map(dataset, cvat_id_map[task_id])
    397     for task_id in task_ids
...
   1032     # There's nothing intelligent we can say
   1033     # about size for update and delete
-> 1034     raise DocumentTooLarge(f"{operation!r} command document too large")

DocumentTooLarge: 'aggregate' command document too large

Code used to import annotations:

fouc.import_annotations(
    dataset,
    task_ids=[51, 52, 54, 55, 56, 64, 65],
    data_path=data_map,
    download_media=False,
)

What areas of FiftyOne does this bug affect?

pawani2v commented 11 months ago

Same issue with a smaller dataset of 125,277 images.

benjaminpkane commented 11 months ago

Hi @pawani2v, your videos/images may have too much metadata for import. Importing tasks in smaller batches may solve your issue.
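
For context, DocumentTooLarge is raised by PyMongo when a single command document exceeds MongoDB's 16 MB BSON limit, so smaller imports keep each underlying aggregate command under that cap. A minimal sketch of the batching suggested above, reusing dataset, data_map, and the task IDs from the original snippet (a hypothetical one-task-at-a-time loop, not a verified fix):

import fiftyone.utils.cvat as fouc

# Assumes `dataset` and `data_map` are defined as in the original post
task_ids = [51, 52, 54, 55, 56, 64, 65]

# Import one CVAT task per call so each underlying
# 'aggregate' command document stays smaller
for task_id in task_ids:
    fouc.import_annotations(
        dataset,
        task_ids=[task_id],
        data_path=data_map,
        download_media=False,
    )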