voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0

[FR] Avoid uploading data if samples exist in CVAT connected fileshare #1235

Open ehofesmann opened 3 years ago

ehofesmann commented 3 years ago

CVAT allows files to be accessed in one of three ways:

1) Uploading local files (what is currently done)
2) Providing remote URLs
3) Accessing them directly through a mounted file share

If the data that is being uploaded exists in a file share connected to CVAT, then it would be preferable to not upload the data to the server. This is especially important in cases where a large number of images or videos are being annotated at one time.

Adding this should be fairly simple. It would require updating this to allow for shared files:

https://github.com/voxel51/fiftyone/blob/e43c00ab96282d1f016e4f57f806c8f48feff6bc/fiftyone/utils/cvat.py#L3096-L3102

 files = {}
 for idx, path in enumerate(paths):
     # IMPORTANT: CVAT organizes media within a task alphabetically by
     # filename, so we must give CVAT filenames whose alphabetical order
     # matches the order of `paths`
     filename = "%06d%s" % (idx, os.path.splitext(path)[1])
     if use_fileshare:
         # Proposed branch: reference the file on the share instead of
         # uploading it (`use_fileshare` is a new, not-yet-existing flag)
         data["server_files[%d]" % idx] = (filename, path)
     else:
         files["client_files[%d]" % idx] = (filename, open(path, "rb"))

To use the correct path, it would be straightforward to follow the Alternate media workflow and store the file share path of every sample as a field on the FiftyOne dataset.
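As a rough sketch of how such a per-sample path could be derived, assuming the CVAT file share is also mounted on the local machine under some root directory and that CVAT expects paths relative to the share root, a helper along these lines might work. The name `to_share_path` and the mount point in the usage note are hypothetical and not part of FiftyOne or CVAT.

```python
import posixpath
from pathlib import PurePosixPath


def to_share_path(local_path, local_share_root):
    """Return `local_path` relative to `local_share_root`.

    Hypothetical helper: assumes the CVAT share is mounted locally at
    `local_share_root` and that POSIX-style paths are in use.
    """
    path = PurePosixPath(posixpath.normpath(local_path))
    root = PurePosixPath(posixpath.normpath(local_share_root))
    # relative_to() raises ValueError when the file is outside the share,
    # which signals that this sample would still need a regular upload
    return str(path.relative_to(root))
```

For example, `to_share_path("/mnt/share/images/a.jpg", "/mnt/share")` gives `"images/a.jpg"`, which could then be stored in a sample field of your choosing.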

Other points to consider are the copy_data and use_cache options, which will likely need to be incorporated so that data is not copied even for media in the file share: https://github.com/openvinotoolkit/cvat/pull/3544

Huy2122k commented 2 years ago

@ehofesmann I think CVAT added a new sorting feature for tasks in this PR: https://github.com/opencv/cvat/pull/3937

We can use the Predefined sorting method with server_files, storage = "share", and storage_method = "cache" to avoid uploading or copying any files.
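A minimal sketch of what such a request payload might look like is below. It only builds the dictionaries and sends nothing to CVAT. The `storage` and `storage_method` values come from the comment above; the `sorting_method` field name, the `build_task_data` function, and its `use_fileshare` flag are assumptions for illustration and should be checked against the CVAT version in use.

```python
import os


def build_task_data(paths, use_fileshare=False):
    """Build form fields and upload entries for a CVAT task-data request.

    Hypothetical sketch: returns `(data, files)` without contacting CVAT.
    """
    data = {}
    files = {}
    for idx, path in enumerate(paths):
        if use_fileshare:
            # Assumption: CVAT expects a path on the share as a plain string
            data["server_files[%d]" % idx] = path
        else:
            # Zero-padded names keep alphabetical order equal to `paths` order;
            # the caller opens each path when actually sending the request
            filename = "%06d%s" % (idx, os.path.splitext(path)[1])
            files["client_files[%d]" % idx] = (filename, path)

    if use_fileshare:
        data["storage"] = "share"
        data["storage_method"] = "cache"
        # Assumed field name for the Predefined sorting option
        data["sorting_method"] = "predefined"

    return data, files
```

Because the file-share branch relies on the sorting option rather than on renamed files, it is worth verifying the resulting frame order in the created task.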

However, I found what may be a bug when using the Predefined sorting method: the order of the files in the created task was reversed, which was confusing. image

Finally, we can simply modify the function upload_data in utils/cvat.py like this:

image

where shared_path is the path to the shared folder containing the images.

It works for me!

Hope it helps anyone who has problems uploading large quantities of images.

thiagoribeirodamotta commented 3 months ago

Was this ever integrated into the main branch?