niclaswue opened this issue 3 years ago
My workaround for now is to create the repository and push to it once the task is finished. For this, I used the following endpoints:

```python
import time

from fiftyone.utils.cvat import CVATAnnotationAPI

cvat = CVATAnnotationAPI(...)
task_id = ...  # get from dataset.load_annotation_results(anno_key).get_status()
file_path = f"labels/{anno_key}.xml"  # anno_key and dataset_repository are defined elsewhere

repo_patch = {"path": f"{dataset_repository} [{file_path}]", "lfs": False}  # format seems to be ignored
response_create = cvat.patch(f"{cvat.base_url}/git/repository/create/{task_id}", json=repo_patch)
assert response_create.status_code == 200

time.sleep(10)  # give CVAT some time to clone the repository

response_push = cvat.get(f"{cvat.base_url}/git/repository/push/{task_id}")
assert response_push.status_code == 200
```
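The fixed `time.sleep(10)` in the workaround either wastes time or is too short if the clone is slow. A minimal sketch of a polling alternative, assuming you have some way to check whether the repository is ready: `wait_until` is a hypothetical helper, not part of FiftyOne or CVAT, and the readiness check you pass in is up to you.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call `predicate` every `interval` seconds until it returns True.

    Returns True as soon as the predicate succeeds, or False once
    `timeout` seconds have elapsed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        time.sleep(min(interval, remaining))
```

For example, `wait_until(lambda: repo_is_ready(task_id), timeout=60)` would replace the sleep, where `repo_is_ready` is a hypothetical check you would write against your CVAT version. Using `time.monotonic()` keeps the timeout correct even if the system clock changes.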
@niclaswue I have made a proof of concept for this FR. However, it seems that a better workflow would be to avoid using the git repository altogether and instead back up annotations within FiftyOne directly:
```python
dataset2 = dataset.clone()
```
The `job_assignees`, `task_assignee`, and `job_reviewers` parameters can be used to specify assignees and reviewers programmatically. You can then get the status of an annotation run, which contains assignee and reviewer information, and store it back in the dataset in whatever way you want. Combining these two workflows should let you avoid needing to upload to a dataset repository.
Hey @niclaswue, it sounds like you might want to hear more about FiftyOne Teams.
That's our mechanism for providing features like versioning and permissions for production ML workflows :)
Thank you very much! I found it very convenient to pass the repository info to CVAT rather than deal with a git library to push manually. However, any additional information added in FiftyOne is of course lost when transferring to CVAT, so I might look into this in the future. Right now we are at a very early testing stage of our pipeline. Thanks again :)
Proposal Summary
Support for dataset repositories in CVAT during task creation and import.
Motivation
What is the use case for this feature? Ideally, the repository information could be sent to CVAT when creating a task in FiftyOne. When the labeled data is imported back into FiftyOne, the labels would then be pushed automatically to the specified repository using the `<cvat_host>/git/repository/push/<task_id>` endpoint.
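A minimal sketch of the proposed import-time behavior, under stated assumptions: `push_url` and `push_all` are hypothetical helpers, not FiftyOne API; `base_url` stands in for `<cvat_host>` (or `cvat.base_url` from the workaround earlier in this thread); and, as in that workaround's assertions, only an HTTP 200 response counts as a successful push. The HTTP call is injected so the sketch stays independent of any particular client.

```python
from typing import Callable, Dict, Iterable


def push_url(base_url: str, task_id: int) -> str:
    """Build the dataset-repository push endpoint for one CVAT task."""
    return f"{base_url.rstrip('/')}/git/repository/push/{task_id}"


def push_all(
    task_ids: Iterable[int],
    base_url: str,
    http_get: Callable[[str], int],
) -> Dict[int, int]:
    """Trigger a push for every task and collect the failures.

    `http_get` performs a GET request and returns the HTTP status code.
    Returns {task_id: status_code} for each push that did not return 200.
    """
    failures: Dict[int, int] = {}
    for task_id in task_ids:
        status = http_get(push_url(base_url, task_id))
        if status != 200:
            failures[task_id] = status
    return failures
```

With the workaround's `CVATAnnotationAPI` instance, `http_get` could be something like `lambda url: cvat.get(url).status_code`. Returning the failures instead of raising on the first one lets every task attempt its backup before you decide how to handle errors.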
Why is this use case valuable to support for FiftyOne users in general? It allows for easy backups of the labeled data, along with information about the labeler, job reviewer, etc.
Why is this use case valuable to support for your project(s) or organization? We want to use the CVAT dataset repository field for synchronization and backups when a labeling task is finished. It is also important to us to save the metadata about labelers and reviewers to ensure high dataset quality.
Why is it currently difficult to achieve this use case? (please be as specific as possible about why related FiftyOne features and components are insufficient) To the best of my knowledge, this feature is not supported at the moment. I saw that it's possible to set values for `job_assignees`, `job_reviewers`, etc. in `CVATBackendConfig`, but there is no option for a dataset repository, or did I overlook it?
What areas of FiftyOne does this feature affect?
`fiftyone`: Python library
Willingness to contribute
The FiftyOne Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?