Closed shantanusingh16 closed 2 months ago
This is a valid concern. Thanks for reporting!
@shantanusingh16 we've released fastdup==2.5, which addressed this issue. Would you please update fastdup and see if this is still an issue?
Hey @dnth. I was able to verify that this problem is solved with fastdup==2.5. Thank you for the prompt fix!
Thanks for confirming again! I will close this issue.
What happened?
When I run fastdup on a dataset, it copies every image into subdirectories of a 'cdn' directory inside the specified work_dir. This consumes extra disk space and becomes a bottleneck on network volumes with slow read/write speeds.
What did you expect to see?
Expected fastdup to not create copies of all images inside work-dir.
What version of fastdup were you running on?
2.3
What version of Python were you running on?
Python 3.10
Operating System
Ubuntu 22.04
Reproduction steps
1. Download an image dataset.
2. Run fastdup on this dataset using the command:
3. Navigate to the directory work_dir/cdn. This would contain subdirectories where all the images have been copied.
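To put a number on what the last step shows, here is a minimal sketch that counts the files under work_dir/cdn and sums their sizes. It uses only the Python standard library and does not call fastdup. The helper name `summarize_cdn` is hypothetical, and the `work_dir/cdn` layout is assumed from the description above rather than from fastdup's documentation.

```python
from pathlib import Path


def summarize_cdn(work_dir):
    """Return (file_count, total_bytes) for everything under work_dir/cdn.

    Hypothetical helper: assumes the copies live in a 'cdn' directory
    directly inside work_dir, as described in this report.
    """
    cdn_dir = Path(work_dir) / "cdn"
    if not cdn_dir.is_dir():
        # No cdn directory means there is nothing to count.
        return 0, 0
    # Walk all nested subdirectories and keep regular files only.
    files = [p for p in cdn_dir.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    import sys

    # Pass the work_dir as the first argument; defaults to ./work_dir.
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "work_dir"
    count, size = summarize_cdn(work_dir)
    print(f"{count} files, {size / 1e6:.1f} MB under {Path(work_dir) / 'cdn'}")
```

Running this before and after upgrading fastdup, on the same dataset and a fresh work_dir, should show whether the copies are still being created.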
Relevant log output
No response
Attach a screenshot [Optional]
No response
Contact Details [Optional]
shantanusingh10@gmail.com