visual-layer / fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

[Bug]: Fastdup will create a copy of all images in the 'cdn' folder inside work_dir. #334

Closed shantanusingh16 closed 2 months ago

shantanusingh16 commented 3 months ago

What happened?

When I run fastdup on a dataset, it copies every image into sub-directories of a 'cdn' directory inside the specified work_dir. This wastes disk space and becomes a bottleneck on network volumes with slow read/write speeds.

What did you expect to see?

Expected fastdup to not create copies of all images inside work-dir.

What version of fastdup were you running on?

2.3

What version of Python were you running on?

Python 3.10

Operating System

Ubuntu 22.04

Reproduction steps

  1. Download an image dataset.

  2. Run fastdup on this dataset using the following code:

    import fastdup

    # data_dir is the path to the downloaded dataset
    fd = fastdup.create(input_dir=f"{data_dir}/images/", work_dir=f"{data_dir}/work_dir")
    fd.run()

  3. Navigate to the directory work_dir/cdn. It contains subdirectories into which all the images have been copied.
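
To put a number on step 3, the files under work_dir/cdn can be counted and their sizes summed. This is a minimal sketch using only the Python standard library; `summarize_dir` and the `data` path are illustrative names chosen here and are not part of fastdup.

```python
from pathlib import Path


def summarize_dir(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    root = Path(root)
    if not root.is_dir():
        # A missing directory means nothing was copied.
        return 0, 0
    count = 0
    total = 0
    for path in root.rglob("*"):
        if path.is_file():
            count += 1
            total += path.stat().st_size
    return count, total


if __name__ == "__main__":
    data_dir = "data"  # hypothetical dataset location; adjust to your setup
    n_files, n_bytes = summarize_dir(Path(data_dir) / "work_dir" / "cdn")
    print(f"{n_files} files, {n_bytes / 1e6:.1f} MB under work_dir/cdn")
```

Running the same function on the images directory gives a baseline: if both counts are similar, the dataset has effectively been duplicated.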

Relevant log output

No response

Attach a screenshot [Optional]

No response

Contact Details [Optional]

shantanusingh10@gmail.com

dnth commented 3 months ago

This is a valid concern. Thanks for reporting!

dnth commented 3 months ago

@shantanusingh16 we've released fastdup==2.5, which addresses this issue. Would you please update fastdup and see if this is still an issue?
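
Before re-running, it may help to confirm which fastdup version is actually installed in the environment. The sketch below reads the installed version with the standard library's `importlib.metadata` and compares it against 2.5. The helpers `version_tuple` and `has_fix` are illustrative names, and treating every version from 2.5 onward as containing the fix is an assumption.

```python
from importlib import metadata
from itertools import takewhile


def version_tuple(text):
    """Turn '2.5' or '2.5.1' into (2, 5) or (2, 5, 1).

    Only the leading digits of each dot-separated piece are used, and
    parsing stops at the first piece that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed, minimum="2.5"):
    """Assumption: versions at or above `minimum` include the fix."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("fastdup")
    except metadata.PackageNotFoundError:
        print("fastdup is not installed in this environment")
    else:
        status = "should include the fix" if has_fix(installed) else "predates the fix"
        print(f"fastdup {installed} {status}")
```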

shantanusingh16 commented 2 months ago

Hey @dnth. I was able to verify that this problem is solved with fastdup==2.5. Thank you for the prompt fix!

dnth commented 2 months ago

Thanks for confirming again! I will close this issue.