visual-layer / fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

Use fastdup in code pipeline rather than reports #320

Closed UsmanAliKhan60 closed 6 months ago

UsmanAliKhan60 commented 6 months ago

Feature Name

Using in pipeline

Feature Description

[Feature Request]: Hello, I am trying to use fastdup to check the quality of a webcam. I want to detect faces, brightness or darkness, and blurriness, and use these features to decide whether the webcam is good or not. Currently fastdup takes a dataset folder as input and saves the blurriness output as a report in .html format. Is there a way to pass an image to fastdup and get the result back in code rather than saved in a report?

dbickson commented 6 months ago

Hi @UsmanAliKhan60, your request is already supported. fastdup writes its results to output files and also exposes them in code. For example, fd.img_stats() https://visual-layer.readme.io/docs/v1-api#fastdup.fastdup_controller.FastdupController.img_stats returns a dataframe with the image statistics, so you can filter out blurry images using the computed blur values.

For example:

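import fastdup

# Compute image statistics; intermediate and output files are written to work_dir
fastdup.run("/my/data/", work_dir="out")
# Delete the darkest 5% of images (mean pixel value below the 5th percentile).
# dry_run=False actually deletes the files; leave dry_run=True to preview first.
fastdup.delete_or_retag_stats_outliers("out", metric="mean", lower_percentile=0.05, dry_run=False)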
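
For the webcam use case in the question, the statistics can also be checked row by row in code. The following is only a minimal sketch, not fastdup's own API: it assumes the statistics contain columns named `blur` and `mean`, that a lower `blur` value means a blurrier image, and that rows are obtained with something like `fd.img_stats().to_dict("records")`. The function name and all thresholds are hypothetical and would need tuning on real webcam frames. Face detection is not covered here.

```python
def webcam_quality_issues(row, blur_min=100.0, mean_min=40.0, mean_max=220.0):
    """Return a list of quality problems for one image-statistics row.

    `row` is a dict with (assumed) keys "blur" and "mean".
    The thresholds are illustrative placeholders, not fastdup defaults.
    """
    issues = []
    if row["blur"] < blur_min:      # assumption: low blur score = blurry image
        issues.append("blurry")
    if row["mean"] < mean_min:      # low mean pixel value = dark frame
        issues.append("too dark")
    elif row["mean"] > mean_max:    # high mean pixel value = overexposed frame
        issues.append("too bright")
    return issues


# Hand-written example rows standing in for real statistics
rows = [
    {"filename": "frame_001.jpg", "blur": 350.0, "mean": 128.0},
    {"filename": "frame_002.jpg", "blur": 20.0, "mean": 25.0},
]
report = {r["filename"]: webcam_quality_issues(r) for r in rows}
```

A frame with an empty issue list passes the check; a webcam could then be rated by the share of frames that pass.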

Let us know if you need any further help.

UsmanAliKhan60 commented 6 months ago

Thanks. I am able to get the results in code using fd.img_stats(), but the results are also saved in the work_dir folder. I don't want to keep a work_dir folder; I only want to use the results in code. Is that possible?

dbickson commented 6 months ago

Hi @UsmanAliKhan60, fastdup needs some scratch space for its intermediate files. You can safely delete this directory once you are done with the computation. Best,
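
One way to automate the cleanup described above is to let Python create and remove the scratch directory. This is a sketch using only the standard library; `run_in_scratch_dir` is a hypothetical helper, and the fastdup calls shown in the comment (`fastdup.create`, `fd.run`, `fd.img_stats`) are given as an assumed usage pattern rather than a tested recipe.

```python
import tempfile
from pathlib import Path


def run_in_scratch_dir(compute):
    """Call compute(work_dir) with a temporary directory and return its result.

    The directory and everything in it are deleted when the call finishes,
    so compute() must return in-memory results (e.g. a dataframe), not paths.
    """
    with tempfile.TemporaryDirectory(prefix="fastdup_") as work_dir:
        return compute(work_dir)


# Assumed usage with fastdup (not executed here):
#
#     def webcam_stats(work_dir):
#         fd = fastdup.create(work_dir=work_dir, input_dir="frames/")
#         fd.run()
#         return fd.img_stats()
#
#     stats = run_in_scratch_dir(webcam_stats)

# Stand-in computation showing that the directory exists only during the call
seen = []
existed = run_in_scratch_dir(lambda d: (seen.append(d), Path(d).is_dir())[1])
```

The returned value stays available after the directory is gone, which matches the goal of using the results in code without keeping a work_dir on disk.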