visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at an unparalleled scale.

[Feature Request]: num_images ordering #242

Closed guy-singer closed 11 months ago

guy-singer commented 11 months ago

Feature Name

Num_images ordering

Feature Description

num_images likely returns the first N images in the dataset, but this is not stated anywhere in the documentation. Two possible directions to improve:

  1. Simple: specify that num_images returns the first N examples

  2. Less simple: allow for selection patterns such as random sampling

Contact Information [Optional]

No response

dbickson commented 11 months ago

input_dir supports a Python list of filenames, so you can sample with whatever policy you like and then pass that list to fastdup, which will work on only those files.
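
As a rough illustration of that workaround, here is a minimal sketch of random sampling done outside fastdup. The helper `sample_image_paths`, the `IMAGE_EXTENSIONS` set, and the directory names are hypothetical and not part of fastdup. The commented-out call at the bottom assumes the usual `fastdup.create(work_dir=..., input_dir=...)` / `run()` entry points, so check it against the version you have installed.

```python
import random
from pathlib import Path

# Hypothetical set of extensions to treat as images; adjust for your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".webp"}


def sample_image_paths(root, n, seed=0):
    """Return up to n randomly chosen image file paths under root, as strings."""
    # Sort first so the same seed gives the same sample on every run.
    paths = sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    rng = random.Random(seed)
    # Sampling without replacement; cap n at the number of files available.
    return rng.sample(paths, min(n, len(paths)))


# Assumed usage with fastdup (left commented so the sketch runs without it):
# import fastdup
# files = sample_image_paths("images/", 1000, seed=42)
# fd = fastdup.create(work_dir="fastdup_work", input_dir=files)
# fd.run()
```

Swapping `rng.sample` for another policy (every k-th file, stratified by folder, and so on) changes the selection pattern without touching the fastdup call.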