harvard-edge / multilingual_kws

Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus

found duplicates at __2 and __3 etc #20

Closed mmaz closed 2 years ago

mmaz commented 2 years ago

v1 fix: delete these? Estimate the percentage of affected clips (and whether this affects the splits). If that is too painful, re-extract.

mmaz commented 2 years ago

We estimated that ~4% of English clips are suffixed with __ (cc @keithachorn-intel)
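
For anyone who wants to reproduce an estimate like this on their own copy of the data, here is a minimal sketch. It is not the script used for the number above. It assumes the duplicate marker is a trailing `__` followed by digits in the filename stem (for example `clip__2.opus`), and the helper names `duplicate_index` and `suffixed_fraction` are hypothetical.

```python
import re
from pathlib import PurePath

# Assumption: duplicates are marked by "__<digits>" at the end of the
# filename stem, e.g. "clip__2.opus". Adjust if the real layout differs.
SUFFIX_RE = re.compile(r"__(\d+)$")


def duplicate_index(filename):
    """Return the integer after a trailing "__" in the stem, or None."""
    match = SUFFIX_RE.search(PurePath(filename).stem)
    return int(match.group(1)) if match else None


def suffixed_fraction(filenames):
    """Return the fraction of filenames that carry a "__<digits>" suffix."""
    names = list(filenames)
    if not names:
        return 0.0
    flagged = sum(1 for name in names if duplicate_index(name) is not None)
    return flagged / len(names)
```

To apply it to a directory, pass in something like `(p.name for p in pathlib.Path(clips_dir).rglob("*.opus"))`, where `clips_dir` is a placeholder for wherever the clips live. To check whether the splits are affected, the same function can be run separately on the filenames listed in each split.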

mmaz commented 2 years ago

Preserved the __2 files and expunged the other files for the v1 release.