harvard-edge / multilingual_kws

Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus

some empty directories in MSWC? or the 16KHz reencode? #35

Open mmaz opened 2 years ago

mmaz commented 2 years ago
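import os
from pathlib import Path

import tqdm

# Collect keywords whose 16 kHz clip directory contains no .wav files.
uhohs = []
mswc_16khz = Path("/media/mark/hyperion/mswc/16khz_wav/en/clips")
keywords = sorted(os.listdir(mswc_16khz))
print(len(keywords))  # total number of keyword directories
for keyword in tqdm.tqdm(keywords):
    keyword_samples = sorted((mswc_16khz / keyword).glob("*.wav"))
    if len(keyword_samples) == 0:
        uhohs.append(keyword)
print(len(uhohs))  # number of empty keyword directories
# Output: 24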
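To tell whether the empty directories come from the dataset itself or from the 16 kHz reencode, it may help to run the same check with a configurable file pattern and to get the keyword names back rather than only a count. The following is a minimal sketch using only the standard library. The function name `find_empty_keyword_dirs` and the `pattern` parameter are hypothetical, introduced here for illustration, and the sketch assumes one subdirectory per keyword, as in the snippet above.

```python
from pathlib import Path


def find_empty_keyword_dirs(clips_dir, pattern="*.wav"):
    """Return sorted names of keyword subdirectories with no files matching `pattern`.

    Hypothetical helper: assumes `clips_dir` holds one subdirectory per keyword.
    """
    clips_dir = Path(clips_dir)
    empty = []
    # Skip stray files at the top level; only keyword directories are checked.
    for keyword_dir in sorted(p for p in clips_dir.iterdir() if p.is_dir()):
        # any() stops at the first matching file, so large directories
        # are not fully listed or sorted.
        if not any(keyword_dir.glob(pattern)):
            empty.append(keyword_dir.name)
    return empty
```

Calling this on the 16 kHz directory with the default `"*.wav"` pattern, and on the original clips directory with the pattern of the original files (for example `"*.opus"`, if that is the format on disk), and then comparing the two lists should show whether the 24 keywords were already empty before reencoding.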