wenet-e2e / wekws

Production First and Production Ready End-to-End Keyword Spotting Toolkit
Apache License 2.0

How to prepare dataset for RIR & Musan Augmentation #116

csetanmayjain closed this issue 1 year ago

csetanmayjain commented 1 year ago

Hi, I would like to know how to prepare a dataset for RIR & Musan augmentation. I went through the script and understand that it needs data in .mdb format inside an lmdb folder. I have raw audio files; how do I prepare the data from them? Also, is there a flag in the configuration file that I can use to turn augmentation on or off?

Thanks

robin1001 commented 1 year ago

Please see https://github.com/wenet-e2e/wekws/blob/main/tools/make_lmdb.py

mlxu995 commented 1 year ago

You only need to prepare a wav.scp file and choose a path where the lmdb file will be saved. Then run "python make_lmdb.py /path/to/wav.scp /path/to/lmdb_dir" to create the lmdb file. To turn off augmentation, set the corresponding probability to 0, or simply do not pass the corresponding lmdb file in run.sh.
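
For anyone starting from a folder of raw audio files, the wav.scp step can be scripted. The following is only a minimal sketch, assuming the Kaldi-style layout of one "<utt_id> <wav_path>" pair per line, separated by whitespace. The function name write_wav_scp and the use of the filename stem as the utterance ID are illustrative choices, not part of wekws.

```python
from pathlib import Path


def write_wav_scp(audio_dir, scp_path, ext=".wav"):
    """Write one '<utt_id> <absolute_path>' line per audio file under audio_dir.

    Hypothetical helper, not part of wekws. Assumes a Kaldi-style wav.scp
    and uses the filename stem as the utterance ID.
    Returns the number of entries written.
    """
    audio_dir = Path(audio_dir)
    # Search recursively and match the extension case-insensitively.
    wav_files = sorted(
        p for p in audio_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ext.lower()
    )

    seen = set()
    with open(scp_path, "w", encoding="utf-8") as fout:
        for wav in wav_files:
            utt_id = wav.stem
            abs_path = str(wav.resolve())
            # Entries are whitespace-separated, so spaces would break parsing.
            if any(ch.isspace() for ch in utt_id + abs_path):
                raise ValueError(f"whitespace in ID or path: {abs_path}")
            # Utterance IDs must be unique to be usable as keys.
            if utt_id in seen:
                raise ValueError(f"duplicate utterance ID: {utt_id}")
            seen.add(utt_id)
            fout.write(f"{utt_id} {abs_path}\n")
    return len(wav_files)
```

Calling write_wav_scp("/path/to/musan_noise", "/path/to/wav.scp") would produce the file that is then passed as the first argument to make_lmdb.py in the command above. Both paths here are placeholders.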

csetanmayjain commented 1 year ago

Thanks