Fixing pre-processing script for Google speech datasets

microsoft / EdgeML

This repository provides code for machine learning algorithms for edge devices developed at Microsoft Research India.

Fixing pre-processing script for Google speech datasets #104

Closed: adityakusupati closed this issue 5 years ago

adityakusupati commented 5 years ago

@metastableB, in the process_google.py file, I think the right way to create multiple versions of the dataset with different classes is to build each version from the classes listed in the labelmap, rather than including every sample and mapping the classes that are not required to 0. I think this should be changed to support simple cases such as google-30, where the classes are only the 30 keywords and there is no noise class.

https://github.com/microsoft/EdgeML/blob/pytorch/pytorch/examples/SRNN/process_google.py#L35
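
To make the difference between the two approaches concrete, here is a minimal sketch. It is not the code in process_google.py: the function names, the toy features, and the two labelmaps are all hypothetical and only illustrate the idea of selecting samples by labelmap instead of folding unlisted classes into 0.

```python
from typing import Dict, List, Sequence, Tuple


def collapse_to_zero(labels: Sequence[str], labelmap: Dict[str, int]) -> List[int]:
    """Keep every sample; any class missing from the labelmap becomes class 0."""
    return [labelmap.get(name, 0) for name in labels]


def select_by_labelmap(
    features: Sequence[List[float]],
    labels: Sequence[str],
    labelmap: Dict[str, int],
) -> Tuple[List[List[float]], List[int]]:
    """Keep only the samples whose class is listed in the labelmap."""
    kept_x, kept_y = [], []
    for x, name in zip(features, labels):
        if name in labelmap:  # drop noise and any other unlisted class
            kept_x.append(x)
            kept_y.append(labelmap[name])
    return kept_x, kept_y


# Toy data (hypothetical): three keywords and one noise clip.
labels = ["yes", "no", "_background_noise_", "up"]
features = [[0.1], [0.2], [0.3], [0.4]]

# Collapsing scheme: keywords start at 1, everything else falls into 0.
collapse_labelmap = {"yes": 1, "no": 2, "up": 3}
collapsed = collapse_to_zero(labels, collapse_labelmap)

# Proposed scheme: only listed keywords are kept, so no catch-all class exists.
keyword_labelmap = {"yes": 0, "no": 1, "up": 2}
kept_x, kept_y = select_by_labelmap(features, labels, keyword_labelmap)

print(collapsed)  # [1, 2, 0, 3]
print(kept_y)     # [0, 1, 2]
```

With the second function, a google-30 style dataset would only need a labelmap that lists the 30 keywords; the noise samples are dropped rather than relabeled.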

metastableB commented 5 years ago

@adityakusupati, you are taking care of this, right?

adityakusupati commented 5 years ago

@metastableB, I haven't had time yet. I will fix this today; otherwise, I think I will leave it as is.