microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

Petko #2149

Closed pnpetkov closed 7 years ago

pnpetkov commented 7 years ago

I need to interface CNTK (Python) with Kaldi. Unfortunately, there is no dedicated reader for Kaldi (comparable to HTKFeatureDeserializer for HTK). Can you suggest a shortcut that does not involve writing my own reader? For example, using one of the existing readers and modifying the input files as needed.

Thank you!

eldakms commented 7 years ago

The only thing that comes to mind is to convert the Kaldi format to a supported one (e.g. to HTK, and then use HTKFeatureDeserializer). I am not sure whether such a converter exists in the HTK toolkit.

If you go with implementing your own deserializer, implementing it in Python will probably be the simplest (have a look at /Manuals/Manual_How_to_write_a_custom_deserializer.ipynb).

The team will discuss whether we will provide a built-in deserializer for Kaldi, but definitely not in this sprint.

I am closing this for now. Thanks!

pnpetkov commented 7 years ago

I now have a working solution based on mapping Kaldi's MLF files to HTK format and then using the HTK deserializers in CNTK. Interfacing with the Kaldi decoder turned out not to be a problem either, once the feature ARKs were generated. For that I relied on the kaldi_io.py available here: https://github.com/vesis84/kaldi-io-for-python
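
For anyone attempting the same conversion, here is a minimal sketch of the file-writing side, not the exact code used above. It assumes features are already in memory as one matrix per utterance (rows are frames) and labels as one label per frame. The function names `write_htk` and `write_mlf`, the 10 ms frame shift, the `USER` parameter kind, and the `"utt.lab"` naming pattern are all illustrative assumptions; the pattern CNTK expects in the MLF depends on how the script file names the utterances.

```python
import struct

HTK_USER = 9  # HTK parameter-kind code for user-defined features (assumed suitable here)


def write_htk(path, frames, frame_shift_s=0.01, parm_kind=HTK_USER):
    """Write equal-length float vectors (one per frame) as an HTK feature file."""
    rows = [[float(x) for x in row] for row in frames]
    if not rows:
        raise ValueError("no frames to write")
    dim = len(rows[0])
    if any(len(r) != dim for r in rows):
        raise ValueError("all frames must have the same dimension")
    samp_period = int(round(frame_shift_s * 1e7))  # HTK times are in 100 ns units
    with open(path, "wb") as f:
        # 12-byte big-endian header: nSamples, sampPeriod, sampSize (bytes), parmKind
        f.write(struct.pack(">iihh", len(rows), samp_period, dim * 4, parm_kind))
        for r in rows:
            f.write(struct.pack(">%df" % dim, *r))  # big-endian float32 values


def write_mlf(path, alignments, frame_shift_s=0.01):
    """Write {utt_id: [label per frame]} as an HTK master label file.

    Consecutive frames with the same label are merged into one segment.
    """
    period = int(round(frame_shift_s * 1e7))
    with open(path, "w") as f:
        f.write("#!MLF!#\n")
        for utt, labels in alignments.items():
            f.write('"%s.lab"\n' % utt)  # naming pattern is an assumption
            start = 0
            for i in range(1, len(labels) + 1):
                if i == len(labels) or labels[i] != labels[start]:
                    f.write("%d %d %s\n" % (start * period, i * period, labels[start]))
                    start = i
            f.write(".\n")
```

`write_htk` accepts any iterable of rows, so the matrices read from a Kaldi ark (for example via the kaldi_io.py linked above) could be passed to it directly, one output file per utterance key.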