nttcslab-sp / kaldiio

A pure python module for reading and writing kaldi ark files

Reading models #48

Open jvel07 opened 4 years ago

jvel07 commented 4 years ago

Hi, this is one of the best tools for reading ark and scp files! However, I wanted to know how one can read models generated by Kaldi, e.g. full UBM models or ivector extractor models, which Kaldi usually writes with 'ubm' extensions. For example: final.ubm or final.dubm.

kamo-naoyuki commented 4 years ago

Sorry, this tool is basically aimed at just loading binary matrix files.

Actually, the ivector model is not implemented in Python, so what do you mean by loading a model file? If you just mean parsing the parameters in the model file into NumPy objects or float arrays, that's not impossible, but even so, we may not support it.

jvel07 commented 4 years ago

Thanks for your reply, @kamo-naoyuki. Yes, I meant loading them into NumPy arrays. I intend to use the generated models for a specific experiment in computational paralinguistics. I understand now that kaldiio does not support this. Since it's not impossible, are you aware of a way to read them? Or could you perhaps point me to someone or some blog that could help? It would be really helpful, since I have been searching for this for several days already. Sorry for the inconvenience.

kamo-naoyuki commented 4 years ago

How about pykaldi? https://pykaldi.github.io/api/kaldi.ivector.html#kaldi.ivector.IvectorExtractor

The direct way is also not hard. If you don't need a Python API to read the model, how about using the Kaldi library directly in C++ and writing the parameters out in your desired format? Kaldi's code is really simple, so the source is not hard to read and understand.

https://github.com/kaldi-asr/kaldi/blob/a2573871ba185b8fd83ec5e66270a9a2301e4300/src/ivector/ivector-extractor.cc#L710-L730

It's also not hard to translate this code to Python if you are familiar with both languages.

kamo-naoyuki commented 4 years ago

Sorry, I pointed to the wrong lines. Here is the ivector-extractor read code: https://github.com/kaldi-asr/kaldi/blob/a2573871ba185b8fd83ec5e66270a9a2301e4300/src/ivector/ivector-extractor.cc#L828-L849
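
As a rough illustration of what a Python translation of such a read function could build on, here is a minimal sketch of the low-level pieces of Kaldi's binary format: space-terminated tokens, size-prefixed int32 values, and float vectors/matrices tagged `FV`/`DV`/`FM`/`DM`. The helper names are hypothetical and not part of kaldiio or Kaldi, the `<DEMO>` token is made up for the example, and little-endian byte order is assumed. The actual order of tokens and fields in a model file has to be taken from the corresponding `Read` function in the Kaldi source, and compressed matrices are not handled.

```python
import io
import struct


def read_token(stream):
    """Read one space-terminated ASCII token, e.g. b'<DEMO> ' -> '<DEMO>'."""
    chars = []
    while True:
        c = stream.read(1)
        if c == b"":
            raise EOFError("unexpected end of stream while reading a token")
        if c == b" ":
            return b"".join(chars).decode("ascii")
        chars.append(c)


def read_int32(stream):
    """Read an int32 written as a one-byte size marker (4) plus 4 raw bytes."""
    if stream.read(1) != b"\x04":
        raise ValueError("expected a 4-byte integer size marker")
    return struct.unpack("<i", stream.read(4))[0]


def read_float_array(stream):
    """Read a float/double vector (FV/DV) or matrix (FM/DM) as Python lists."""
    kind = read_token(stream)
    if kind not in ("FV", "DV", "FM", "DM"):
        raise ValueError("unsupported array type: %r" % kind)
    fmt, width = ("f", 4) if kind[0] == "F" else ("d", 8)
    if kind[1] == "V":
        count = read_int32(stream)
        data = stream.read(count * width)
        return list(struct.unpack("<%d%s" % (count, fmt), data))
    rows = read_int32(stream)
    cols = read_int32(stream)
    data = stream.read(rows * cols * width)
    flat = struct.unpack("<%d%s" % (rows * cols, fmt), data)
    # Matrix data is stored row by row.
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# Synthetic example: binary header, a made-up token, then a 2x2 float matrix.
payload = (
    b"\x00B"
    + b"<DEMO> "
    + b"FM "
    + b"\x04" + struct.pack("<i", 2)
    + b"\x04" + struct.pack("<i", 2)
    + struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
)
stream = io.BytesIO(payload)
assert stream.read(2) == b"\x00B"  # binary-mode marker at the start
token = read_token(stream)
matrix = read_float_array(stream)
print(token, matrix)
```

With helpers like these, a model reader would just call them in the same order as the C++ code reads its tokens and parameters, and convert the resulting lists to NumPy arrays if needed.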

jvel07 commented 4 years ago

Thank you so much, @kamo-naoyuki. I will try this out and see. :)