kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

About the implementation of toolkits to retrieve any time-aligned information typically used in research and industry #4923

Closed a2d8a4v closed 3 days ago

a2d8a4v commented 2 months ago

Hi,

Is it advisable to implement toolkits in Kaldi for retrieving time-aligned information, such as suprasegmental and segmental information, or other features derived from alignments?
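To make "time-aligned information" concrete, here is a minimal sketch that collapses a per-frame label sequence into timed segments. The function name `frames_to_segments`, the toy alignment, and the 10 ms frame shift are assumptions for illustration; nothing here is taken from Kaldi or from the tools discussed in this thread.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_shift=0.01):
    """Collapse per-frame labels into (label, start_sec, duration_sec) tuples.

    frame_shift is the assumed time step between frames, in seconds.
    """
    segments = []
    start_frame = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append(
            (label, round(start_frame * frame_shift, 6), round(length * frame_shift, 6))
        )
        start_frame += length
    return segments


# Toy frame-level alignment (hypothetical data).
alignment = ["sil", "sil", "sil", "k", "k", "ae", "ae", "ae", "ae", "t", "t", "sil"]
for label, start, duration in frames_to_segments(alignment):
    print(f"{label}\t{start:.2f}\t{duration:.2f}")
```

Note that this sketch merges two adjacent instances of the same label into one segment, so a real tool would need boundary information beyond the bare label sequence.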

For instance, fac-via-ppg obtains phonetic posteriorgrams through the PyKaldi API; that process could be ported to C++ and ultimately become compute-ppg. However, I am concerned that such toolkits may not be compatible with existing Kaldi recipes. I would like to know whether it is better to merge this code into Kaldi or to create an independent repository.
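As a rough illustration of what a phonetic posteriorgram is, the sketch below sums per-frame posteriors over fine-grained acoustic states into per-phone posteriors. This is only one common way to derive such a matrix and is an assumption here, not a description of how fac-via-ppg or any Kaldi binary works; the function name, the state-to-phone map, and the numbers are all hypothetical.

```python
def state_to_phone_posteriors(state_posteriors, state_to_phone, num_phones):
    """Sum state-level posteriors into phone-level posteriors, frame by frame.

    state_posteriors: list of frames, each a list of probabilities over states.
    state_to_phone:   list mapping each state index to a phone index.
    """
    ppg = []
    for frame in state_posteriors:
        row = [0.0] * num_phones
        for state_id, prob in enumerate(frame):
            row[state_to_phone[state_id]] += prob
        ppg.append(row)
    return ppg


# Two frames, four states, three phones (toy values).
posteriors = [
    [0.1, 0.2, 0.3, 0.4],
    [0.7, 0.1, 0.1, 0.1],
]
mapping = [0, 0, 1, 2]  # states 0 and 1 both belong to phone 0
ppg = state_to_phone_posteriors(posteriors, mapping, num_phones=3)
for row in ppg:
    print(" ".join(f"{p:.2f}" for p in row))
```

Each output row still sums to one as long as the input row does, because every state is assigned to exactly one phone.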

Best regards,

kkm000 commented 2 weeks ago

I'm not really sure that I copy you, but since you can use the Kaldi API exposed via the Python wrapper, you can also link directly against the Kaldi libraries required for your project. Generally, every directory with code whose name does not end in bin contains code for a library. Header files are in the same directories as the source files. This is how we productize Kaldi-based speech recognition solutions. You may build either static (.a) or dynamic (.so) libraries. In the latter case, don't forget the OpenFst libraries, built under tools/openfst.

Check the Makefile for library interdependency information. Generally, training-time and inference-time libraries are separate, but there is some lumping in the nnet2 and nnet3 libraries.
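One way to work out which libraries a project needs is to read the `target: prerequisites` rules and follow them transitively. The sketch below does that for a small Makefile-style excerpt. The excerpt is illustrative only and is not copied from the real Kaldi Makefile, and the helper names `parse_deps` and `transitive_deps` are hypothetical.

```python
def parse_deps(makefile_text):
    """Collect 'target: prerequisites' rules into a dict of sets."""
    deps = {}
    for raw in makefile_text.splitlines():
        if raw.startswith("\t"):
            continue  # recipe line, not a rule
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line or "=" in line:
            continue  # skip blanks and variable assignments
        targets, _, prereqs = line.partition(":")
        # Keep only bare directory names, not files such as base/.depend.mk.
        names = [p for p in prereqs.split() if "/" not in p and "." not in p]
        for target in targets.split():
            deps.setdefault(target, set()).update(names)
    return deps


def transitive_deps(deps, library):
    """Return every library reachable from `library`, sorted by name."""
    seen = set()
    stack = [library]
    while stack:
        current = stack.pop()
        for dep in deps.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(seen)


# Illustrative excerpt (an assumption, not the actual file contents).
SAMPLE = """\
# libraries have inter-dependencies
matrix: base
util: base matrix
tree: base util matrix
"""

print(transitive_deps(parse_deps(SAMPLE), "tree"))
```

Run against the real Makefile text, the same two functions would list the directories to build and link for a given library, assuming the rules follow this simple one-line form.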

Is this something that you want to achieve?

a2d8a4v commented 1 week ago

Hi @kkm000,

I mean, I'm not sure if it's a good idea to implement such toolkits in Kaldi, given that Kaldi is specifically designed for automatic speech recognition, keyword spotting, and related tasks. I'm concerned that adding toolkits to retrieve time-aligned information, such as suprasegmental and segmental data, might conflict with the original purpose of Kaldi.

Sincerely,

kkm000 commented 1 week ago

Ah, got you! This will not be a problem. Kaldi is a much more general seq2seq toolkit than just an ASR engine.

kkm000 commented 3 days ago

I am closing this issue for now. If you believe that your issue has not been addressed, please feel free to ping me, and I'll reopen it. @-mention me for a faster response!