Production mode save_outputs: getting bad .ark files

mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.


Hello, I'm currently using the Librispeech dataset and have trained a model following the pytorch-kaldi tutorial. I'm trying to use this trained Librispeech acoustic model to produce embeddings for a speech conversion task. To do this, I created a separate cfg that I use to enter production mode. I feed in the new features for my speech conversion data and save the outputs of out_dnn1 (the last layer before the output layer, which I am trying to use as embeddings). The pytorch-kaldi production script runs successfully; however, the .ark files produced for out_dnn1 seem to be malformed. Running "copy-feats" on them gives an error after the first key. The error is below:

WARNING (copy-feats[5.5.671~1494-e5a5a]:Next():util/kaldi-table-inl.h:562) Invalid archive file format: expected space after key ��M>
ERROR (copy-feats[5.5.671~1494-e5a5a]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive forward_dst_te_ep16_ck0_out_dnn1.ark
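For anyone trying to reproduce this, here is a minimal sketch of the layout a reader expects for each entry of a binary archive of uncompressed float32 matrices: the key, one space, the marker "\0B", the token "FM ", then the row and column counts (each a size byte 4 followed by a little-endian int32), then rows x cols float32 values. The function names are hypothetical and are not part of Kaldi or pytorch-kaldi, and compressed ("CM") or double-precision ("DM ") matrices are not handled. If the declared shape and the number of floats actually written disagree, the next key would be read from the middle of the float data, which may be what the unreadable key in the warning above reflects, though that is only a guess.

```python
import io
import struct


def write_ark_entry(stream, key, rows):
    """Write one matrix (a list of equal-length float rows) as a binary ark entry."""
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have the same length")
    stream.write(key.encode("ascii") + b" ")              # key, then a single space
    stream.write(b"\0B")                                  # binary-mode marker
    stream.write(b"FM ")                                  # float32 matrix token
    stream.write(b"\x04" + struct.pack("<i", len(rows)))  # number of rows (frames)
    stream.write(b"\x04" + struct.pack("<i", n_cols))     # number of columns
    for r in rows:
        stream.write(struct.pack(f"<{n_cols}f", *r))


def _read_key(stream, max_len=256):
    """Return the next key, or None at a clean end of file."""
    key = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            if key:
                raise ValueError("archive ended in the middle of a key")
            return None
        if ch == b" ":
            if not key:
                raise ValueError("empty key")
            return key.decode("ascii")
        # Keys are expected to be printable, non-space ASCII.
        if not 33 <= ch[0] <= 126 or len(key) >= max_len:
            raise ValueError(f"expected space after key, got byte {ch!r}")
        key.append(ch[0])


def _read_int32(stream):
    raw = stream.read(5)
    if len(raw) != 5 or raw[0] != 4:
        raise ValueError("bad int32 field in matrix header")
    return struct.unpack("<i", raw[1:])[0]


def read_ark_entries(stream):
    """Yield (key, rows) pairs, raising ValueError at the first malformed entry."""
    while True:
        key = _read_key(stream)
        if key is None:
            return
        if stream.read(2) != b"\0B":
            raise ValueError(f"{key}: missing binary marker")
        if stream.read(3) != b"FM ":
            raise ValueError(f"{key}: not an uncompressed float32 matrix")
        n_rows = _read_int32(stream)
        n_cols = _read_int32(stream)
        if n_rows < 0 or n_cols < 0:
            raise ValueError(f"{key}: negative matrix dimension")
        data = stream.read(4 * n_rows * n_cols)
        if len(data) != 4 * n_rows * n_cols:
            raise ValueError(f"{key}: matrix data is truncated")
        flat = struct.unpack(f"<{n_rows * n_cols}f", data)
        yield key, [list(flat[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows)]
```

To inspect the file from the error above, open forward_dst_te_ep16_ck0_out_dnn1.ark in binary mode ("rb") and iterate over read_ark_entries; the key of the last entry that parses, together with its declared shape, shows where the archive stops matching this layout.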

Attached are the config file used for production and log.log: log.txt, libri_RNN_production.txt

Thanks!


Production mode save_outputs: getting bad .ark files #228