mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

Error with getting shared_list #220

Closed seas2nada closed 4 years ago

seas2nada commented 4 years ago

I'm trying to train on Librispeech alignments in an Ubuntu Docker container, but I keep getting empty shared_list errors.

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 573, in read_lab_fea
    fea_scp, fea_opts, lab_folder, lab_opts, cw_left, cw_right, max_seq_length, output_folder, fea_only
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 249, in load_chunk
    fea_scp, fea_opts, lab_folder, lab_opts, left, right, max_sequence_length, output_folder, fea_only
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 208, in load_dataset
    fea_conc, lab_conc, end_index_fea, end_index_lab = _concatenate_features_and_labels(fea_chunks, lab_chunks)
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 160, in _concatenate_features_and_labels
    fea_conc, lab_conc = _sort_chunks_by_length(fea_conc, lab_conc)
  File "/home/Workspace/dh/pytorch-kaldi-dh/data_io.py", line 149, in _sort_chunks_by_length
    fea_conc, lab_conc = zip(fea_sorted)
ValueError: not enough values to unpack (expected 2, got 0)
```
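As far as I can tell, that ValueError is just what unpacking an empty chunk list looks like; a minimal sketch (not the actual data_io.py code) that reproduces it:

```python
# Minimal sketch, not the actual data_io.py code: if no feature/label chunks
# were loaded, the sorted chunk list is empty, zip() yields nothing, and
# unpacking its result into two names fails with the same ValueError.
fea_sorted = []                        # hypothetical: nothing came back from the feature pipe
fea_conc, lab_conc = zip(*fea_sorted)  # ValueError: not enough values to unpack (expected 2, got 0)
```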

```
Traceback (most recent call last):
  File "./run_exp_semi.py", line 276, in <module>
    next_config_file,
  File "/home/Workspace/dh/pytorch-kaldi-dh/core.py", line 529, in run_nn
    data_name = shared_list[0]
IndexError: list index out of range
```
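The second traceback seems to be just a consequence of the first one: the data-loading thread dies before it can append anything to shared_list, so indexing it afterwards fails. A rough sketch of that pattern (assumed flow, not the exact core.py code):

```python
import threading

shared_list = []

def read_lab_fea_stub(out_list):
    # Stands in for data_io.read_lab_fea: it raises before appending its results.
    raise ValueError("not enough values to unpack (expected 2, got 0)")

t = threading.Thread(target=read_lab_fea_stub, args=(shared_list,))
t.start()   # the "Exception in thread Thread-1" message is printed here
t.join()    # the main process keeps going anyway

data_name = shared_list[0]   # IndexError: list index out of range, as in core.py
```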

I've run exactly the same code on my Ubuntu PC with a single RTX 2080 Ti and 8 CPU cores, and it worked fine. However, when I try the same thing on my Ubuntu PC with four 2080 Tis and 28 cores, fea_conc comes back empty in data_io.py. At first I thought the multi-threading might be causing the problem, so I limited the number of threads, but that didn't help; getting rid of the multi-threading entirely didn't work either.

What I did find is that data_io.read_key returns a None key. Maybe the line fd.read(1).decode("latin1") has some problem in my case, but it's hard for me to debug because I don't fully understand what fd actually is. When I print it, I get something like <_io.BufferedReader name=4>, but I don't know what that means.
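As far as I can tell from kaldi_io-style readers, fd is the buffered binary stdout of the kaldi pipe that produces the features (e.g. a copy-feats ... ark:- command opened with subprocess.Popen), and read_key pulls one byte at a time up to the first space to get the utterance key. Roughly (my sketch under those assumptions, not the exact data_io.py code):

```python
# Rough sketch of the read_key logic, based on kaldi_io-style readers, not the
# exact data_io.py code. fd is assumed to be the binary stdout of a kaldi pipe
# such as "copy-feats scp:... ark:-" opened with subprocess.Popen.
import subprocess

def read_key(fd):
    """Read bytes up to the first space: the utterance key of an ark entry."""
    key = ""
    while True:
        c = fd.read(1).decode("latin1")
        if c in ("", " "):              # "" means the pipe hit EOF without data
            break
        key += c
    return key if key != "" else None   # None => the kaldi command wrote nothing

# Hypothetical check: if the command fails (bad path, kaldi binaries not visible
# inside the container), the pipe stays empty and read_key returns None.
cmd = "copy-feats scp:features.scp ark:-"   # hypothetical scp path
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                        stderr=subprocess.DEVNULL)
print(read_key(proc.stdout))                # prints None when nothing reached the pipe
```

If that picture is right, a None key would mean the kaldi command feeding the pipe wrote nothing on this machine, so the problem may be upstream of the decode call rather than in it.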

Can you give me some advice on this kind of problem? Thanks for your attention.