mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.
2.37k stars 446 forks source link

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 5658 and the array at index 1 has size 5640 #234

Open aziryasin opened 4 years ago

aziryasin commented 4 years ago

I'm getting this error. I checked the for path errors like this . But getting this error. Please help.

My CFG: `[cfg_proto] cfg_proto = proto/global.proto cfg_proto_chunk = proto/global_chunk.proto

[exp] cmd = run_nn_script = run_nn out_folder = exp/Sinhala_num seed = 1234 use_cuda = False multi_gpu = False save_gpumem = False n_epochs_tr = 24

[dataset1] data_name = Sinhala_num_tr fea = fea_name=mfcc fea_lst=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/train/feats.scp fea_opts=apply-cmvn --utt2spk=ark:/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/train/utt2spk ark:/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/mfcc/cmvn_train.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | cw_left=5 cw_right=5

lab = lab_name=lab_cd lab_folder=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/exp/tri1 lab_opts=ali-to-pdf lab_count_file=auto lab_data_folder=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/train/ lab_graph=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/exp/tri1/graph

n_chunks = 5

[dataset2] data_name = Sinhala_num_dev fea = fea_name=mfcc fea_lst=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/test/feats.scp fea_opts=apply-cmvn --utt2spk=ark:/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/test/utt2spk ark:/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | cw_left=5 cw_right=5

lab = lab_name=lab_cd lab_folder=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/exp/tri1 lab_opts=ali-to-pdf lab_count_file=auto lab_data_folder=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/test/ lab_graph=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/exp/tri1/graph

n_chunks = 1

[dataset3] data_name = Sinhala_num_test fea = fea_name=mfcc fea_lst=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/test/feats.scp fea_opts=apply-cmvn --utt2spk=ark:/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/test/utt2spk ark:/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/mfcc/cmvn_test.ark ark:- ark:- | add-deltas --delta-order=2 ark:- ark:- | cw_left=5 cw_right=5

lab = lab_name=lab_cd lab_folder=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/exp/tri1 lab_opts=ali-to-pdf lab_count_file=auto lab_data_folder=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/data/test/ lab_graph=/home/azir/kaldi/egs/SinhalaSpeechRecognizer/HMM_Based_4/exp/tri1/graph

n_chunks = 1

[data_use] train_with = Sinhala_num_tr valid_with = Sinhala_num_dev forward_with = Sinhala_num_test

[batches] batch_size_train = 128 max_seq_length_train = 1000 increase_seq_length_train = False start_seq_len_train = 100 multply_factor_seq_len_train = 2 batch_size_valid = 128 max_seq_length_valid = 1000

[architecture1] arch_name = MLP_layers1 arch_proto = proto/MLP.proto arch_library = neural_networks arch_class = MLP arch_pretrain_file = none arch_freeze = False arch_seq_model = False dnn_lay = 1024,1024,1024,1024,N_out_lab_cd dnn_drop = 0.15,0.15,0.15,0.15,0.0 dnn_use_laynorm_inp = False dnn_use_batchnorm_inp = False dnn_use_batchnorm = True,True,True,True,False dnn_use_laynorm = False,False,False,False,False dnn_act = relu,relu,relu,relu,softmax arch_lr = 0.08 arch_halving_factor = 0.5 arch_improvement_threshold = 0.001 arch_opt = sgd opt_momentum = 0.0 opt_weight_decay = 0.0 opt_dampening = 0.0 opt_nesterov = False

[model] model_proto = proto/model.proto model = out_dnn1=compute(MLP_layers1,mfcc) loss_final=cost_nll(out_dnn1,lab_cd) err_final=cost_err(out_dnn1,lab_cd)

[forward] forward_out = out_dnn1 normalize_posteriors = True normalize_with_counts_from = lab_cd save_out_file = False require_decoding = True

[decoding] decoding_script_folder = kaldi_decoding_scripts/ decoding_script = decode_dnn.sh decoding_proto = proto/decoding.proto min_active = 200 max_active = 7000 max_mem = 50000000 beam = 13.0 latbeam = 8.0 acwt = 0.2 max_arcs = -1 skip_scoring = false scoring_script = local/score.sh scoring_opts = "--min-lmwt 1 --max-lmwt 10" norm_vars = False `

image

Godaddy-xie commented 4 years ago

You can modify codes in data_io.py 50: if k in fea -----> if k in fea and and v.shape[0] == fea[k].shape[0- 53: k: v for k, v in fea.items() if k in lab ---> k: v for k, v in fea.items() if k in lab and v.shape[0] = lab[k].shape[0] to ensure that the feature and the truth value have the same size (frames).

anketvit commented 2 years ago

Having same issue, not resolved