mycrazycracy / tf-kaldi-speaker

Neural speaker recognition/verification system based on Kaldi and Tensorflow
Apache License 2.0

About chunk size #4

Closed zxynbnb closed 4 years ago

zxynbnb commented 5 years ago

When I extracted embeddings (stage=8), I encountered a problem. When an utterance is longer than the chunk size, the extraction stalls. To keep the extraction going, I have to set the chunk size larger so that the utterance is not segmented. So is this a bug? And how can I deal with it?

mycrazycracy commented 5 years ago

That is strange, because it is not expected to happen. When an utterance is longer than the chunk size, it is split into multiple segments, an embedding is extracted from each segment, and the results are averaged.

For me, this mechanism works fine on SRE (many utterances longer than 2 min). If you go through the extraction script, you can find the procedure.
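Conceptually, the split-and-average procedure looks something like the sketch below. This is only an illustration of the idea, not the actual code in the extraction script; the default chunk_size value and the extract_segment() function are placeholders.

```python
import numpy as np

def extract_embedding(features, extract_segment, chunk_size=10000):
    """Split a long feature matrix into chunks, embed each chunk, and average.

    features:        [num_frames, feat_dim] array
    extract_segment: function mapping a feature chunk to an embedding vector
    chunk_size:      maximum number of frames per segment
    """
    num_frames = features.shape[0]
    if num_frames <= chunk_size:
        return extract_segment(features)

    embeddings = []
    for start in range(0, num_frames, chunk_size):
        segment = features[start:start + chunk_size]
        embeddings.append(extract_segment(segment))
    # Average the per-segment embeddings into one utterance-level x-vector.
    return np.mean(embeddings, axis=0)
```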

What exactly happens when you say "it will fall into a stop"? Is there any log output, or any other information about the error, e.g. at which step it stops?

zxynbnb commented 5 years ago

Instructions for updating: Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /home/zhangxingyu/tfsitw/exp/xvector_nnet_tdnn_amsoftmax_m0.20_linear_bn_1e-2_tdnn4_att/nnet/model-150000
INFO:tensorflow:Succeed to load checkpoint model-150000
INFO:tensorflow:[INFO] Key 00165_yjeod length 4452.
INFO:tensorflow:[INFO] Key 00172_kmjtv length 19976.
INFO:tensorflow:[INFO] Key 00197_qhtho length 70210 > 50000, split to 2 segments.

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

The log above is what I see when the utterance length is larger than the chunk size. The process then falls into a state similar to an infinite loop, and the embedding extraction never finishes. But when I set the chunk size larger, the problem does not appear. Could this be a problem with the TensorFlow version?
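For what it is worth, the segment count in the log matches simply dividing the utterance length by the chunk size and rounding up. This is an assumption based on the "split to 2 segments" message, not verified against the code.

```python
import math

chunk_size = 50000   # chunk size used in the run above
utt_length = 70210   # length of key 00197_qhtho from the log

# Assumed splitting rule implied by the log message:
num_segments = math.ceil(utt_length / chunk_size)
print(num_segments)  # -> 2
```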

mycrazycracy commented 5 years ago

The log looks alright. Interesting. I have never encountered this problem, and I don't think it is due to the TF version. The best way to fix this is to locate which step in extract.py causes the problem.

If you look at nnet/lib/extract.py, you can see that it simply splits the features and averages the x-vectors from the segments. You can use pdb (do not run through "$cmd JOB=1:$nj ${dir}/log/extract.JOB.log" if you want to use pdb) and set a trace in extract.py to see what happens. You can also use "print" statements to find out which step goes wrong.
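For example, a breakpoint can be set with the standard pdb call shown below. process_utterance() is a hypothetical placeholder for wherever the per-utterance work happens in extract.py, not a function from the repo.

```python
import pdb

def process_utterance(key, features):
    # Drop into the interactive debugger right before the suspect step;
    # then step with 'n', print variables with 'p <name>', continue with 'c'.
    pdb.set_trace()
    ...  # the splitting / averaging code being investigated
```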