thuhcsi / NeuFA

Neural network-based forced alignment with bidirectional attention mechanism
70 stars, 8 forks

Bad results when inference #3

Closed: Liujingxiu23 closed this issue 2 years ago

Liujingxiu23 commented 2 years ago

Hi, thank you for your work and for sharing it! I tried the model with the original structure, except that I used 80-dim mel spectrograms (hop_size=256, no normalization) instead of MFCCs. The loss looks good, but the inference results are bad: the result for every phone is [0.01, 0.01]. I checked "w1" in the inference results and it looks good; it is diagonal. The "boundary" output at inference looks like this: [image] Does the loss look good? [image] [image] I do not know what is wrong.
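For anyone checking for the same symptom, here is a minimal sketch of a sanity check that flags output where every phone gets the same boundary pair. It assumes the predictions are available as a list of (start, end) pairs in seconds. The function name `boundaries_collapsed` and the sample values are hypothetical and not part of NeuFA.

```python
from typing import Sequence, Tuple


def boundaries_collapsed(
    boundaries: Sequence[Tuple[float, float]], tol: float = 1e-6
) -> bool:
    """Return True if every (start, end) pair matches the first one within tol."""
    if len(boundaries) < 2:
        # With zero or one pair there is nothing to compare against.
        return False
    first_start, first_end = boundaries[0]
    return all(
        abs(start - first_start) <= tol and abs(end - first_end) <= tol
        for start, end in boundaries
    )


# Hypothetical values in seconds; `bad` mimics the symptom described above.
bad = [(0.01, 0.01)] * 5
ok = [(0.00, 0.08), (0.08, 0.21), (0.21, 0.35)]

print(boundaries_collapsed(bad))
print(boundaries_collapsed(ok))
```

A check like this only detects the collapse; it says nothing about the cause.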

Liujingxiu23 commented 2 years ago

I found that I only ran the "pretrain" stage, without anything related to the "boundary loss". What should I do if I have no phoneme/word boundaries for any of my data and therefore cannot do the dev/semi/semi2 training?

petronny commented 2 years ago

Well, to use NeuFA, you have to use some data with boundaries to train the boundary detector.

For Chinese, you can use the Chinese Standard Mandarin Speech Corpus from Databaker.

Liujingxiu23 commented 2 years ago

Thank you for your reply, I got it.