Error in eigh of decomp.py

Jul1999 commented 3 years ago

Hi, thank you for your paper and repository. I think this is an astonishing innovation for speaker verification. I am an undergraduate student trying to train the model on a dataset in our native language, but we are getting an error and cannot figure out why. Could you take a look at the error and suggest how we might fix it?

Traceback (most recent call last):
  File "expe.py", line 196, in <module>
    trn_loss = model.init_params_with_data(trn_dataset, config_trn, device=device, subset=init_subset)
  File "/home/thule/Project/voice-verification/DCA-PLDA-master/examples/modules.py", line 183, in init_params_with_data
    self.lda_stage.init_with_lda(x, speaker_ids, init_params, sec_ids=domain_ids)
  File "/home/thule/Project/voice-verification/DCA-PLDA-master/examples/modules.py", line 598, in init_with_lda
    evals, evecs = linalg.eigh(BCov, WCov)
  File "/home/thule/anaconda3/envs/Hope/lib/python3.8/site-packages/scipy/linalg/decomp.py", line 581, in eigh
    raise LinAlgError('The leading minor of order {} of B is not '
numpy.linalg.LinAlgError: The leading minor of order 20 of B is not positive definite. The factorization of B could not be completed and no eigenvalues or eigenvectors were computed.

The info value inside the eigh function was 532, when it should be 0. For the x-vectors, we tried both MFCC and PNCC features and got the same error either way. Our x-vectors were much smaller than yours, though (about a hundredth of yours).

We are really lost and would be grateful for any help. Thank you for your time.
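For readers decoding the numbers in the error: SciPy's eigh raises this particular message when LAPACK returns an info value larger than the matrix size n, and the order it reports is info - n. With info = 532 and order 20, n = 532 - 20 = 512, which would match 512-dimensional x-vectors (an inference from the numbers, not something stated in the thread). So the Cholesky factorization of B, which here is WCov, got through the top-left 19 x 19 block and failed at the 20 x 20 block. The minimal NumPy-only sketch below locates that order by trying a Cholesky factorization of each leading block in turn; the helper name first_bad_leading_minor is hypothetical and not part of SciPy or DCA-PLDA.

```python
import numpy as np


def first_bad_leading_minor(B):
    """Return the smallest k such that the top-left k x k block of B is not
    positive definite, or None if B is positive definite.

    This mirrors where a Cholesky factorization of B (which eigh(A, B)
    needs) breaks down. It runs one Cholesky per order, so it is meant
    for one-off diagnosis, not for use inside a training loop.
    """
    n = B.shape[0]
    for k in range(1, n + 1):
        try:
            np.linalg.cholesky(B[:k, :k])
        except np.linalg.LinAlgError:
            return k
    return None


# Toy matrix whose third coordinate carries no variance at all.
B = np.diag([1.0, 2.0, 0.0, 4.0])
print(first_bad_leading_minor(B))          # 3
print(first_bad_leading_minor(np.eye(3)))  # None
```

Run on the actual WCov, a result of 20 would confirm that the problem lies in the data feeding that dimension rather than in eigh itself.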

luferrer commented 3 years ago

I am glad you find the repository useful.

That error appears when your LDA dimension is larger than the number of speakers you have in the dataset. Try reducing the lda_dim in the config to a number lower than the number of speakers you have for training. Let me know if that works.
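One way to see why the number of speakers bounds the LDA dimension: the between-speaker covariance is built from S speaker means around their weighted average, so it has rank at most S - 1, and LDA can find at most S - 1 directions with nonzero between-speaker variance. The minimal NumPy sketch below uses random stand-in embeddings and a hypothetical helper, between_class_cov; it illustrates the general property of LDA and is not DCA-PLDA's actual init_with_lda code, whose details may differ.

```python
import numpy as np


def between_class_cov(x, labels):
    """Between-speaker covariance: spread of the per-speaker means around
    the global mean, weighted by each speaker's sample count."""
    mu = x.mean(axis=0)
    d = x.shape[1]
    bcov = np.zeros((d, d))
    for spk in np.unique(labels):
        xs = x[labels == spk]
        diff = (xs.mean(axis=0) - mu)[:, None]   # column vector, shape (d, 1)
        bcov += len(xs) * (diff @ diff.T)
    return bcov / len(x)


rng = np.random.default_rng(0)
n_speakers, per_speaker, dim = 20, 40, 64       # synthetic sizes
labels = np.repeat(np.arange(n_speakers), per_speaker)
x = rng.standard_normal((n_speakers * per_speaker, dim))

bcov = between_class_cov(x, labels)
print(np.linalg.matrix_rank(bcov))              # 19, i.e. n_speakers - 1
```

With 20 synthetic speakers, any lda_dim above 19 would ask for directions along which the speaker means do not differ at all.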

Jul1999 commented 3 years ago

Hi Luciana, thank you for your quick response. We changed lda_dim from 300 to 20, which is much smaller than the number of speakers we have, but it didn't help. We then replicated our data to increase the number of speakers to 560, because we suspected that the number of speakers may need to be larger than the x-vector dimension, but we got the same error. While debugging, we found that WCov was not positive definite; maybe we still need more speakers? We are now extracting x-vectors from the VoxCeleb dataset to train again together with our data, but that takes a while, so I decided to update you on our progress in the meantime. Do you think we should try something else to fix it? I really appreciate your response.
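Why replicating speakers cannot repair a singular WCov: the within-speaker covariance of N samples from S speakers has rank at most N - S, and copying existing x-vectors under new speaker labels adds no new directions of within-speaker variation. A minimal NumPy sketch with synthetic data; the sizes are made up so that the rank bound falls below the dimension, and within_class_cov is a hypothetical helper, not DCA-PLDA's code.

```python
import numpy as np


def within_class_cov(x, labels):
    """Within-speaker covariance: average outer product of each sample's
    deviation from its own speaker's mean."""
    centered = np.empty_like(x)
    for spk in np.unique(labels):
        idx = labels == spk
        centered[idx] = x[idx] - x[idx].mean(axis=0)
    return centered.T @ centered / len(x)


rng = np.random.default_rng(0)
dim, n_speakers, per_speaker = 64, 10, 4   # rank bound 10 * (4 - 1) = 30 < 64
labels = np.repeat(np.arange(n_speakers), per_speaker)
x = rng.standard_normal((n_speakers * per_speaker, dim))
W = within_class_cov(x, labels)

# "Replicate" the data: the same x-vectors again, under new speaker ids.
x_rep = np.vstack([x, x])
labels_rep = np.concatenate([labels, labels + n_speakers])
W_rep = within_class_cov(x_rep, labels_rep)

print(np.linalg.matrix_rank(W), np.linalg.matrix_rank(W_rep))  # 30 30
```

Doubling the speaker count leaves the rank at 30, below the dimension of 64, so eigh would still fail on either matrix; only genuinely new within-speaker variation (for example, more sessions) raises it.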

luferrer commented 3 years ago

Mh, that sounds like an issue with your data. How many samples per speaker do you have?

Jul1999 commented 3 years ago

We have around 40 samples per speaker, but they all come from a single session.

bsxfan commented 3 years ago

Lu

I'm not sure if it's relevant here, but sometimes it helps to symmetrize your matrices, using (C+C.T)/2.

Niko

On Tue, 6 Apr 2021 at 18:53, Jul1999 wrote:

> We have around 40 samples each. But we only have 1 session though.
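Niko's symmetrization suggestion, as a minimal NumPy sketch on synthetic matrices: averaging a matrix with its transpose cancels small floating-point asymmetries before the matrix reaches a symmetric solver. scipy.linalg.eigh reads only one triangle of each input (the lower one by default), so an asymmetric input is not rejected, just silently read from that triangle. Symmetrizing does not, however, make a rank-deficient WCov positive definite.

```python
import numpy as np


def symmetrize(C):
    """Return (C + C.T) / 2, the symmetric matrix closest to C in the
    Frobenius norm. Floating-point addition is commutative, so the result
    is exactly symmetric."""
    return (C + C.T) / 2


rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
cov = A @ A.T                                      # symmetric in exact arithmetic
noisy = cov + 1e-12 * rng.standard_normal((5, 5))  # simulated rounding asymmetry

print(np.abs(noisy - noisy.T).max() > 0)           # True: slightly asymmetric
fixed = symmetrize(noisy)
print(np.array_equal(fixed, fixed.T))              # True
```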


luferrer commented 3 years ago

Niko, I believe the matrices are already symmetric given the way they are computed. Do you mean that perhaps there are numerical issues that make them non-symmetric?

In any case, if WCov is not positive definite then I think it has to be an issue with the data. Are those 40 samples from the same session different enough from each other? Perhaps it is a numerical issue, but the fact that it happens is telling you something about the data.

Finally, I do not think you will be able to get a good model (even if initialization worked fine, which is where it is crashing now) with 20 speakers from a single session each. Adding vox data should help, though. You should make sure to use different domain labels for vox and your data so that the model can still learn from your domain which has much less data.
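Luciana's question of whether the single-session samples differ enough can be checked numerically before calling eigh: look at the eigenvalue spectrum of WCov. Eigenvalues at or near zero, relative to the largest, mark directions in which the x-vectors barely vary within speakers; near-duplicate recordings from one session are one plausible way to get them. A minimal NumPy sketch; the helper name pd_report and the 1e-10 threshold are illustrative assumptions, not anything from DCA-PLDA. In practice one would pass the real WCov instead of the synthetic matrices below.

```python
import numpy as np


def pd_report(cov, rel_tol=1e-10):
    """Summarize how close a covariance matrix is to losing positive
    definiteness. rel_tol is an illustrative threshold, relative to the
    largest eigenvalue."""
    evals = np.linalg.eigvalsh((cov + cov.T) / 2)   # ascending order
    largest = evals[-1]
    n_tiny = int(np.sum(evals <= rel_tol * largest))
    return {
        "min_eig": float(evals[0]),
        "max_eig": float(largest),
        "n_tiny": n_tiny,
        "positive_definite": n_tiny == 0,
    }


rng = np.random.default_rng(0)
# Plenty of varied samples: well-conditioned covariance.
good = np.cov(rng.standard_normal((1000, 8)), rowvar=False)
# Only 5 samples in 8 dimensions: rank 4, so 4 eigenvalues collapse to ~0.
bad = np.cov(rng.standard_normal((5, 8)), rowvar=False)

print(pd_report(good)["positive_definite"])  # True
print(pd_report(bad)["n_tiny"])              # 4
```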

bsxfan commented 3 years ago

> Niko, I believe the matrices are already symmetric given the way they are computed. Do you mean that perhaps there are numerical issues that make them non-symmetric?

Yes. Properly derived covariance matrices are always symmetric, given infinite precision, but floating-point rounding can make them slightly asymmetric.

> Finally, I do not think you will be able to get a good model (even if initialization worked fine, which is where it is crashing now) with 20 speakers from a single session each. Adding vox data should help, though. You should make sure to use different domain labels for vox and your data so that the model can still learn from your domain which has much less data.

Yes, trying to train a speaker recognizer with just 20 speakers is not going to work, unless you go seriously Bayesian.

Niko

Jul1999 commented 3 years ago

Hi, I can't tell you how much I appreciate your help. I fixed the issue after adding more sessions and changing lda_dim. I will definitely combine the vox data with ours to train the model. Thank you again.

luferrer commented 3 years ago

Nice! I am glad that worked. Good luck with your research.

Luciana

luferrer / DCA-PLDA

Error in eigh of decomp.py #2