scelesticsiva / speaker_recognition_GMM_UBM

A speaker recognition system which uses GMM-UBM for use in an Android application which helps in monitoring patients suffering from Schizophrenia.

I got 3380 lines of MFCC features in total, but training fails with an error. Would you help me? Thank you very much. #4

Open 18376672766666 opened 3 years ago

18376672766666 commented 3 years ago

    PS D:\jupyter_file\VITbook\third\speaker_recognition_GMM_UBM-master\src\speaker_recognition> python speaker_recognition.py --csv_file combined_MFCC.csv --operation ubm
    (32, 13) (32, 13, 13) (32, 1)
    Traceback (most recent call last):
      File "speaker_recognition.py", line 47, in <module>
        train_ubm(arguments)
      File "D:\jupyter_file\VITbook\third\speaker_recognition_GMM_UBM-master\src\speaker_recognition\UBM.py", line 106, in train_ubm
        e_step()
      File "D:\jupyter_file\VITbook\third\speaker_recognition_GMM_UBM-master\src\speaker_recognition\UBM.py", line 88, in e_step
        num[k] = pi_k[k] * (unit_gaussian(data[n-1],mu_k[k],cov_k[k]))
    IndexError: index 3380 is out of bounds for axis 0 with size 3380

RiskySignal commented 3 years ago

Hello @18376672766666, it looks like the index n into the variable data goes out of bounds because N defaults to 10000, while your CSV file has only 3380 rows. You can see the definition at line 18 of speaker_recognition.py. There are two ways to solve it. The first is to set the default of N to 3380 for your case, like this:

    parser.add_argument("--N",type = int, help = "number of datapoints from csv file",default = 3380)

Note that this approach is not general: if you change the training dataset, you will probably need to change that number again. I prefer to modify the code around line 30 of UBM.py like this:

    data = []
    N = args.N
    D = args.D
    K = args.K
    iterations = 0
    with open(args.csv_file,"r") as f:
        reader = csv.reader(f,delimiter = ",")
        for count,datum in enumerate(reader):
            if count < N:
                data.append(datum)
            else:
                break
    N = len(data)  # add this line.
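
For illustration, the same idea can be written as a small standalone helper. This is only a sketch and not code from the repository: the name `load_mfcc_rows` and its arguments are hypothetical, and it assumes every non-empty CSV row holds numeric MFCC values.

```python
import csv


def load_mfcc_rows(csv_path, max_rows):
    """Read at most max_rows feature rows and return (rows, actual_count).

    Hypothetical helper: the returned count is the number of rows that
    were really read, so later loops never index past the end of the data.
    """
    rows = []
    with open(csv_path, "r", newline="") as f:
        for row in csv.reader(f, delimiter=","):
            if len(rows) >= max_rows:
                break
            if row:  # skip blank lines in the file
                rows.append([float(value) for value in row])
    return rows, len(rows)
```

With a 3380-row file and `max_rows = 10000`, the returned count would be 3380, so the E-step loop would stop at the last real row instead of running up to 10000.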

Hope this helps.

18376672766666 commented 3 years ago

Thank you very much, I succeeded in testing the MAP-adapted model. But I have another question: I don't know what the evaluation criterion is. For example, I got the following output.

    PS D:\jupyter_file\VITbook\third\speaker_recognition_GMM_UBM-master\src\speaker_recognition> python testing_model.py --map_file_name map_file.npy --ubm_file_name ubm_file.npy --test_csv_file a_2_16.csv --N 1500
    [-57706.67108446]

How do I know from [-57706.67108446] whether the .csv file matches the speaker model in map_file.npy?

I don't know what the output [-57706.67108446] means.

RiskySignal commented 3 years ago

[-57706.67108446] is the log-likelihood of the test audio clip. You can use the LLR (log-likelihood ratio) or HNORM to judge whether the clip belongs to the target user. You can find the definitions in this paper. However, I did not find out how to set the thresholds. So I think one option is to use a validation dataset to choose the threshold that gives the best trade-off between FNR and FPR.
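
A rough sketch of that validation idea is below. It is not code from this repository, and all function names are hypothetical. It assumes you can compute the log-likelihood of a clip under both the adapted speaker model and the UBM, that you normalize the difference by the number of frames, and that you pick the threshold where FNR and FPR are closest (roughly the equal error rate point), which is only one possible criterion. HNORM is not covered here.

```python
def llr_score(ll_target, ll_ubm, n_frames):
    """Per-frame log-likelihood ratio between the speaker model and the UBM."""
    return (ll_target - ll_ubm) / n_frames


def error_rates(target_scores, impostor_scores, threshold):
    """Return (FNR, FPR) when clips with score >= threshold are accepted."""
    fnr = sum(s < threshold for s in target_scores) / len(target_scores)
    fpr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fnr, fpr


def pick_threshold(target_scores, impostor_scores):
    """Pick the candidate threshold where FNR and FPR are closest."""
    candidates = sorted(set(target_scores) | set(impostor_scores))

    def gap(threshold):
        fnr, fpr = error_rates(target_scores, impostor_scores, threshold)
        return abs(fnr - fpr)

    return min(candidates, key=gap)
```

Here `target_scores` would be the LLR scores of validation clips from the enrolled speaker and `impostor_scores` those of clips from other speakers. A new clip is then accepted if its LLR is at or above the chosen threshold.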