Yunlong-He opened 3 years ago
Is your own wav file in English or Chinese? I had a look at the wav files in this repo, and it seems the models were trained on Mandarin, hence the poor results on my English wav files.
Thanks, Shuning. I verified on Mandarin wav files too, and I also checked the sample result provided in this project; it does not look very good either. I used the following code to split the wav file:
```python
from pydub import AudioSegment

sound = AudioSegment.from_file(wav_path)
for spk, timeDicts in speakerSlice.items():
    print('========= ' + str(spk) + ' =========')
    for index, timeDict in enumerate(timeDicts):
        s = timeDict['start']  # segment start, in milliseconds
        e = timeDict['stop']   # segment stop, in milliseconds
        seg = sound[s:e]       # pydub slices audio by milliseconds
        seg.export("wavs/rmd/%s/seg_%d.wav" % (spk, index), format="wav")
```
I did a simple test with a phone-call wav file of mine, about 2.5 minutes long with only 2 speakers. However, with the pretrained model in this project it returns 3 speakers, and many slices contain voices from 2 speakers. I know that uis-rnn doesn't support fixing the number of speakers, but this level of performance seems wrong. Has anybody else run into this?
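For what it's worth, here is a small sanity check I find useful before listening to the exported slices: total up the speech time per predicted speaker. A spurious extra cluster from uis-rnn often has very little total speech. This is only a sketch; the `speaker_slice` values below are made up for illustration and just mimic the shape of `speakerSlice` (speaker id -> list of `{'start', 'stop'}` dicts in milliseconds).

```python
# Made-up diarization output in the same shape as speakerSlice:
# speaker id -> list of {'start', 'stop'} segments, times in milliseconds.
speaker_slice = {
    0: [{'start': 0, 'stop': 12000}, {'start': 30000, 'stop': 45000}],
    1: [{'start': 12000, 'stop': 30000}],
    2: [{'start': 45000, 'stop': 46000}],  # a 1 s "speaker" is often spurious
}

# Total speech duration (ms) attributed to each predicted speaker.
totals = {
    spk: sum(seg['stop'] - seg['start'] for seg in segs)
    for spk, segs in speaker_slice.items()
}
print(totals)  # {0: 27000, 1: 18000, 2: 1000}

# Clusters with very little total speech (threshold chosen arbitrarily here)
# are good candidates for being clustering noise rather than a real speaker.
suspicious = [spk for spk, ms in totals.items() if ms < 3000]
print(suspicious)  # [2]
```

If the extra speaker turns out to be one of these tiny clusters, the model may just be over-segmenting rather than genuinely confusing two voices.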
Thanks for any suggestions.