usc-sail / child-adult-diarization

public child-adult speaker diarization/classification model and codes

Model throws ValueError because of audio files #1

Open kRichard32 opened 13 hours ago

kRichard32 commented 13 hours ago

image

Whisper expects the mel input features to have length 3000 (30 s of audio), so the model raises a ValueError when the features have a different length. This was a quick workaround I implemented, but there's probably a better way of doing things...
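
For anyone reading along without the image, here is a minimal sketch of what padding to 3000 frames could look like. It is not the exact code from the image above. The helper name `pad_mel_to_length`, the zero pad value, and the `(batch, 80, frames)` feature shape are assumptions made for illustration, and NumPy stands in for whatever tensor library the pipeline actually uses.

```python
import numpy as np

# 30 s of audio at a 10 ms frame hop gives 3000 mel frames.
WHISPER_NUM_FRAMES = 3000


def pad_mel_to_length(features, target_len=WHISPER_NUM_FRAMES, pad_value=0.0):
    """Right-pad the last (time) axis of a mel feature array to target_len frames.

    Hypothetical helper: the pad value of 0.0 is an assumption, not
    necessarily what Whisper's own feature extractor would produce.
    """
    features = np.asarray(features)
    n_frames = features.shape[-1]
    if n_frames > target_len:
        raise ValueError(f"got {n_frames} frames, more than target {target_len}")
    # Pad only the time axis; leave batch and mel-bin axes untouched.
    pad_width = [(0, 0)] * (features.ndim - 1) + [(0, target_len - n_frames)]
    return np.pad(features, pad_width, mode="constant", constant_values=pad_value)


# Example: features for a 10 s clip (1000 frames, 80 mel bins assumed).
mel_10s = np.random.randn(1, 80, 1000).astype(np.float32)
padded = pad_mel_to_length(mel_10s)
print(padded.shape)  # (1, 80, 3000)
```

Padding the raw waveform to 30 s before feature extraction may be a cleaner alternative, since padding the features directly does not reproduce what silence would look like after log-mel normalization.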

AnfengXu136 commented 13 hours ago

Thank you for raising the issue. Did you try transformers==4.30.2 (as in requirements.txt)? For the quick start with 10s input audio, we noticed this issue with more recent transformers versions, but it should work with the older version.

However, for anyone wishing to train the model with variable input length larger than 30s, this workaround of padding to 30s can work. But I believe the positional embedding replacement code must be commented out. I will keep this issue open for people to reference. Thank you for pointing this out.

kRichard32 commented 12 hours ago

Ah, I was using Python 3.12, so I had to use a more recent transformers version. Thanks for the help.