Closed: anioji closed this issue 6 months ago
Hey @anioji, thanks for opening the issue. Could you send the dataset you're working on? I've searched for Anioji/testra but it seems to be private.
Also you might want to double-check:
> The solution is becoming stranger with every MONTH, and at the same time it is not at all obvious, and not even understandable to me personally.
I don't understand what you mean there!
> Hey @anioji, thanks for opening the issue. Could you send the dataset you're working on? I've searched for Anioji/testra but it seems to be private.
I can provide a reading token if it's useful.
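For reference, here is a minimal sketch of how a private dataset repo can be loaded with a read token, assuming a recent version of huggingface/datasets where `load_dataset` accepts a `token` argument; the token value is a placeholder:

```python
from datasets import load_dataset

# "Anioji/testra" is the repo mentioned above; the token string is a placeholder.
ds = load_dataset("Anioji/testra", token="hf_xxx")
```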
Stereo, maybe. The WAV files have two channels and are pseudo-stereo; I can convert everything to mono, but I don't think that's the point.
Most likely the problem really is the length of the audio recordings: they are short phrases of 1-10 seconds.
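As a side note, here is a minimal sketch of how the pseudo-stereo clips could be down-mixed to mono and their durations checked before building the dataset; the data/ directory and the use of soundfile are assumptions, not details from the issue:

```python
from pathlib import Path

import soundfile as sf

# Hypothetical folder holding the .wav clips mentioned above.
for path in Path("data").glob("*.wav"):
    audio, sr = sf.read(str(path))      # shape: (frames,) or (frames, channels)
    duration = len(audio) / sr
    if audio.ndim == 2:                 # pseudo-stereo: average the two channels
        audio = audio.mean(axis=1)
        sf.write(str(path), audio, sr)  # overwrite with the mono version
    print(f"{path.name}: {duration:.2f}s at {sr} Hz")
```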
> The solution is becoming stranger with every MONTH, and at the same time it is not at all obvious, and not even understandable to me personally.
I'm a stupid and expressive person, and for a month I have been getting intermittent ("floating") errors that get resolved in ways unknown to me.
Sorry and thanks
> The solution is becoming stranger with every MONTH, and at the same time it is not at all obvious, and not even understandable to me personally.
Now I have got as far as annotating the dataset. A test dataset of 10 audio recordings passed all the stages, even on the CPU. But now the annotation will not get through even part of what previously passed, and the point at which the crash occurs keeps changing: from 74 audio recordings, to 168, and now it refuses to get past 4.
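One hedged way to pin down the failing example, assuming the crash happens while decoding the audio column with huggingface/datasets; the repo name, split name, and token below are placeholders:

```python
from datasets import load_dataset

ds = load_dataset("Anioji/testra", split="train", token="hf_xxx")
for i in range(len(ds)):
    try:
        _ = ds[i]["audio"]["array"]  # forces decoding of this example
    except Exception as e:
        print(f"row {i} failed to decode: {e}")
```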
If this is a question about the dataset itself, then I would like to know the criteria it has to meet.
The text was extracted with Whisper, and the cast was done with huggingface/datasets.
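For context, a minimal sketch of what that pipeline typically looks like; the file paths, column names, and Whisper model size are assumptions, not details from the actual dataset:

```python
import whisper  # openai-whisper
from datasets import Audio, Dataset

files = ["data/clip_001.wav", "data/clip_002.wav"]  # hypothetical paths

# Transcribe each clip with Whisper.
model = whisper.load_model("small")
texts = [model.transcribe(f)["text"] for f in files]

# Build the dataset and cast the path column to the Audio feature,
# which decodes and resamples the files to 16 kHz on access.
ds = Dataset.from_dict({"audio": files, "text": texts})
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
```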