pankaj2701 opened this issue 6 years ago
Sorry for the lack of a detailed description of the training dataset format.
But it is very simple. You can find the data format by inspecting the data in /data/raw/train or /data/raw/valid.
The speech data should be .wav files sampled at 16 kHz, and each label must be a .mat file containing a 1-dimensional array whose values are just 1 (speech) or 0 (non-speech). To understand it directly, please open the sample training data in /data/raw/train.
Thx!
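A minimal sketch of writing one label file for a 16 kHz .wav, assuming SciPy is installed. The thread doesn't say which variable name the training code reads from the .mat file, so `label` below is a hypothetical key. Check the sample files in /data/raw/train for the real name. Note that `scipy.io.savemat` stores a 1-D array as a 1xN row vector by default, so it loads back with shape `(1, N)`.

```python
import numpy as np
from scipy.io import wavfile, savemat


def write_label_mat(wav_path, mat_path, speech_mask, key="label"):
    """Save a per-sample 0/1 label array for a 16 kHz mono wav file.

    speech_mask: 1-D array, one entry per audio sample, nonzero where speech is.
    key: HYPOTHETICAL variable name; match the name used in the sample .mat files.
    """
    sr, audio = wavfile.read(wav_path)
    if sr != 16000:
        raise ValueError(f"expected 16 kHz audio, got {sr} Hz")
    if audio.ndim != 1:
        raise ValueError("expected mono audio")
    # Force values to exactly 0.0 (non-speech) or 1.0 (speech).
    label = (np.asarray(speech_mask).ravel() != 0).astype(np.float64)
    # Sample-based labels: one label value per audio sample.
    if label.shape[0] != audio.shape[0]:
        raise ValueError(
            f"label has {label.shape[0]} values, audio has {audio.shape[0]} samples"
        )
    savemat(mat_path, {key: label})
    return label
```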
Thanks for the quick reply. I still have one doubt: when marking the labels, do we have to count overlapping frames or non-overlapping frames?
You don't have to do any framing on the labels. The required label is simply a sample-based label.
For example, if the speech signal has 10,000 samples, the label should also have 10,000 samples.
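If your annotations already exist at frame level, they can be expanded into the sample-based form described above. This is a minimal sketch that assumes non-overlapping frames of a fixed, hypothetical length `frame_len` in samples (for example 160 samples = 10 ms at 16 kHz). Overlapping frames would first require a rule for resolving disagreements between frames, and this sketch does not handle that case.

```python
import numpy as np


def frames_to_sample_labels(frame_labels, frame_len, num_samples):
    """Expand non-overlapping frame labels (0/1) into one label per sample.

    frame_labels: 1-D array with one 0/1 value per frame.
    frame_len: samples per frame (hypothetical, e.g. 160 = 10 ms at 16 kHz).
    num_samples: length of the matching wav file; the output has this length.
    """
    # Repeat each frame's label once for every sample in that frame.
    per_sample = np.repeat(np.asarray(frame_labels, dtype=np.float64), frame_len)
    if per_sample.shape[0] < num_samples:
        # Trailing samples not covered by any frame are marked non-speech (0).
        per_sample = np.pad(per_sample, (0, num_samples - per_sample.shape[0]))
    # Drop any excess if the frames extend past the end of the audio.
    return per_sample[:num_samples]
```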
Please download our sample wav and label files and verify this.
Thx!
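The requirements above (16 kHz audio, a 1-D label, one label per sample, values only 0 or 1) can be checked against the sample files or your own pairs with a short script. This is a sketch assuming SciPy. Because the variable name inside the .mat files isn't given in this thread, the script takes the single non-metadata variable that `scipy.io.loadmat` returns.

```python
import numpy as np
from scipy.io import wavfile, loadmat


def check_pair(wav_path, mat_path):
    """Return a list of problems with a wav/label pair (empty list if OK)."""
    problems = []
    sr, audio = wavfile.read(wav_path)
    if sr != 16000:
        problems.append(f"sampling rate is {sr} Hz, expected 16000")
    # loadmat adds '__header__', '__version__' and '__globals__'; skip them.
    variables = {k: v for k, v in loadmat(mat_path).items() if not k.startswith("__")}
    if len(variables) != 1:
        problems.append(f"expected one variable in .mat file, found {sorted(variables)}")
        return problems
    label = next(iter(variables.values()))
    # MATLAB stores vectors as 1xN or Nx1; anything with two dims > 1 is not 1-D.
    if sum(d > 1 for d in label.shape) > 1:
        problems.append(f"label has shape {label.shape}, expected a vector")
    label = label.ravel()
    if label.shape[0] != audio.shape[0]:
        problems.append(
            f"label has {label.shape[0]} values but audio has {audio.shape[0]} samples"
        )
    if not np.isin(label, [0, 1]).all():
        problems.append("label contains values other than 0 and 1")
    return problems
```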
The reason I am asking is that I want to train it on my own data, so I need to know how to prepare the training data.
I saw the sample files, but it is not very clear how the samples have been labeled. Some samples have a value of 0 and some have a value of 1. I guess a value of 1 means the corresponding sample is speech, but I have not been able to visually correlate the sample indices with the waveform.
Your guess is correct: 1 corresponds to speech and 0 corresponds to non-speech. The plot looks like this:
Note that if the speech data is noisy, it is hard to tell speech from non-speech visually in the 1-D signal domain.
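To correlate the labels with a waveform view, it can help to list the speech regions as sample ranges and times instead of scanning the raw 0/1 array. This is a minimal sketch that finds runs of 1s. It assumes the label has already been loaded as a 1-D array (for example with `scipy.io.loadmat(path)[key].ravel()`, where `key` is whatever variable name the file uses) and that the sampling rate is 16 kHz.

```python
import numpy as np


def label_to_segments(label, sr=16000):
    """Return (start_sample, end_sample, start_s, end_s) for each run of 1s.

    end_sample is exclusive, so label[start_sample:end_sample] is all speech.
    """
    lab = (np.asarray(label).ravel() != 0).astype(np.int8)
    # Pad with 0 on both ends so every speech run has a rising and falling edge.
    edges = np.diff(np.concatenate(([0], lab, [0])))
    starts = np.flatnonzero(edges == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(edges == -1)    # 1 -> 0 transitions
    return [(int(s), int(e), int(s) / sr, int(e) / sr) for s, e in zip(starts, ends)]
```

The start and end times can then be compared directly against the time axis of any waveform viewer.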
Hello, I have the same problem as you. (1) Do you now know how to format the training data? (2) I don't understand how the label in a .mat file corresponds to the .wav file.
I have not been able to understand how the training data should be specified, for example how the labels should be written. Do we need to specify the times at which the labels occur in the sound file? If so, how and where?
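Going by the maintainer's answer earlier in the thread, no times are stored anywhere: the .mat file holds only the per-sample 0/1 array. If your annotations are speech intervals in seconds (for example, exported from an annotation tool), a minimal sketch for converting them into that array could look like the following. The `(start_s, end_s)` interval format is an assumption, not something the repository defines.

```python
import numpy as np


def segments_to_sample_labels(segments, num_samples, sr=16000):
    """Build a per-sample 0/1 label array from speech intervals in seconds.

    segments: iterable of (start_s, end_s) pairs marking speech (assumed format).
    num_samples: number of samples in the matching wav file.
    """
    label = np.zeros(num_samples, dtype=np.float64)  # default: non-speech
    for start_s, end_s in segments:
        # Convert seconds to sample indices and clip to the file length.
        start = max(0, int(round(start_s * sr)))
        end = min(num_samples, int(round(end_s * sr)))
        label[start:end] = 1.0  # end index is exclusive
    return label
```

For a 10,000-sample file (0.625 s at 16 kHz), `segments_to_sample_labels([(0.1, 0.3)], 10000)` marks samples 1600 through 4799 as speech and leaves everything else at 0.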