@akreal As you requested, here are some results of a more or less well-trained model (CSV + feather formats): share_results_1.zip
You can open the feather file like this:

```python
import pandas as pd  # reading feather files also requires pyarrow to be installed

df = pd.read_feather('../data/share_results_1.feather')
```
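As a quick sanity check you could aggregate the per-utterance metrics by dataset. A minimal sketch only: the `dataset` and `cer` column names here are assumptions, adjust them to the actual schema of the shared file.

```python
# 'dataset' and 'cer' are hypothetical column names -- check df.columns first.
per_dataset = (
    df.groupby('dataset')['cer']
      .agg(['mean', 'median', 'count'])
      .sort_values('mean')
)
print(per_dataset)
```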
Overall, this model is not overfitted and there is no post-processing yet.
Perfect, thank you!
As you can see, the model is not fully fitted yet (we are still in the exploratory phase), but it already works perfectly on some of the easier datasets.
Obviously, I excluded the following datasets from the file.
Now if we exclude "bad" files from here, we will get more interesting results. I cannot say that all of these files have poor annotation, but the majority do.
We have almost finished collecting v05 and searching for hyper-parameters; new benchmarks and new data will be posted soon.
@snakers4 What model did you use for the benchmark?
@m1ckyro5a A wav2letter-inspired fork of a fork of deepspeech.pytorch.
@snakers4 How about DeepSpeech2? Which model is better?
It is hard to tell yet. For us, performance is currently limited more by the data than by the model. Of course, we compared some models side by side (CNN vs. RNN), only to find that RNNs were a bit better given the same number of weight updates, but slower overall.
Some benchmarks we ran on LibriSpeech: network_bench.xlsx
From now on I will structure the benchmark files a bit.
Please note that the exclusion files in #7 were previously based on these benchmarks as well.
All charts show CER (character error rate). Chart captions (the images themselves are not reproduced here):
- CNN trained with CTC loss, tuning with phonemes
- TED talks are much cleaner
- Notice the second normal bump
- Pranks are very noisy by default
- Quite a good fit as well
An idea on how to set thresholds:

```python
CLEAN_THRESHOLDS = {
    # very strict conditions, datasets are clean, no problem
    'tts_russian_addresses_rhvoice_4voices': 0.2,
    'private_buriy_audiobooks_2': 0.1,
    # strict conditions, datasets vary
    'public_youtube700': 0.2,
    'public_youtube1120': 0.2,
    'public_youtube1120_hq': 0.2,
    'public_lecture_1': 0.2,
    'public_series_1': 0.2,
    # strict conditions, dataset mostly clean
    'radio_2': 0.2,
    # very strict conditions, datasets are dirty
    'asr_public_phone_calls_1': 0.2,
    'asr_public_phone_calls_2': 0.2,
    'asr_public_stories_1': 0.2,
    'asr_public_stories_2': 0.2,
    # mostly just to filter outliers
    'ru_tts': 0.4,
    'ru_ru': 0.4,
    'voxforge_ru': 0.4,
    'russian_single': 0.4,
}
```
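For illustration, here is a minimal sketch of how such thresholds could be applied to filter the benchmark frame. This is not the project's actual cleaning code; the `dataset` and `cer` column names are assumptions about the schema.

```python
# A sketch under assumed column names: 'dataset' matches the keys
# in CLEAN_THRESHOLDS and 'cer' is a per-utterance CER in [0, 1].
def is_clean(row):
    threshold = CLEAN_THRESHOLDS.get(row['dataset'])
    # Datasets without a configured threshold are kept as-is.
    return threshold is None or row['cer'] <= threshold

clean_df = df[df.apply(is_clean, axis=1)]
print(f'Kept {len(clean_df)} of {len(df)} utterances')
```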
Also a comment: the model was not over-fitted; it was selected for optimal generalization.
https://ru-open-stt.ams3.digitaloceanspaces.com/benchmark_v05_public.csv.zip is in fact a gzip-compressed file (not a zip-compressed one), so one should decompress it with `zcat benchmark_v05_public.csv.zip > benchmark_v05_public.csv`
Unzipping fails with:

```
$ unzip benchmark_v05_public.csv.zip
Archive:  benchmark_v05_public.csv.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of benchmark_v05_public.csv.zip or
        benchmark_v05_public.csv.zip.zip, and cannot find benchmark_v05_public.csv.zip.ZIP, period.
```
After gzip decompression, the first line contains some weird stuff:

```
$ head -n 1 benchmark_v05_public.csv
data/dataset_cleaning/benchmark_v05_public.csv0000644000175000001441656463430613513563560021050 0ustar kerasusers
```
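That first line looks like a tar header (`ustar` is the tar magic string), which would suggest the file is actually a gzip-compressed tar archive rather than a bare gzipped CSV. If so, a sketch like the following should extract a clean CSV; this is a guess based on the header, not a documented layout.

```python
import tarfile

# Assumes the '.zip' is really a gzipped tar archive containing the CSV.
with tarfile.open('benchmark_v05_public.csv.zip', mode='r:gz') as tf:
    # Per the header above, the member path would be
    # data/dataset_cleaning/benchmark_v05_public.csv
    tf.extractall(path='.')
```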
Hi! Which datasets have speaker labels? Is there any information on which release will include speaker labels? Thanks a lot!
We decided not to update and/or maintain these, for a number of reasons.
Below I will post some of the results on the public part of the dataset, both train and validation.
Hope this will inspire the community to share their results and models.