GigaAM: the family of open-source acoustic models for speech processing

GigaAM
GigaAM for Speech Recognition
- GigaAM-CTC
- GigaAM-RNNT
GigaAM-Emo
License
Links

GigaAM

GigaAM (Giga Acoustic Model) is a Conformer-based wav2vec2 foundation model (around 240M parameters). We trained GigaAM on nearly 50 thousand hours of diverse Russian speech audio.

Resources:

GigaAM for Speech Recognition

We fine-tuned the GigaAM encoder for Speech Recognition with two different decoders:

- GigaAM-CTC was fine-tuned with Connectionist Temporal Classification (CTC) loss and a character-based tokenizer.
- GigaAM-RNNT was fine-tuned with RNN Transducer (RNN-T) loss and a subword tokenizer.
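
To make the CTC side of this concrete, here is a minimal sketch of greedy CTC decoding over a character vocabulary. It is not taken from the GigaAM code: the vocabulary, the blank index, and the frame labels are all made up for illustration, and the real model may decode differently (for example with beam search).

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol
# Hypothetical character vocabulary; the real GigaAM-CTC vocabulary may differ.
VOCAB = ["<blank>", " ", "а", "д", "м", "о"]


def ctc_greedy_decode(frame_ids, vocab=VOCAB, blank=BLANK):
    """Collapse runs of identical frame labels, then drop blank symbols."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(vocab[i] for i in collapsed if i != blank)


# Per-frame argmax labels for a short utterance (made-up values).
frames = [3, 3, 0, 5, 5, 0, 4, 0, 0, 2, 2]
print(ctc_greedy_decode(frames))
```

A blank between two identical labels keeps both characters, while adjacent identical labels without a blank collapse into one. This is how CTC represents doubled letters.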

Both models were trained using the NeMo toolkit on publicly available Russian labeled data:

dataset	size, hours	weight
Golos	1227	0.6
SOVA	369	0.2
Russian Common Voice	207	0.1
Russian LibriSpeech	93	0.1
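
The README does not say how the weight column is applied. A minimal sketch, assuming each weight is the probability of drawing the next training example from that dataset, shows how the mix differs from sampling in proportion to hours. The function names are hypothetical and not part of NeMo or GigaAM.

```python
import random

# Hours and weights copied from the table above.
DATASETS = {
    "Golos": (1227, 0.6),
    "SOVA": (369, 0.2),
    "Russian Common Voice": (207, 0.1),
    "Russian LibriSpeech": (93, 0.1),
}


def oversampling_factors(datasets):
    """Ratio of each sampling weight to the dataset's share of total hours."""
    total_hours = sum(hours for hours, _ in datasets.values())
    return {
        name: weight / (hours / total_hours)
        for name, (hours, weight) in datasets.items()
    }


def sample_dataset(datasets, rng):
    """Pick the source dataset for the next example (assumed interpretation)."""
    names = list(datasets)
    weights = [weight for _, weight in datasets.values()]
    return rng.choices(names, weights=weights, k=1)[0]


factors = oversampling_factors(DATASETS)
for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x relative to sampling by hours")
```

Under this assumption, Russian LibriSpeech is drawn about twice as often as its share of hours would suggest, while Golos is drawn slightly less often.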

Resources:

GigaAM-CTC:
GigaAM-RNNT:

The following table compares models by Word Error Rate (WER, %) on open Russian datasets; lower is better:

model	parameters	Golos Crowd	Golos Farfield	OpenSTT Youtube	OpenSTT Phone calls	OpenSTT Audiobooks	Mozilla Common Voice	Russian LibriSpeech
Whisper-large-v3	1.5B	17.4	14.5	21.1	31.2	17.0	5.3	9.0
NVIDIA Ru-FastConformer-RNNT	115M	2.6	6.6	23.8	32.9	16.4	2.7	11.6
GigaAM-CTC	242M	3.1	5.7	18.4	25.6	15.1	1.7	8.1
GigaAM-RNNT	243M	2.3	4.4	16.7	22.9	13.9	0.9	7.4
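
For readers unfamiliar with the metric, here is a minimal sketch of Word Error Rate for a single utterance: the word-level edit distance divided by the number of reference words. The text normalization and corpus-level aggregation behind the numbers above are not described here, so this sketch simply splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Multiplying the result by 100 gives a percentage on the same scale as the table.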

GigaAM-Emo

GigaAM-Emo is an acoustic model for Emotion Recognition. We fine-tuned the GigaAM encoder on the Dusha dataset.

Resources:

The following table summarizes the performance of different models on the Dusha dataset:

model	Crowd Unweighted Accuracy	Crowd Weighted Accuracy	Crowd Macro F1-score	Podcast Unweighted Accuracy	Podcast Weighted Accuracy	Podcast Macro F1-score
DUSHA baseline (MobileNetV2 + Self-Attention)	0.83	0.76	0.77	0.89	0.53	0.54
АБК (TIM-Net)	0.84	0.77	0.78	0.90	0.50	0.55
GigaAM-Emo	0.90	0.87	0.84	0.90	0.76	0.67
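
The three metric families in this table can be sketched in plain Python. The terms "weighted" and "unweighted" accuracy are used in opposite senses by different papers, and this README does not define them, so the sketch uses neutral names: overall accuracy (fraction of correct predictions) and class-averaged recall (mean of per-class recalls). Which one maps to which column is left open. The labels and predictions are toy values.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall_and_f1(y_true, y_pred, labels):
    """Return per-class recall and F1 lists, in the order of `labels`."""
    recalls, f1_scores = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return recalls, f1_scores


def mean(values):
    return sum(values) / len(values)


# Toy imbalanced example: the classifier always predicts the majority class.
labels = ["neutral", "angry"]
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 10

recalls, f1_scores = per_class_recall_and_f1(y_true, y_pred, labels)
print("overall accuracy:", overall_accuracy(y_true, y_pred))
print("class-averaged recall:", mean(recalls))
print("macro F1:", round(mean(f1_scores), 3))
```

On imbalanced data the two accuracy variants can diverge sharply, because a model that favors the majority class keeps a high overall accuracy while its class-averaged recall drops.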
