petiatil closed this issue 9 months ago
I initially thought the issue was resolved (likely caused by a lack of time per speaker, which may be the case for the 12+ minute file). However, when testing the same code with a 30-minute English/Spanish file, using the same inputs (with expectedLanguages as "en" and "es"), it lists each word as "es", whether English or Spanish.
My next test will be to see whether enabling diarization resolves this (it wasn't used in that test).
Update: Testing with diarization didn't resolve the issue (all words' language was labeled "es").
Hi @petiatil - thanks for the detailed bug description. This is the correct/expected behaviour, as we only support a single language per file at the moment. The auto language functionality takes samples to determine what it thinks the predominant language is (in this case, either en or es), and all results are labelled as that language.
We will start to support a bilingual Spanish/English pack soon, however, that will not label results with the specific language either.
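For reference, a minimal sketch of a Batch job config that requests automatic language identification restricted to English and Spanish. The field names used here (transcription_config, language_identification_config, expected_languages) are assumptions based on the options discussed in this thread and should be checked against the current Speechmatics Batch documentation; build_batch_config is a hypothetical helper, not part of any SDK. As noted above, this only selects one language for the whole file.

```python
import json


def build_batch_config(expected_languages=None):
    """Build a Batch job config dict that asks for automatic language identification.

    Assumption: the key names below mirror the Speechmatics Batch job config;
    verify them against the official documentation before relying on them.
    """
    config = {
        "type": "transcription",
        # "auto" asks the service to pick the predominant language of the file.
        "transcription_config": {"language": "auto"},
    }
    if expected_languages:
        # Narrow the candidates that language identification may choose from.
        config["language_identification_config"] = {
            "expected_languages": list(expected_languages)
        }
    return config


# Serialise the config for submission alongside the audio file.
config_json = json.dumps(build_batch_config(["en", "es"]), indent=2)
print(config_json)
```

Restricting the candidates does not make the transcript bilingual; it only constrains which single language model gets chosen.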
@nickgerig I see, thank you.
If applicable:
I'm finalizing Speechmatics integration in an app with Real-time and Batch options. Could you clarify if the auto and expected_languages options enhance transcription quality in mixed-language contexts (such as better Spanish spelling when English is predominant [compared to just manually selecting English]), or are they mainly for determining the dominant language model to use?
The app doesn't hinge crucially on this; I'm just aiming for simplicity in offered options (to currently include what could significantly impact transcription quality).
Finally, unless directed otherwise, I'll assume (based on this article) the difference in quality between Real-time and Batch is still currently negligible or equal. I have max_delay set internally to 20.
> I'm finalizing Speechmatics integration in an app with Real-time and Batch options. Could you clarify if the auto and expected_languages options enhance transcription quality in mixed-language contexts (such as better Spanish spelling when English is predominant [compared to just manually selecting English]), or are they mainly for determining the dominant language model to use?
Yes - it's the latter; they are just about determining the correct language.
> Finally, unless directed otherwise, I'll assume (based on this article) the difference in quality between Real-time and Batch is still currently negligible or equal. I have max_delay set internally to 20.
Correct - there should be no difference between Real-time and Batch in this case.
Current behaviour
Only English is detected when transcribing audio (testing Batch transcription) for a 12+ minute English/Spanish video ("en" is the value for "language" in all words in the transcription results).
Steps to Reproduce
Download audio from the YouTube link or GoogleDrive link
Update the audio_file (path to audio file) and speechmaticsAPIkey variables
Expected Behaviour
The 'language' data points in transcript['results'] corresponding to Spanish words are expected to be "es".
Environment
Mac, Ventura 13.6.2, Python 3.10, standard Python venv.
Other Info
Diarization works for this file, but language detection is still English-only.
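To check which language labels a transcript actually carries, a rough sketch that tallies the 'language' value per word. It assumes each entry in transcript['results'] has a "type" and an "alternatives" list whose first item holds "content" and "language" (my reading of the JSON output; adjust if the real structure differs). count_word_languages and sample_transcript are hypothetical names with made-up data, not output from the files in this report.

```python
from collections import Counter


def count_word_languages(transcript):
    """Tally the 'language' label of every word entry in a transcript dict."""
    counts = Counter()
    for item in transcript.get("results", []):
        if item.get("type") != "word":
            continue  # skip punctuation and other non-word entries
        alternatives = item.get("alternatives") or []
        if alternatives:
            # Use the top alternative; fall back to "unknown" if no label is present.
            counts[alternatives[0].get("language", "unknown")] += 1
    return counts


# Hypothetical sample shaped like the assumed output, for illustration only.
sample_transcript = {
    "results": [
        {"type": "word", "alternatives": [{"content": "hello", "language": "es"}]},
        {"type": "word", "alternatives": [{"content": "hola", "language": "es"}]},
        {"type": "punctuation", "alternatives": [{"content": ".", "language": "es"}]},
    ]
}

print(dict(count_word_languages(sample_transcript)))
```

A single key in the tally, as in this sample, matches the behaviour described in this issue: every word carries the one language chosen for the file.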