This repository contains a Python script that lets users download the audio from a YouTube video, transcribe it to text, detect the language, and save the transcription to a .txt file automatically.
The YouTube video is about 98% Hindi, with some English words.
I am getting this error while trying to transcribe it:
Detected language: ur
Traceback (most recent call last):
  File "C:\Users\geeky\OneDrive\Desktop\YT Transcribe\youtube_audio_to_text.py", line 55, in <module>
    create_and_open_txt(transcribedtext, f"output{language}.txt")
  File "C:\Users\geeky\OneDrive\Desktop\YT Transcribe\youtube_audio_to_text.py", line 24, in create_and_open_txt
    file.write(text)
  File "C:\Users\geeky\anaconda3\envs\yt\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>
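The traceback fails in file.write, inside cp1252.py, so the file appears to be opened with the Windows default codec, which has no mapping for Devanagari or Arabic characters. Below is a minimal sketch of a possible workaround, assuming the script opens the output file without an explicit encoding. The name write_transcript_utf8 is hypothetical and is not the script's own function.

```python
from pathlib import Path


def write_transcript_utf8(text: str, filename: str) -> Path:
    """Write the transcription to `filename` as UTF-8 and return its path.

    Hypothetical helper for illustration; the real script uses
    create_and_open_txt, whose body is not shown in this report.
    """
    path = Path(filename)
    # An explicit encoding overrides the platform default (cp1252 in the
    # traceback), so non-Latin scripts can be written without errors.
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)
    return path
```

If this assumption holds, passing encoding="utf-8" to the existing open() call in create_and_open_txt should have the same effect.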
Is there any way to specify the language manually? Hindi and Urdu are very similar when spoken, but they are written in different scripts: Devanagari and Arabic, respectively.
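For reference, here is a sketch of what forcing the language might look like. It assumes the script transcribes with the openai-whisper package, which this report does not confirm. That library's model.transcribe accepts a language keyword that skips automatic detection. The wrapper name transcribe_with_language is hypothetical.

```python
def transcribe_with_language(model, audio_path: str, language_code: str = "hi") -> str:
    """Transcribe `audio_path` with a fixed language instead of auto-detection.

    Assumption: `model` behaves like an openai-whisper model, i.e. it has a
    transcribe(path, language=...) method returning a dict with a "text" key.
    """
    # Passing language= avoids a wrong guess (for example "ur" for Hindi speech).
    result = model.transcribe(audio_path, language=language_code)
    return result["text"]
```

With openai-whisper installed, this would be called with a model from whisper.load_model (for example the "small" model), the path to the downloaded audio, and "hi" for Hindi.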