javedali99 / audio-to-text-transcription

This repository contains a Python script that lets users download the audio from a YouTube video, transcribe it into text, detect the language, and automatically save the transcription to a .txt file.
https://www.javedali.net/post/2023-04-audio-to-text/
MIT License

Detecting wrong language and Unicode error #3

Open Rudra644 opened 9 months ago

Rudra644 commented 9 months ago

The YouTube video is about 98% Hindi, with some English words. I am getting this error when I try to transcribe it.

error

Detected language: ur
Traceback (most recent call last):
  File "C:\Users\geeky\OneDrive\Desktop\YT Transcribe\youtube_audio_to_text.py", line 55, in <module>
    create_and_open_txt(transcribedtext, f"output{language}.txt")
  File "C:\Users\geeky\OneDrive\Desktop\YT Transcribe\youtube_audio_to_text.py", line 24, in create_and_open_txt
    file.write(text)
  File "C:\Users\geeky\anaconda3\envs\yt\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>

Rudra644 commented 9 months ago

Is there any way to specify the language manually? Hindi and Urdu are very similar when spoken, but they are written in different scripts: Devanagari and Arabic, respectively.
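
A possible workaround for both problems, offered as a rough sketch and not as the script's actual code: the traceback shows file.write() going through the Windows default cp1252 codec, which cannot encode Devanagari or Arabic characters, so opening the output file with an explicit UTF-8 encoding should avoid the UnicodeEncodeError. For the language, the sketch assumes the script transcribes with an openai-whisper style model, whose transcribe() accepts a language keyword such as "hi". The function names save_transcript and transcribe_in_language are hypothetical and do not come from the repository.

```python
import os
import tempfile


def save_transcript(text, path):
    """Write the transcript as UTF-8 regardless of the OS default codec."""
    # Without encoding="utf-8", Windows may fall back to cp1252, as in the
    # traceback above, and fail on non-Latin scripts.
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)


def transcribe_in_language(model, audio_path, language_code):
    """Transcribe with a fixed language instead of relying on auto-detection."""
    # Assumption: `model` behaves like an openai-whisper model, i.e. its
    # transcribe() takes a `language` keyword and returns a dict with "text".
    result = model.transcribe(audio_path, language=language_code)
    return result["text"]


if __name__ == "__main__":
    # Round-trip a mixed Hindi/English string through a temporary file.
    sample = "नमस्ते, this is a test"
    out_path = os.path.join(tempfile.gettempdir(), "output_hi.txt")
    save_transcript(sample, out_path)
    with open(out_path, encoding="utf-8") as file:
        print(file.read() == sample)
```

If the script does use Whisper, passing "hi" here would force Hindi decoding (Devanagari output) instead of the auto-detected "ur"; whether the repository exposes a place to pass that value has not been checked.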