jdepoix / youtube-transcript-api

This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions!
MIT License

'charmap' codec can't encode character '\u0101' #235

Closed BillionShields closed 6 months ago

BillionShields commented 7 months ago

DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.

To Reproduce

Steps to reproduce the behavior: run the CLI with a list of YouTube video IDs

What code / cli command are you executing?

For example: I am running

youtube_transcript_api "BwyZIWeBpRw" --format text  >>"temp.txt"
youtube_transcript_api "at37Y8rKDlA" --format text  >>"temp.txt"

all those 17O5mgXZ9ZU 4b6bwcWK6GE 6ZrlsVx85ek at37Y8rKDlA BwyZIWeBpRw ErrorCOde FFwA0QFmpQ4 gMRph_BvHB4 H-XfCl-HpRM hcuMLQVAgEg hx3U64IXFOY J7SrAEacyf8 JPX8g8ibKFc LG53Vxum0as LVxL_p_kToc mcPSRWUYCv0 NAATB55oxeQ qJXKhu5UZwk ufsIA5NARIo uuP-1ioh4LY vA50EK70whE x7qbJeRxWGw XfURDjegrAw

Which Python version are you using?

Python 3.12

Which version of youtube-transcript-api are you using?

youtube-transcript-api 2023.7.22

Expected behavior

Describe what you expected to happen.

For example: I expected to receive the english transcript

Actual behaviour

Describe what is happening instead of the Expected behavior. Add error messages if there are any.

empty file and error


ine 88, in _run_code
  File "\Python312\Scripts\youtube_transcript_api.exe\__main__.py", line 7, in <module>
  File "\Python312\Lib\site-packages\youtube_transcript_api\__main__.py", line 11, in main
    print(YouTubeTranscriptCli(sys.argv[1:]).run())
  File "\Python312\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 1891: character maps to <undefined>
        1 file(s) moved.

jdepoix commented 7 months ago

Hi @BillionShields, I tried a few of the IDs in your list. It seems that none of them has a transcript available in en (some of them do in en-US, however). I would assume that your error is linked to the error message returned by youtube-transcript-api in those cases.
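
For anyone hitting the same traceback: its last frame is in `cp1252.py`, which suggests that whatever was being printed contained the character U+0101 (ā) and that stdout was using the cp1252 encoding, as can happen on Windows when output is redirected to a file. The following is only a minimal sketch of that mechanism, not code from this library. It uses an in-memory stream in place of the redirected file, and `write_with_encoding` is a hypothetical helper introduced for illustration.

```python
import io


def write_with_encoding(text, encoding, errors="strict"):
    """Write text through a text stream with the given encoding and
    return the raw bytes that would end up in the redirected file."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "\u0101"  # the character named in the traceback

    # cp1252 has no mapping for U+0101, so strict encoding fails.
    try:
        write_with_encoding(sample, "cp1252")
    except UnicodeEncodeError as exc:
        print("cp1252 failed:", exc.reason)

    # UTF-8 can represent every Unicode character.
    print("utf-8 bytes:", write_with_encoding(sample, "utf-8"))

    # errors="replace" avoids the exception but loses the character.
    print("cp1252 with replace:", write_with_encoding(sample, "cp1252", errors="replace"))
```

If this is indeed the cause, setting the standard Python environment variable `PYTHONIOENCODING=utf-8` (or `PYTHONUTF8=1`) before running the CLI should make the redirected output UTF-8. This was not confirmed in this thread.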

jdepoix commented 6 months ago

Closed due to inactivity