jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other Selenium-based solutions do!
MIT License

Traceback errors and proxy issues #168

Closed dylanschweitzer closed 1 year ago

dylanschweitzer commented 1 year ago

To Reproduce

Steps to reproduce the behavior:

Which Python version are you using?

3.8

Which version of youtube-transcript-api are you using?

0.5

What code / cli command are you executing?

from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript('puTiLuCw8lI')

Expected behavior

Retrieve the transcript

Actual behaviour

I get a bunch of errors:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 700, in urlopen
    self._prepare_proxy(conn)
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 994, in _prepare_proxy
    conn.connect()
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connection.py", line 369, in connect
    self._tunnel()
  File "/usr/lib/python3.8/http/client.py", line 905, in _tunnel
    raise OSError("Tunnel connection failed: %d %s" % (code,
OSError: Tunnel connection failed: 403 Forbidden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py", line 785, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.youtube.com', port=443): Max retries exceeded with url: /watch?v=puTiLuCw8lI (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "youtube.py", line 3, in <module>
    YouTubeTranscriptApi.get_transcript('puTiLuCw8lI')
  File "/home/dylanschweitzer/.local/lib/python3.8/site-packages/youtube_transcript_api/_api.py", line 132, in get_transcript
    return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch()
  File "/home/dylanschweitzer/.local/lib/python3.8/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
    return TranscriptListFetcher(http_client).fetch(video_id)
  File "/home/dylanschweitzer/.local/lib/python3.8/site-packages/youtube_transcript_api/_transcripts.py", line 47, in fetch
    self._extract_captions_json(self._fetch_video_html(video_id), video_id)
  File "/home/dylanschweitzer/.local/lib/python3.8/site-packages/youtube_transcript_api/_transcripts.py", line 79, in _fetch_video_html
    html = self._fetch_html(video_id)
  File "/home/dylanschweitzer/.local/lib/python3.8/site-packages/youtube_transcript_api/_transcripts.py", line 88, in _fetch_html
    response = self._http_client.get(WATCH_URL.format(video_id=video_id))
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/requests/adapters.py", line 559, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: HTTPSConnectionPool(host='www.youtube.com', port=443): Max retries exceeded with url: /watch?v=puTiLuCw8lI (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))

jdepoix commented 1 year ago

Hi @dylanschweitzer, I cannot reproduce this for the video uCbpDW0p0Gs. Does this happen only for this video, or for all videos you are trying to get a transcript for? Did this happen after bombarding YouTube with a lot of requests? (You might get temporarily blocked when making too many requests...)

dylanschweitzer commented 1 year ago

You're right, I tried it on another IP and it worked. Looks like it wasn't working from the host I used.
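
For anyone hitting the same ProxyError: the traceback goes through urllib3's _prepare_proxy even though the snippet passes no proxies, which suggests (an assumption, the thread does not confirm it) that the host had a proxy configured through environment variables, and that this proxy refused the tunnel with 403. A minimal sketch for checking that on a host, using only the standard library; the helper name find_proxy_settings and the example proxy URL are made up for illustration and are not part of youtube-transcript-api:

```python
import os

# Environment variable names commonly consulted for proxy settings
# (matched case-insensitively, since both HTTP_PROXY and http_proxy are in use).
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def find_proxy_settings(environ=None):
    """Return the non-empty proxy-related variables found in the environment.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to try out without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in env.items()
        if name.lower() in PROXY_VARS and value
    }


if __name__ == "__main__":
    settings = find_proxy_settings()
    if settings:
        # Note: proxy URLs may embed credentials, so be careful where you paste this.
        for name, value in sorted(settings.items()):
            print(f"{name}={value}")
    else:
        print("No proxy variables set in the environment.")
```

If this prints a proxy you did not expect, unsetting the variable (or running from another host, as was done above) is a quick way to tell a proxy problem apart from a library problem.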