Closed: thoughtfuldata closed this issue 2 years ago
Actually, after more research, it seems to happen once in a while with this call:
youtube_transcript_api.YouTubeTranscriptApi.get_transcript('BaxBFnIUTrc')
c:\Users\manue\Documents\Github\data-science-venv\.venv\lib\site-packages\youtube_transcript_api\_api.py in get_transcript(cls, video_id, languages, proxies, cookies)
126 :rtype [{'text': str, 'start': float, 'end': float}]:
127 """
--> 128 return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch()
129
130 @classmethod
c:\Users\manue\Documents\Github\data-science-venv\.venv\lib\site-packages\youtube_transcript_api\_api.py in list_transcripts(cls, video_id, proxies, cookies)
68 http_client.cookies = cls._load_cookies(cookies, video_id)
69 http_client.proxies = proxies if proxies else {}
---> 70 return TranscriptListFetcher(http_client).fetch(video_id)
71
72 @classmethod
c:\Users\manue\Documents\Github\data-science-venv\.venv\lib\site-packages\youtube_transcript_api\_transcripts.py in fetch(self, video_id)
34 self._http_client,
35 video_id,
---> 36 self._extract_captions_json(self._fetch_video_html(video_id), video_id)
37 )
38
c:\Users\manue\Documents\Github\data-science-venv\.venv\lib\site-packages\youtube_transcript_api\_transcripts.py in _extract_captions_json(self, html, video_id)
48 raise TranscriptsDisabled(video_id)
49
---> 50 captions_json = json.loads(
51 splitted_html[1].split(',"videoDetails')[0].replace('\n', '')
52 )['playerCaptionsTracklistRenderer']
~\.pyenv\pyenv-win\versions\3.9.8\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
348 cls = JSONDecoder
~\.pyenv\pyenv-win\versions\3.9.8\lib\json\decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
~\.pyenv\pyenv-win\versions\3.9.8\lib\json\decoder.py in raw_decode(self, s, idx)
351 """
352 try:
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
355 raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Expecting ',' delimiter: line 1 column 1576 (char 1575)
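Since the failure is intermittent, one caller-side workaround is to catch the JSONDecodeError and retry with backoff. A minimal sketch, assuming the call simply succeeds on a later attempt; fetch_with_retry is a hypothetical helper, not part of youtube-transcript-api:

```python
import json
import random
import time

def fetch_with_retry(fetch, attempts=4, base_delay=1.0):
    """Retry `fetch` when YouTube returns a page the library cannot parse.

    `fetch` is any zero-argument callable, e.g.
    lambda: YouTubeTranscriptApi.get_transcript('BaxBFnIUTrc').
    Hypothetical helper -- not part of youtube-transcript-api.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except json.JSONDecodeError:
            if attempt == attempts - 1:
                raise  # out of retries, surface the original error
            # Exponential backoff with jitter to avoid hammering YouTube.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

This does not address the root cause (YouTube blocking bursts of requests), but it keeps an occasional bad response from killing a long batch job.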
Hi @thoughtfuldata, it's hard to analyse much without seeing your code, but given that you're doing multiple requests in parallel this is not surprising at all. One of the recurring problems when using this module is that YouTube tends to block requests when you execute too many at a time, and there isn't really anything we can do about that. So when you parallelise requests, this problem will only become more apparent.
However, I agree that I should add a raise_for_status() call in _fetch_video_html() and raise an exception wrapping the status code. Unfortunately, this won't really fix your problem.
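What that fix might look like, as a sketch of the idea only: the exception name, fields, and message below are assumptions, not the library's actual code.

```python
class YouTubeRequestFailed(Exception):
    """Hypothetical exception wrapping the HTTP status code."""
    def __init__(self, video_id, status_code):
        self.video_id = video_id
        self.status_code = status_code
        super().__init__(
            f'Request for video {video_id} failed with HTTP {status_code}'
        )

def check_response(video_id, status_code):
    """Sketch of the raise_for_status() idea: fail loudly on an error
    status code instead of handing an HTML error page to json.loads,
    which is what produces the confusing JSONDecodeError above."""
    if status_code >= 400:
        raise YouTubeRequestFailed(video_id, status_code)
```

The point of the change is purely diagnostic: the caller sees the status code (e.g. 429 when blocked) rather than a JSON parsing failure deep inside the library.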
An error message for error status codes is being added in #132.
Thanks!
This helps me out
I forgot to mention: the improved error message has been released in version 0.4.2.
This may be out of scope, as I am using youtube-transcript-api with parallel processing and the issue only happens then. However, I believe the bug is in the way youtube-transcript-api handles that error.
I originally believed it to be an issue with the parallel-processing package; however, after speaking with the maintainer of that package, the guess is that it could be:
(By "you", he's referring to me.)
Here's the remote traceback.
Let me know if anything else is needed.