jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike Selenium-based solutions!
MIT License

JSONDecodeError for very specific video id #144

Closed xenova closed 2 years ago

xenova commented 2 years ago

For some reason, I am unable to extract transcripts for https://www.youtube.com/watch?v=BaxBFnIUTrc

from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript('BaxBFnIUTrc')

This outputs:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-2-798f899ea179> in <module>()
      1 from youtube_transcript_api import YouTubeTranscriptApi
      2 
----> 3 YouTubeTranscriptApi.get_transcript('BaxBFnIUTrc')

6 frames
/usr/lib/python3.7/json/decoder.py in raw_decode(self, s, idx)
    351         """
    352         try:
--> 353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
    355             raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting ',' delimiter: line 1 column 1576 (char 1575)

I am running the latest version (youtube-transcript-api-0.4.3), and it fails on both Windows and Linux.
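
The position in the message (char 1575) points into the text that was handed to the JSON parser. As a minimal sketch of how such an error can be inspected, the snippet below triggers the same kind of message on a made-up malformed string and extracts the text around the failing position. The `error_context` helper and the sample input are illustrative assumptions, not part of youtube-transcript-api; `msg`, `doc`, and `pos` are standard attributes of `json.JSONDecodeError`.

```python
import json


def error_context(err: json.JSONDecodeError, width: int = 20) -> str:
    """Return the slice of the parsed document surrounding the failing position."""
    start = max(err.pos - width, 0)
    return err.doc[start:err.pos + width]


# Made-up input: the comma between the two members is missing.
malformed = '{"a": "x" "b": 1}'

try:
    json.loads(malformed)
except json.JSONDecodeError as err:
    message = err.msg              # the message without the position suffix
    snippet = error_context(err)   # text around the character the parser rejected
    print(message, "near", repr(snippet))
```

Applied to the real failure, the same helper would show what sits around character 1575 of the data being parsed.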

xenova commented 2 years ago

Another (somewhat similar) error occurs for the video X2agMBdnv3c:

from youtube_transcript_api import YouTubeTranscriptApi

YouTubeTranscriptApi.get_transcript('X2agMBdnv3c')

which fails with the following error:

  File "<ipython-input-10-011ad7250ddf>", line 4, in <module>
    YouTubeTranscriptApi.get_transcript('X2agMBdnv3c')
  File "/usr/local/lib/python3.7/dist-packages/youtube_transcript_api/_api.py", line 128, in get_transcript
    return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch()
  File "/usr/local/lib/python3.7/dist-packages/youtube_transcript_api/_transcripts.py", line 292, in fetch
    _raise_http_errors(response, self.video_id).text,
  File "/usr/local/lib/python3.7/dist-packages/youtube_transcript_api/_transcripts.py", line 334, in parse
    for xml_element in ElementTree.fromstring(plain_data)
  File "/usr/lib/python3.7/xml/etree/ElementTree.py", line 1316, in XML
    return parser.close()
  File "<string>", line None
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
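
`ElementTree.fromstring` raises this exact message when it is given an empty string, which would be consistent with an empty response body, although the thread does not confirm the cause. For anyone stuck on an affected version, a defensive wrapper is one possible workaround. The sketch below is hypothetical: `safe_fetch` and `fake_empty_fetch` are made-up names, and the fake fetcher only stands in for the library call so the example runs without network access.

```python
import json
from typing import Callable, List, Optional
from xml.etree import ElementTree


def safe_fetch(fetch: Callable[[str], List[dict]], video_id: str) -> Optional[List[dict]]:
    """Call fetch(video_id); return None if the response could not be parsed."""
    try:
        return fetch(video_id)
    except (json.JSONDecodeError, ElementTree.ParseError) as err:
        print(f"Could not parse transcript data for {video_id}: {err}")
        return None


def fake_empty_fetch(video_id: str) -> List[dict]:
    # Hypothetical stand-in: parses an empty body, as the fromstring call
    # in the traceback would if the response text were empty.
    ElementTree.fromstring("")
    return []


result = safe_fetch(fake_empty_fetch, "X2agMBdnv3c")

# Intended use with the real library (needs the package and network access):
# from youtube_transcript_api import YouTubeTranscriptApi
# transcript = safe_fetch(YouTubeTranscriptApi.get_transcript, "X2agMBdnv3c")
```

Returning `None` keeps a batch job running past a single unparsable video; callers that need to tell failure modes apart may prefer to re-raise or log the exception type instead.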

jdepoix commented 2 years ago

Hi @xenova, thank you for reporting this, and sorry for the late reply. I've been really busy lately 🙈 I just opened a PR to fix this issue (#149) and will release it as v0.4.4!

xenova commented 2 years ago

Thanks! 😄 No worries.

jdepoix commented 2 years ago

The fix has been released as of v0.4.4! 👍