jdepoix / youtube-transcript-api

This is a Python API that allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it does not require an API key or a headless browser, as other Selenium-based solutions do!
MIT License
2.79k stars, 312 forks

xml.etree.ElementTree.ParseError: no element found: line 1, column 0 #320

Open: KarenPHS opened this issue 3 weeks ago

KarenPHS commented 3 weeks ago

DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.

To Reproduce

Steps to reproduce the behavior:

What code / CLI command are you executing?

I am running:

from youtube_transcript_api import YouTubeTranscriptApi

source = YouTubeTranscriptApi.list_transcripts("SeXZt5hqe6I")
en_caption = source.find_transcript(['en']).fetch()

Which Python version are you using?

Python 3.6.4

Which version of youtube-transcript-api are you using?

youtube-transcript-api 0.6.2

Expected behavior

Describe what you expected to happen.

I expected to receive the English transcript.

Actual behavior

Describe what is happening instead of the Expected behavior. Add error messages if there are any.

Instead, I received the following error message:

  File "E:\Python Project\yt-concate-test\yt-concate\venv\lib\site-packages\youtube_transcript_api\_transcripts.py", line 293, in fetch
    _raise_http_errors(response, self.video_id).text,
  File "E:\Python Project\yt-concate-test\yt-concate\venv\lib\site-packages\youtube_transcript_api\_transcripts.py", line 358, in parse
    for xml_element in ElementTree.fromstring(plain_data)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\xml\etree\ElementTree.py", line 1315, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
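The last frames of this traceback hint at a likely cause: `ElementTree.fromstring` raises exactly `no element found: line 1, column 0` when it is handed an empty string, so the transcript request probably came back with an empty body. This is an inference from the error text, not a diagnosis confirmed in this thread. A minimal standard-library check, where `parse_error_for` is a hypothetical helper:

```python
from xml.etree import ElementTree


def parse_error_for(data: str) -> str:
    """Return the ParseError message for `data`, or "parsed OK" if it parses."""
    try:
        ElementTree.fromstring(data)
    except ElementTree.ParseError as err:
        return str(err)
    return "parsed OK"


# An empty response body reproduces the message from the traceback.
print(parse_error_for(""))  # no element found: line 1, column 0
```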

jdepoix commented 3 weeks ago

Hi @KarenPHS, I cannot replicate that error. Does that happen for every video or only SeXZt5hqe6I?

KarenPHS commented 2 weeks ago

No, I tried it again. But it happened at least once while I was downloading captions from a batch of videos with this script:

import json
import urllib.request
from xml.etree.ElementTree import ParseError

from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api._errors import NoTranscriptFound, TranscriptsDisabled

base_video_url = 'https://www.youtube.com/watch?v='
base_search_url = 'https://www.googleapis.com/youtube/v3/search?'

API_KEY = ''  # YouTube Data API v3 key
channel_id = 'UCKSVUHI9rbbkXhvAXK-2uxA'
first_url = base_search_url + 'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(API_KEY, channel_id)

MAX_ATTEMPTS = 5  # cap on retries after a ParseError (arbitrary value)

video_links = []
url = first_url

# download all video links from a channel, following nextPageToken across pages
while True:
    with urllib.request.urlopen(url) as inp:
        resp = json.load(inp)

    for item in resp['items']:
        if item['id']['kind'] == 'youtube#video':
            video_links.append(base_video_url + item['id']['videoId'])

    next_page_token = resp.get('nextPageToken')
    if next_page_token is None:
        break  # last page reached
    url = first_url + '&pageToken={}'.format(next_page_token)

# download the English captions of all videos, retrying (up to a limit) on ParseError
captions = {}
for link in video_links:
    video_id = link.split('watch?v=')[-1]
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            source = YouTubeTranscriptApi.list_transcripts(video_id)
            captions[video_id] = source.find_transcript(['en']).fetch()
            break
        except (KeyError, NoTranscriptFound, TranscriptsDisabled):
            print('No captions for', video_id)
            break
        except ParseError:
            print('ParseError for', video_id, '(attempt {}/{})'.format(attempt, MAX_ATTEMPTS))
    else:
        # runs only if every attempt raised ParseError
        print('Giving up on', video_id)

jdepoix commented 2 weeks ago

So it doesn't happen consistently for SeXZt5hqe6I, but just randomly happened once?

Araule commented 1 week ago

Hello, I have the same problem. When fetching transcripts for around 200 videos, I get this error 3-5 times on every run, and never for the same IDs.

Traceback (most recent call last):
  File "/home/araule/Documents/corpus/scripts/get_videos.py", line 375, in get_transcripts
    res = YouTubeTranscriptApi.get_transcript(video_id, languages=['fr'])
  File "/home/araule/miniconda3/envs/youtube/lib/python3.10/site-packages/youtube_transcript_api/_api.py", line 137, in get_transcript
    return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch(preserve_formatting=preserve_formatting)
  File "/home/araule/miniconda3/envs/youtube/lib/python3.10/site-packages/youtube_transcript_api/_transcripts.py", line 292, in fetch
    return _TranscriptParser(preserve_formatting=preserve_formatting).parse(
  File "/home/araule/miniconda3/envs/youtube/lib/python3.10/site-packages/youtube_transcript_api/_transcripts.py", line 358, in parse
    for xml_element in ElementTree.fromstring(plain_data)
  File "/home/araule/miniconda3/envs/youtube/lib/python3.10/xml/etree/ElementTree.py", line 1348, in XML
    return parser.close()
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

I use Python 3.10.14 and youtube-transcript-api 0.6.2 (installed with pip).
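Because the failures hit different IDs on each run, one workaround (a sketch, not a fix inside the library) is to collect the IDs that raised ParseError and fetch them again in a later pass. It assumes the error is transient. `fetch_batch` and its `passes` parameter are hypothetical names; `fetch` is any callable that takes a video ID and returns its transcript.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar
from xml.etree.ElementTree import ParseError

T = TypeVar("T")


def fetch_batch(
    video_ids: Iterable[str], fetch: Callable[[str], T], passes: int = 2
) -> Tuple[Dict[str, T], List[str]]:
    """Fetch every ID; IDs that raise ParseError are retried in later passes."""
    results: Dict[str, T] = {}
    pending = list(video_ids)
    for _ in range(passes):
        failed: List[str] = []
        for video_id in pending:
            try:
                results[video_id] = fetch(video_id)
            except ParseError:
                failed.append(video_id)  # assumed transient: retry next pass
        pending = failed
        if not pending:
            break
    # pending now holds the IDs that failed on every pass
    return results, pending
```

With the library from this thread, `fetch` could be `lambda vid: YouTubeTranscriptApi.get_transcript(vid, languages=['fr'])`, matching the call in the traceback above. Other exceptions (for example NoTranscriptFound) are deliberately not caught, so they still surface immediately.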

KarenPHS commented 1 week ago

> So it doesn't happen consistently for SeXZt5hqe6I, but just randomly happened once?

Yes, it happens randomly, and it has happened more than once.

dgarridoa commented 6 days ago

I got the same issue with Python 3.11.3 and youtube-transcript-api 0.6.2. Note that it also happens randomly: when I retried, it ended up working.

ERROR:root:no element found: line 1, column 0, 3q67v12M31M
ERROR:root:no element found: line 1, column 0, McRUxBHgFIo
ERROR:root:no element found: line 1, column 0, mokGJiXVw_4
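Since a plain retry succeeded here, a per-call alternative is to wrap each request in a small retry loop with a growing pause. This is a sketch under the assumption that the empty response is transient; `retry_on_parse_error`, `attempts`, and `delay` are hypothetical names, and the backoff values are arbitrary.

```python
import time
from typing import Callable, TypeVar
from xml.etree.ElementTree import ParseError

T = TypeVar("T")


def retry_on_parse_error(fetch: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call fetch(), retrying when it raises ParseError (assumed transient)."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ParseError:
            if attempt == attempts:
                raise  # out of attempts: re-raise the last ParseError
            time.sleep(delay * attempt)  # linear backoff before the next try
    raise ValueError("attempts must be >= 1")
```

For example: `retry_on_parse_error(lambda: YouTubeTranscriptApi.get_transcript('3q67v12M31M'))`. Re-raising after the last attempt keeps a persistent failure visible instead of silently returning nothing.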