jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, as other Selenium-based solutions do!
MIT License
2.55k stars, 280 forks

YouTube Transcript API no longer works #184

Closed: chand1012 closed this issue 1 year ago

chand1012 commented 1 year ago

# To Reproduce

Steps to reproduce the behavior:

1. Attempt to download any video transcript.

Which Python version are you using?

3.10.10 on an M1 Mac

Which version of youtube-transcript-api are you using?

0.5.0

What code / CLI command are you executing?

```
youtube_transcript_api https://www.youtube.com/watch\?v\=-f906Sy79hA
```

and

    
```python
from youtube_transcript_api import YouTubeTranscriptApi

data = {'url': 'https://www.youtube.com/watch?v=-f906Sy79hA'}
# Everything after the last "=" is taken as the video ID
resp = YouTubeTranscriptApi.get_transcript(data['url'].split("=")[-1])
```

# Expected behavior
A list of dictionaries returned by the API.
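To make "a list of dictionaries" concrete, the sketch below checks that a response has that shape. It assumes each entry carries `text`, `start`, and `duration` keys, which is the shape `get_transcript` returns in the 0.5.x releases as far as I know; `looks_like_transcript` is a hypothetical helper, not part of the library.

```python
from typing import Any

# Keys each transcript entry is assumed to carry (assumption based on 0.5.x output).
EXPECTED_KEYS = {"text", "start", "duration"}


def looks_like_transcript(resp: Any) -> bool:
    """Return True if resp is a list whose entries are dicts with the expected keys.

    Note: an empty list also passes, since there are no entries to reject.
    """
    return isinstance(resp, list) and all(
        isinstance(entry, dict) and EXPECTED_KEYS.issubset(entry) for entry in resp
    )
```

For example, `looks_like_transcript([{"text": "hi", "start": 0.0, "duration": 1.0}])` is `True`, while an error string is not.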
# Actual behaviour

```
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=https://www.youtube.com/watch?v=-f906Sy79hA! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
```



# Additional information
youtube-dl seems to have [broken as well](https://github.com/ytdl-org/youtube-dl/issues/31535), and the issue seems to [vary by region](https://github.com/ytdl-org/youtube-dl/issues/31530#issuecomment-1433869627). I will post an update if the issue goes away.
chand1012 commented 1 year ago

This line seems to be the problem.

https://github.com/jdepoix/youtube-transcript-api/blob/6070e6165ae5a53e92085ad5f967b20ea2cde59d/youtube_transcript_api/_transcripts.py#L51

I saved the raw HTML for this video to a file and manually searched it for the string `"captions":`, but found nothing. I hope there is a way around this; in the meantime, I hope this information helps someone fix the issue.
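The manual search described above can be scripted. This is a simplified diagnostic sketch, not the library's own code: it only mirrors the idea of locating `"captions":` in the watch-page HTML, and it assumes the value right after the marker is well-formed JSON. `extract_captions_json` is a hypothetical name.

```python
import json
from typing import Any, Optional

# Marker string searched for in the saved watch-page HTML.
CAPTIONS_MARKER = '"captions":'


def extract_captions_json(html: str) -> Optional[Any]:
    """Return the JSON value that follows '"captions":' in html, or None if absent.

    Assumption: the value directly after the marker is valid JSON.
    """
    start = html.find(CAPTIONS_MARKER)
    if start == -1:
        # Marker absent: presumably the case behind the "Subtitles are disabled" error.
        return None
    rest = html[start + len(CAPTIONS_MARKER):].lstrip()
    # raw_decode parses one JSON value and ignores whatever text follows it.
    value, _end = json.JSONDecoder().raw_decode(rest)
    return value
```

Reading the saved file with `open(path, encoding="utf-8").read()` and passing the result in gives `None` whenever the marker is missing, which is exactly what the manual search found.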

chand1012 commented 1 year ago

Okay, this turned out to be user error: I was accidentally passing the entire video URL instead of just the video ID. Sorry for the inconvenience!

For example, this will not work:

```
python -m youtube_transcript_api https://www.youtube.com/watch\?v\=fE2sunDZhzg
```

But this will:

```
python -m youtube_transcript_api fE2sunDZhzg
```
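When starting from a full URL in Python, the ID can be pulled out before calling the API. The sketch below uses only the standard library. `extract_video_id` is a hypothetical helper, and it assumes that any input without a URL scheme is already a bare video ID.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> str:
    """Return the YouTube video ID from a watch URL, a youtu.be link, or a bare ID."""
    parsed = urlparse(url_or_id)
    if not parsed.scheme:
        # Assumption: no scheme means the caller already passed a bare video ID.
        return url_or_id
    host = parsed.netloc.lower()
    if host.endswith("youtu.be"):
        # Short links carry the ID as the path, e.g. https://youtu.be/<id>
        return parsed.path.lstrip("/")
    # Watch URLs carry the ID in the "v" query parameter.
    ids = parse_qs(parsed.query).get("v")
    if ids:
        return ids[0]
    raise ValueError(f"Could not find a video ID in {url_or_id!r}")
```

The result can then be passed to `YouTubeTranscriptApi.get_transcript(...)`. Parsing the query string, rather than splitting on `"="`, also keeps extra parameters such as `&t=10s` out of the ID.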