jdepoix / youtube-transcript-api

This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!
MIT License

Subtitles disabled: one snippet retrieves the transcript, another does not #203

Closed LuisJavierFI closed 1 year ago

LuisJavierFI commented 1 year ago

To Reproduce

I want to retrieve the transcripts for a list of over 3k videos, but I get the error shown below. In short, it says "Subtitles are disabled for this video".

The subtitles are indeed not available when I open the video on YouTube, yet the following code retrieves them.

from youtube_transcript_api import YouTubeTranscriptApi
transcript_list = YouTubeTranscriptApi.get_transcript("UOXsLrjjtpQ", languages=["en"])
valores = [diccionario["text"] + " " + str(round(diccionario["start"] / 60, 2)) for diccionario in transcript_list]

However, if I loop over a list of video IDs with the same code, it raises the error, even for that same video ID.

id_video = ['xxxxxxx','xxxxxxx',.......,'UOXsLrjjtpQ',.........'xxxxxxx']
for video_id in id_video:  # renamed from `id` to avoid shadowing the built-in
    transcript_list = YouTubeTranscriptApi.get_transcript(video_id, languages=['en'])
    valores = [diccionario['text'] + ' ' + str(round(diccionario['start'] / 60, 2)) for diccionario in transcript_list]

What code / cli command are you executing?

I'm running it from Google Colab. I don't know whether that makes any difference.

  transcript_list = YouTubeTranscriptApi.get_transcript('UOXsLrjjtpQ',languages=['en'])

Which Python version are you using?

Python 3.9.16

Which version of youtube-transcript-api are you using?

youtube-transcript-api 0.5.0

Expected behavior

I want to obtain the transcripts for a list of more than two thousand videos, but I get an error. This is the ID that gives me the error: "UOXsLrjjtpQ"

Actual behaviour

I cannot retrieve the subtitles for every ID in my id_video list. Around the 80th video ID I get an error.

TranscriptsDisabled: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=UOXsLrjjtpQ! This is most likely caused by:

Subtitles are disabled for this video

Thanks

LuisJavierFI commented 1 year ago

I have another question, about handling exceptions with the library.

I am retrieving videos from YouTube playlists, and I am only interested in those whose language is English. I tried to handle the exception with the following code.

# id_video = 'slPTDLrNo48'  ---> Korean
try:
    transcript_list = YouTubeTranscriptApi.get_transcript('slPTDLrNo48', languages=['en'])
    valores = [diccionario['text'] + ' ' + str(round(diccionario['start'] / 60, 2)) for diccionario in transcript_list]
    print(valores)
except Exception as e:
    pass

Some playlists contain videos in various languages, and this code worked for me on those.

I have also come across videos that contain only a melody with no lyrics. On those the program sometimes fails and sometimes does not, and I have not been able to solve this.

# id_video melody ----> '2mIZAkGg5Js'
try:
    transcript_list = YouTubeTranscriptApi.get_transcript('2mIZAkGg5Js',languages=['en'])
    valores = [diccionario['text'] +' '+ str(round(diccionario['start']/60,2)) for diccionario in transcript_list]
except Exception as e:
    pass

Any suggestions? Thanks!

jdepoix commented 1 year ago

Hi @LuisJavierFI, for me, running the following line throws a TranscriptsDisabled exception:

transcript_list = YouTubeTranscriptApi.get_transcript('UOXsLrjjtpQ',languages=['en'])

Could you please restart the kernel of your Colab notebook, run only this line, and confirm that it does not throw this error? If it does not, what does it return?
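One way to run that isolated check and record exactly what happened is sketched below. The helper `describe_single_call` is hypothetical and not part of youtube-transcript-api; it takes the fetching function as an argument, so the real call only appears in the commented usage at the end.

```python
def describe_single_call(fetch, video_id):
    """Run one fetch in isolation and summarise the outcome as a string."""
    try:
        result = fetch(video_id)
    except Exception as error:
        # Broad on purpose: here we only want to report which exception type occurred.
        return f"{video_id}: raised {type(error).__name__}"
    return f"{video_id}: returned {len(result)} entries"


# Example usage (needs network access and the package installed):
# from youtube_transcript_api import YouTubeTranscriptApi
# print(describe_single_call(
#     lambda vid: YouTubeTranscriptApi.get_transcript(vid, languages=["en"]),
#     "UOXsLrjjtpQ",
# ))
```

Running this in a fresh kernel shows whether the single call really succeeds or whether it raises the same error as the loop.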

Regarding your second question: could you please clarify what exactly you are asking? I am not sure whether I understand the problem here. The video 2mIZAkGg5Js has disabled subtitles, so it seems correct that a TranscriptsDisabled error is thrown. The other part of the question is not clear to me. If you want to skip videos for which there is no English transcript, you can catch NoTranscriptFound and ignore them. Does that help?
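A minimal sketch of that skip-and-continue approach, so that one video without a usable transcript does not abort a long batch. The helpers `collect_transcripts` and `fetch_all_english` are hypothetical names, not part of youtube-transcript-api. `YouTubeTranscriptApi.get_transcript`, `NoTranscriptFound` and `TranscriptsDisabled` are the names used in this thread, and the sketch assumes they can be imported from the package top level in the version discussed here (0.5.0).

```python
def collect_transcripts(video_ids, fetch, skippable):
    """Fetch each video's transcript, skipping videos that raise a skippable error.

    Returns (transcripts, skipped): transcripts maps video ID to the fetched data,
    skipped maps video ID to the name of the exception that was raised.
    """
    transcripts, skipped = {}, {}
    for video_id in video_ids:
        try:
            transcripts[video_id] = fetch(video_id)
        except skippable as error:
            # Record the reason instead of silently passing, then move on.
            skipped[video_id] = type(error).__name__
    return transcripts, skipped


def fetch_all_english(video_ids):
    """Hypothetical wrapper wiring the helper to the library (assumed imports)."""
    # Imported lazily so the helper above does not require the package at import time.
    from youtube_transcript_api import (
        NoTranscriptFound,
        TranscriptsDisabled,
        YouTubeTranscriptApi,
    )
    return collect_transcripts(
        video_ids,
        lambda video_id: YouTubeTranscriptApi.get_transcript(video_id, languages=["en"]),
        (NoTranscriptFound, TranscriptsDisabled),
    )
```

Catching only these two exception types, rather than a bare `except Exception`, keeps unrelated failures such as network errors visible, and the `skipped` mapping shows afterwards which IDs were dropped and why.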

jdepoix commented 1 year ago

Hi @crhowell, did this solve your issue? Can I close this issue?

crhowell commented 1 year ago

@jdepoix Sorry for the late response, you must've mixed up the @ mention. I am not the author of this issue.

jdepoix commented 1 year ago

@crhowell Oopsie, sorry, you are right! 🙈

jdepoix commented 1 year ago

Hi @LuisJavierFI, did this solve your issue? Can I close this issue?

jdepoix commented 1 year ago

Closed due to inactivity.