jdepoix / youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it does not require an API key or a headless browser, unlike Selenium-based solutions!
MIT License
2.54k stars, 279 forks

TRANSCRIPTS DISABLED #298

Closed josondev closed 3 days ago

josondev commented 1 week ago

DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.

To Reproduce

Steps to reproduce the behavior:

What code / cli command are you executing?

I'm running the code in Google Colab and getting the error TranscriptsDisabled.


### Which Python version are you using?
Python 3.10.12

### Which version of youtube-transcript-api are you using?
youtube-transcript-api 0.6.2

# Expected behavior
Describe what you expected to happen. 

For example: I expected the search agent to analyse the video and answer the questions.

# Actual behaviour
Describe what is happening instead of the **Expected behavior**. Add **error messages** if there are any. 

For example: Instead I received the following error message:

issue TranscriptsDisabled
Traceback (most recent call last) in <cell line: 5>()
    166 from youtube_transcript_api import YouTubeTranscriptApi
    167 video_id=input('enter youtube video link:')[33:]
--> 168 transcript=YouTubeTranscriptApi.get_transcript(video_id)
    169 search_agent=SearchAgent.add_youtube(transcript)
    170 #while(1):

3 frames
/usr/local/lib/python3.10/dist-packages/youtube_transcript_api/_transcripts.py in _extract_captions_json(self, html, video_id)
     60 raise VideoUnavailable(video_id)
     61
---> 62 raise TranscriptsDisabled(video_id)
     63
     64 captions_json = json.loads(

TranscriptsDisabled: Could not retrieve a transcript for the video https://www.youtube.com/watch?v=qcphMpZ4sU! This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!

code in case:

key=''

ans=input('voice or search or chat or generator or summariser:')

if(ans=='voice'):
  from lyzr import VoiceBot
  vb=VoiceBot(api_key=key)
  ans=input('Text-to-speech or Transcribe or Text-to-notes:')
  if(ans=='Text-to-speech'):
    vb.text_to_speech(input('enter the text to be converted to speech:'))
    print('Successful execution.Please check the files.')
  elif(ans=='Text-to-notes'):
    print(vb.text_to_notes(input('enter the text to be converted to notes:')))
  else:
    print(vb.transcribe(input('enter the audio file to be converted to text:')))

elif(ans=='summariser'):
  from lyzr import Summarizer
  summarizer = Summarizer(api_key=key)
  ans=input('enter the text to be converted to a summary:')
  instructions=input('suummary,notes or tweet:')
  print(summarizer.summarize(ans))

elif(ans=='generator'):
  from lyzr import Generator
  generator=Generator(api_key=key)
  ans=input('enter the topic to be expanded:')
  persona=input('target audience:')
  print(generator.generate(ans))

else:
  # Prompt user to upload a folder
  if(ans=='chat'):
    from google.colab import files
    import os
    from lyzr import ChatBot
    os.environ['OPENAI_API_KEY'] = key
    ans=input('word document or pdf or youtube video or website or webpage or text file:')
    if(ans=='pdf'):
      while(1):
        try:
          uploaded = files.upload()  # Returns a dictionary
          filename = next(iter(uploaded))
          '''with open(filename, 'wb') as f:  # Save the uploaded file
            f.write(uploaded[filename])'''  # Get the filename
          chatbot = ChatBot.pdf_chat(input_files=[filename])  # Pass the filename to pdf_chat
          ans = input("Your question here:")
          response = chatbot.chat(ans)
          print(response.response)
          break
        except:
          print('please enter the correct pdf file.')

    elif(ans=='word document'):
      while(1):
        try:
          uploaded = files.upload()  # Returns a dictionary
          filename = next(iter(uploaded))  # Get the filename
          chatbot = ChatBot.docx_chat(input_files=[filename])  # Pass the filename to docx_chat
          ans = input("Your question here:")
          response = chatbot.chat(ans)
          print(response.response)
          break
        except:
          print('please enter the correct word document.')

    elif(ans=='text file'):
      while(1):
        try:
          uploaded = files.upload()  # Returns a dictionary
          filename = next(iter(uploaded))  # Get the filename
          chatbot = ChatBot.txt_chat(input_files=[filename])  # Pass the filename to text_chat
          ans = input("Your question here:")
          response = chatbot.chat(ans)
          print(response.response)
          break
        except:
          print('please enter the correct text file.')
    else:
      if(ans=='youtube video'):
        while(1):
          try:
            ans=input('enter youtube video link:')  # Returns a dictionary
            chatbot = ChatBot.youtube_chat(urls=[ans])  # Pass the filename to pdf_chat
            ans = input("Your question here:")
            response = chatbot.chat(ans)
            print(response.response)
            break
          except:
            print('enter the proper youtube video link:')

      elif(ans=='website'):
        # The error message "RuntimeError: This event loop is already running" usually arises in asynchronous
        # programming when you try to start a new event loop while another is already active
        import nest_asyncio
        nest_asyncio.apply()      #prevents conflicting loops
        ans=input('enter website link:') # Returns a dictionary
        while(1):
          try:
            chatbot = ChatBot.website_chat(ans)  # Pass the link
            ans = input("Your question here:")
            response = chatbot.chat(ans)        #fails for chromium based applications idk y  :(
            print(response.response)
            break
          except:
            print("enter the proper website link:")

      elif(ans=='webpage'):
        # The error message "RuntimeError: This event loop is already running" usually arises in asynchronous
        # programming when you try to start a new event loop while another is already active
        import nest_asyncio
        nest_asyncio.apply()      #prevents conflicting loops
        ans=input('enter webpage link:') # Returns a dictionary
        while(1):
          try:
            chatbot = ChatBot.webpage_chat(ans)  # Pass the link
            ans = input("Your question here:")
            response = chatbot.chat(ans)        #fails for chromium based applications idk y  :(
            print(response.response)
            break
          except:
            print("enter the proper webpage link:")

  else:
    from google.colab import files
    import os
    from lyzr import SearchAgent  # Import SearchAgent from lyzr.rag
    os.environ['OPENAI_API_KEY'] = key
    ans=input('word document or pdf or youtube video or website or webpage or text file:')
    if(ans=='pdf'):
      while(1):
        try:
          uploaded = files.upload()  # Returns a dictionary
          filename = next(iter(uploaded))
          search_agent = SearchAgent.add_pdf(input_files=[filename])  # Create SearchAgent from PDF
          ans = input("Your question here:")
          response = search_agent.query(ans)  # Use search method
          print(response)
          break
        except:
          print('please enter the correct pdf file.')

    elif(ans=='word document'):
      while(1):
        try:
          uploaded = files.upload()  # Returns a dictionary
          filename = next(iter(uploaded))
          search_agent = SearchAgent.add_docx(input_files=[filename])  # Create SearchAgent from PDF
          ans = input("Your question here:")
          response = search_agent.query(ans)  # Use search method
          print(response)
          break
        except:
          print('please enter the correct word document.')

    elif(ans=='text file'):
      while(1):
        try:
          uploaded=files.upload()
          filename=next(iter(uploaded))
          search_agent=SearchAgent.add_text(input_files=[filename])
          ans=input('your question here:')
          response=search_agent.query(ans)
          print(response)
          break
        except:
          print('please enter the correct text file')

    elif(ans=='youtube video'):
      from youtube_transcript_api import YouTubeTranscriptApi
      video_id=input('enter youtube video link:')[33:]
      transcript=YouTubeTranscriptApi.get_transcript(video_id)
      search_agent=SearchAgent.add_youtube(transcript)
      #while(1):
        #try:
          #search_agent=SearchAgent.add_youtube(input('enter youtube video link:') )
      ans=input('your question here:')
      result=search_agent.query(ans)
      print(result)
          #break
        #except:
          #print('please enter the proper youtube video link:')

    elif(ans=='website'):
      import nest_asyncio
      nest_asyncio.apply()
      while(1):
        try:
          search_agent=SearchAgent.add_website(input('enter website link:'))
          ans=input('your question here:')
          result=search_agent.query(ans)
          print(result)
          break
        except:
          print('please enter the proper website link:')

    elif(ans=='webpage'):
      import nest_asyncio
      nest_asyncio.apply()
      while(1):
        try:
          search_agent=SearchAgent.add_webpage(input('enter webpage link:'))
          ans=input('your question here:')
          result=search_agent.query(ans)
          print(result)
          break
        except:
          print('please enter the proper webpage link:')


Hey @jdepoix, I am sorry to raise this ticket again even though it has been closed, but I couldn't follow the previous solution.
I would be really grateful if you could help me and point out where I went wrong. Thanks in advance.

Asentient commented 5 days ago

Could this be related to this error, by any chance?

https://github.com/jdepoix/youtube-transcript-api/issues/299

jdepoix commented 3 days ago

Hi @josondev,

First of all, your issue description contained your OpenAI API Key. I edited it to remove it. Please be more careful about this in the future!
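
One way to keep a key out of pasted code is to read it from the environment, or prompt for it without echoing, instead of assigning it to a variable in the script. The sketch below is only an illustration: `load_api_key` is a hypothetical helper name, and it assumes the key lives in the `OPENAI_API_KEY` environment variable that the posted code already sets.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is missing.

    Hypothetical helper: the key never appears in the source file, so the
    script can be pasted into an issue without leaking it.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value, so it does not show up in notebook output.
        key = getpass(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key
    return key
```

With this in place, the script would call `key = load_api_key()` instead of `key='...'`.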

It seems that your issue concerns the video qcphMpZ4sU. If I open it on YouTube, it says "Video is not available", which explains why you can't get transcripts for it.
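
One thing that may be worth checking: the ID in the error message, qcphMpZ4sU, has 10 characters, while YouTube video IDs normally have 11. The posted code takes the ID with `[33:]`, but the prefix `https://www.youtube.com/watch?v=` is only 32 characters long, so for a link in that form the slice would drop the first character of the ID. Whether that is what happened here depends on the exact link that was entered, which is not shown in the issue. A minimal sketch of a less fragile approach, using only the standard library (`extract_video_id` is a hypothetical helper, not part of youtube-transcript-api, and the 11-character ID below is made up):

```python
from urllib.parse import urlparse, parse_qs

# Hosts handled by this sketch; other URL forms would need extra cases.
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> str:
    """Return the video ID from a YouTube watch URL or a youtu.be short link."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in WATCH_HOSTS:
        # Watch links carry the ID in the "v" query parameter.
        video_id = parse_qs(parsed.query).get("v", [""])[0]
    else:
        video_id = ""
    if not video_id:
        raise ValueError(f"no video ID found in {url!r}")
    return video_id


PREFIX = "https://www.youtube.com/watch?v="
example_url = PREFIX + "abcdefghijk"  # made-up ID, for illustration only

print(len(PREFIX))                    # 32
print(example_url[33:])               # bcdefghijk (first character lost)
print(extract_video_id(example_url))  # abcdefghijk
```

The returned ID could then be passed to `YouTubeTranscriptApi.get_transcript(video_id)` as in the posted code. Parsing the query string also tolerates extra parameters such as `&t=30s`, which a fixed slice would include in the ID.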