Open atoonk opened 3 months ago
@jdepoix
Just curious - do you know how long these cookies typically remain valid?
There is no right answer, to be honest. It depends on YouTube (or the site); the cookies could last anywhere from a day to much longer. Just watch your program closely, or build a failsafe around such failures.
I have it working with a secure proxy without cookies and the fork that @danielsanmartin provided. I'll watch it and possibly incorporate cookies.
I experienced the same issue, looks like YouTube is blocking IPs. Mine is in AWS EC2. I have a Cloudflare Worker that does the job for now: https://github.com/jamesflores/youtube-subtitles-worker
@jamesflores I thought about doing the same. I am worried that I will get my Cloudflare account in trouble. We might run into the same issue a few months down the line when YouTube blocks the Cloudflare worker IP address. What do you think?
My code using YouTubeTranscriptApi works locally but fails on the server with this error: "Failed to retrieve transcript: Subtitles are disabled for this video". I've confirmed the subtitles are available and have the same library version in both environments. I also had trouble with proxy settings. Any suggestions or solutions would be appreciated!
Thanks for creating this thread, I was tearing my hair out to figure out what the hell happened to my AWS lambda function!
Following the suggestions above, in prod I tried using a proxy when fetching the transcript (caveat: an HTTP proxy), but I still get the same "TranscriptsDisabled" error.
Locally, it works fine. Any idea what this could be about? How is it technically possible that YouTube bans a proxy IP address when used from AWS servers but not locally?
Appreciate any insight you guys might have
I had exactly the same issue in prod with Lambda functions. It could be something that Lambda adds to the headers that gives away the request is coming from an AWS Lambda, and they might have banned all Lambdas based on that. But this is pure speculation; I haven't tested it yet.
I used the RapidAPI solution presented above. Works well for now 👍
Thanks a lot @iamscottweber!
Interestingly, when rendering the HTML dump it includes the error message "Sign in to confirm you're not a bot". So this means that you might actually be able to continue scraping if you're signed in! You can do authenticated requests using Cookies, as explained in the README.
Could maybe someone who's currently blocked give this a try and see whether this allows them to continue scraping?
(Please note that I don't know if YouTube will ban your account at some point if you scrape too much, so it might be better to do this with an account you don't care about, just to be on the safe side)
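For anyone who wants to try this, here is a rough sketch of the cookie mechanism, assuming the README's approach of exporting browser cookies to a Netscape-format `cookies.txt` file. The cookie name and value below are placeholders; in practice you would export real cookies from a signed-in browser session.

```python
# Sketch: build and validate a Netscape-format cookies.txt file locally.
# LOGIN_INFO and "placeholder-token" are placeholders, not real credentials.
import http.cookiejar
import os
import tempfile

COOKIE_LINE = "\t".join([
    ".youtube.com",        # domain
    "TRUE",                # include subdomains
    "/",                   # path
    "TRUE",                # secure
    "1893456000",          # expiry (unix timestamp, year 2030)
    "LOGIN_INFO",          # cookie name (placeholder)
    "placeholder-token",   # cookie value (placeholder)
])

cookie_path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
with open(cookie_path, "w") as f:
    f.write("# Netscape HTTP Cookie File\n" + COOKIE_LINE + "\n")

# Verify the file parses as a valid cookie jar before handing it to the API
jar = http.cookiejar.MozillaCookieJar(cookie_path)
jar.load(ignore_discard=True, ignore_expires=True)

# Per the README, the library then takes the *path* to this file, e.g.:
# YouTubeTranscriptApi.get_transcript(video_id, cookies=cookie_path)
```

If the file fails to load into a `MozillaCookieJar`, it will also fail inside the library, so this is a cheap sanity check before making requests.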
@jdepoix
I ran your code on my AWS EC2 Linux instance and generated a dump.html.
I then opened the HTML on my local workstation and got the "sign in" message. Before I signed in, the transcript was not accessible; after signing in, it was. So in essence that part of the test worked. I then wanted to try your cookie methodology, but the extension listed in your README is no longer available for Chrome. Is there another extension you would recommend, or another way to get the cookie info, so I can continue testing the methodology?
I'd like to understand how the YouTubeTranscript website obtains transcript support, as it must be deployed somewhere to provide this functionality.
@udede11 The youtube rapidapi website only supports up to 150 requests per month, so I need a permanent solution for obtaining transcripts consistently.
I have been running the YouTube transcript API for my startup for months. We solved the "transcripts disabled" problem a long time ago and wrote an in-house script that makes sure it never breaks on us. If you are interested in this solution, you can reach out to me at joeslamie@gmail.com.
Sent you an email!
Hi @Joe-hitthecode,
Just a quick note—I’ve emailed you at joeslamie@gmail.com regarding the YouTube Transcript API solution. Looking forward to your response!
Can you please email me as well? I am also using the same API and facing the same issue when running the code on AWS EKS, though it works fine locally.
lol Meera, it is not a scam. This is my LinkedIn: https://www.linkedin.com/in/joe-georgeo-slamie-413b7a170/. I am also busy, so I can't reply as fast as you might need.
Can you reply to me, @Joe-hitthecode?
I have a lot of emails. I am preparing a general response for everyone, so that I don't have to reply to people individually. Part of the solution involves first trying to use a proxy, which I mentioned here before.
@Joe-hitthecode You are going to post the solution here?
I am going off my system for a while, so let me just go over how we are managing this expected problem. Before I go into it, I want to point out that the API we are using is basically doing web scraping, and solutions like that are bound to be troublesome. The biggest way we handled it was to use a proxy, but in a more skillful way. Before doing anything else, try a proxy that is easy to configure, like https://nodemaven.com/. After you get a practical understanding of how to configure a proxy, you can move on to using a free proxy like https://www.croxyproxy.com/, which is a little more complex. If you are successful with these steps, use the domain you get from your proxy provider like this:
```python
proxy_url = f'http://{username}:{password}@{proxy_host}:{proxy_port}'
transcript = YouTubeTranscriptApi.get_transcript(
    video_id, languages=lang,
    proxies={'http': proxy_url, 'https': proxy_url}
)
```
If you clone the repository and look at the source code, you will see that the error is raised in the _errors.py file, specifically via this error class:

```python
class TranscriptsDisabled(CouldNotRetrieveTranscript):
    CAUSE_MESSAGE = 'Subtitles are disabled for this video'
```

When you call the .get_transcript method, it creates a TranscriptListFetcher instance and calls its .fetch method, shown below from the source code:
```python
class TranscriptListFetcher(object):
    def __init__(self, http_client):
        self._http_client = http_client

    def fetch(self, video_id):
        return TranscriptList.build(
            self._http_client,
            video_id,
            self._extract_captions_json(self._fetch_video_html(video_id), video_id),
        )

    def _extract_captions_json(self, html, video_id):
        splitted_html = html.split('"captions":')
        if len(splitted_html) <= 1:
            if video_id.startswith('http://') or video_id.startswith('https://'):
                raise InvalidVideoId(video_id)
            if 'class="g-recaptcha"' in html:
                raise TooManyRequests(video_id)
            if '"playabilityStatus":' not in html:
                raise VideoUnavailable(video_id)
            raise TranscriptsDisabled(video_id)
        captions_json = json.loads(
            splitted_html[1].split(',"videoDetails')[0].replace('\n', '')
        ).get('playerCaptionsTracklistRenderer')
        if captions_json is None:
            raise TranscriptsDisabled(video_id)
        if 'captionTracks' not in captions_json:
            raise NoTranscriptAvailable(video_id)
        return captions_json
```
The .fetch method calls self._extract_captions_json, and that is where the error is raised. The first three conditions are checked, and when none of them match, the code falls through to raising TranscriptsDisabled, which in some cases happens for reasons other than transcripts actually being disabled. What is really happening here is that your IP is blocked. So there is no magic bullet or secret sauce: you have to use a proxy or some kind of VPN service that hides your IP before fetching the transcript data.
ps: My Backend is running on pythonanywhere.com
@satyajit-bagchi you'll have to proxy your https requests, since the YouTube requests are done using https. Setting up a proxy for http won't do anything.
@Joe-hitthecode I'll try to add a more explicit exception for this type of error when I find time to do so. This should allow for catching them and falling back to a proxy or rotating through a pool of IPs as they get banned.
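Until such an exception exists, the fallback idea can be sketched by hand. This is only an illustration: the proxy URLs are placeholders, and catching bare `Exception` stands in for whatever more specific "blocked" exception the library may raise.

```python
from itertools import cycle

# Placeholder pool: first attempt without a proxy, then rotate through proxies
PROXY_POOL = cycle([
    None,
    {"https": "socks5://user:pass@proxy1.example.com:1080"},
    {"https": "socks5://user:pass@proxy2.example.com:1080"},
])

def fetch_with_rotation(fetch, video_id, attempts=3):
    """Retry `fetch`, switching to the next proxy in the pool on each failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(video_id, proxies=next(PROXY_POOL))
        except Exception as exc:  # ideally: the library's "blocked" exception
            last_error = exc
    raise last_error

# Demo with a fake fetcher that only succeeds once a proxy is supplied
calls = []
def fake_fetch(video_id, proxies=None):
    calls.append(proxies)
    if proxies is None:
        raise RuntimeError("IP blocked")
    return [{"text": "hello", "start": 0.0}]

result = fetch_with_rotation(fake_fetch, "dQw4w9WgXcQ")
```

In real use, `fetch` would wrap `YouTubeTranscriptApi.get_transcript`, and the pool would hold your actual proxy credentials.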
Thanks for pointing that out @jdepoix :). I now have it working for me on the cloud
To everyone else: for those in the starting stages of their projects, Webshare offers 10 free proxies with the SOCKS5 protocol. You can use a SOCKS5 proxy out of the box with youtube-transcript-api; just pass it to the proxies dict: https://stackoverflow.com/questions/12601316/how-to-make-python-requests-work-via-socks-proxy https://github.com/jdepoix/youtube-transcript-api?tab=readme-ov-file#proxy
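For reference, a minimal version of that setup might look like this. The credentials and host are placeholders (real values come from your proxy provider's dashboard), and `requests` needs pysocks installed for SOCKS support.

```python
# Build a proxies dict for a SOCKS5 proxy; all credentials are placeholders.
def build_socks5_proxies(user: str, password: str, host: str, port: int) -> dict:
    url = f"socks5://{user}:{password}@{host}:{port}"
    # youtube-transcript-api passes this dict straight through to requests
    return {"http": url, "https": url}

proxies = build_socks5_proxies("user", "pass", "proxy.example.com", 1080)
# YouTubeTranscriptApi.get_transcript(video_id, proxies=proxies)
```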
I am facing the same issue. It is not working in the cloud server.
You need to sign up with a proxy provider to fix this.
If you are using the SOCKS5 protocol, make sure to install pysocks. pip install pysocks and you are good to go
Excellent, worked.
Though I am still evaluating using the captions API with OAuth directly: https://developers.google.com/resources/api-libraries/documentation/youtube/v3/python/latest/youtube_v3.captions.html
@satyajit-bagchi it's not working on Azure.
I ran into this on google cloud run. I tried dataimpulse with a $1/GB pay-as-you-go plan and it worked for me:
```python
transcript = YouTubeTranscriptApi.get_transcript(
    video_id,
    proxies={"https": f"https://{dataimpulse_login}:{dataimpulse_password}@gw.dataimpulse.com:823"}
)
```
It's not 100% reliable, so I added a failover path to the Smartproxy option mentioned above by @SKVNDR.
We've noticed this issue cropping up more in the past week, but interestingly it is not happening for all videos. Sometimes the same video will fail and then succeed. Are others seeing this behavior? Does that still indicate that YouTube is blocking/rate-limiting?
Yes, I've noticed similar behavior where the same video is blocked even with a proxy. It will sometimes fail and other times work as expected. But adding the proxy has helped immensely, and I'm assuming the failures happen to land on an IP address that is already rate-limited.
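Given that the failures seem intermittent, a simple retry with exponential backoff may be enough to paper over the rate-limited attempts. This is just a sketch; the fetch callable and the broad exception handling are placeholders to adapt to your setup.

```python
import time

def fetch_with_retry(fetch, video_id, retries=4, base_delay=1.0):
    """Retry an intermittently failing fetch, backing off exponentially."""
    for attempt in range(retries):
        try:
            return fetch(video_id)
        except Exception:  # ideally only the rate-limit/blocked exception
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo with a fake fetcher that fails twice before succeeding
state = {"calls": 0}
def flaky_fetch(video_id):
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("rate limited")
    return "transcript"

result = fetch_with_retry(flaky_fetch, "abc123", base_delay=0)
```

In real use, `fetch` would wrap `YouTubeTranscriptApi.get_transcript` with your proxy configuration.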
Yup, that's exactly the problem I noticed. I'm also having problems on AWS. YouTube does return something, even including the videoDetails prop, but it is not complete: the crucial part, playerCaptionsTracklistRenderer, is missing.
So I guess the only solution right now is using a proxy/VPN/dynamic IP, or the official YT API.
I use yt-dlp as a failover to download the audio, then send it to my Whisper server for transcription. It was failing as well until I added a proxy to it. It appears YouTube has blacklisted all known IP ranges from these providers.
Thanks guys, the proxy works. Should I implement proxy-rotation logic?
I made a Webshare account to get access to SOCKS5 proxies. I have tried several of their proxies, and pip installed and imported pysocks as @Joe-hitthecode suggested, but I keep getting the following exception:
NOTE: I am running this all locally, not in cloud.
Exception message: SOCKSHTTPSConnectionPool(host='www.youtube.com', port=443): Max retries exceeded with url: /watch?v=kvlWtA136FM (Caused by NewConnectionError('<urllib3.contrib.socks.SOCKSHTTPSConnection object at 0x000001F1FFCEABA0>: Failed to establish a new connection: 0x04: Host unreachable'))
Is anybody else experiencing this, or does anyone know a solution?
@aj-bei I think you're not providing your Webshare credentials in the proxies dict:

```python
proxies = {
    'http': 'socks5://user:pass@proxy_ip:proxy_port',
    'https': 'socks5://user:pass@proxy_ip:proxy_port'
}

# Define a function to fetch transcripts using proxies
def generate_video_transcript(video_id):
    try:
        # Fetch transcript with the YouTubeTranscriptApi
        return YouTubeTranscriptApi.get_transcript(video_id, proxies=proxies)
    except Exception as e:
        print(f'Failed to fetch transcript: {e}')
```

If you're already doing that, try a different proxy IP from your proxy list, one whose status is "working".
What is a proxy? How do I use one on an Ubuntu machine?
Just ran across this issue today, glad I found this thread. I too am on Digital Ocean, running my code in a Docker container. Getting transcripts runs fine locally, but not on DO.
I would appreciate the video mentioned above, as proxies are new to me. If I use my localhost as a proxy, that means I need to leave the machine running 24/7, right? I guess that's obvious.
Did you create the Docker container yourself? I am interested in running this from my home server.
I am not even running on the cloud and all my proxies still don't work
If you're not using it in the cloud, you should be able to get transcripts directly when running locally, without any problem.
Hello @jdepoix, I am using Webshare SOCKS5 proxies:

```python
proxies = {
    'http': f'socks5://{username}:{password}@{host}:{port}',
    'https': f'socks5://{username}:{password}@{host}:{port}'
}
transcript_list = YouTubeTranscriptApi.list_transcripts(video_id, proxies=proxies)
```

I am still getting this error on AWS Lambda:

```
ERROR - SOCKSHTTPSConnectionPool(host='www.youtube.com', port=443): Max retries exceeded with url: /watch?v=WTOm65IZneg (Caused by NewConnectionError('<urllib3.contrib.socks.SOCKSHTTPSConnection object at 0x7fe5786f6510>: Failed to establish a new connection: All offered SOCKS5 authentication methods were rejected')) - WTOm65IZneg
```
I use a tool that can download transcripts one by one from any channel, with this script built in. It works for some time and then stops, but right after I switch countries on my paid VPN it works again; then it stops loading again until I switch to another country, and so on. After some time, it starts working with the same IPs for a limited number of videos again. So if it does not work for you, make sure your proxy/VPN is actually working well.
If you encounter the SOCKSHTTPSConnectionPool error, try socks5h instead of socks5. It worked for me. https://stackoverflow.com/questions/12601316/how-to-make-python-requests-work-via-socks-proxy
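For context: with `socks5://` the hostname is resolved locally, while `socks5h://` asks the proxy to resolve it, which avoids local DNS lookups that can fail or leak your location. A small helper to rewrite existing proxy URLs (the URL below is a placeholder):

```python
def to_remote_dns(proxy_url: str) -> str:
    """Rewrite socks5:// to socks5h:// so DNS resolution happens on the proxy."""
    if proxy_url.startswith("socks5://"):
        return "socks5h://" + proxy_url[len("socks5://"):]
    return proxy_url

proxies = {scheme: to_remote_dns("socks5://user:pass@proxy.example.com:1080")
           for scheme in ("http", "https")}
```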
I use a VPN; it works for a while, but then I get the same "subtitles are disabled for this video" error again. I need transcripts for hundreds of videos.
@corngk is it a one-time thing or continuous? I have an automation that runs; all you have to do is make a request to the link. Send me an email at opeyemisanusi@gmail.com.
To get this working reliably in production, adding a proxy layer is essential. If you need help, feel free to reach out: https://linktr.ee/clearcode
I am hoping this won't get misused, but here is a working solution for free: https://gist.github.com/Ashes47/f03d8f8dfd024783a8a34ba34141d6ec
Hi all, I've been facing the same problem. I'm using an AWS Lambda to get transcriptions and the response is always the same: no subtitles for this video. I'll apply a residential proxy from Smartproxy and report the results here afterwards.
It worked like a charm!
Can this solution be built into the tool itself as the default? @jdepoix
[Solved] I am using Digital Ocean and the problem still exists.
Edit: I set up Tor and it works now.
```python
import platform

# general Tor proxies
proxies = {
    'http': 'socks5h://127.0.0.1:9050',
    'https': 'socks5h://127.0.0.1:9050'
}

video_id = self.extract_video_id(url)
# I did not want to set up Tor on my local Windows machine
if platform.system() == 'Linux':
    transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies=proxies)
else:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
```
```shell
pip install pysocks
sudo apt update
sudo apt install tor
sudo service tor start
```
Problems with using TOR
To Reproduce
using youtube-transcript-api-0.6.2:
outputs:
What code / cli command are you executing?
I am running
Which Python version are you using?
Python 3.11.6
Which version of youtube-transcript-api are you using?
youtube-transcript-api-0.6.2
Expected behavior
I expected to receive the English transcript; I can see it in the browser, see screenshot:
Actual behaviour