Linux2010 opened this issue 1 month ago
Indeed this seems to be a pathological video where almost all video formats fail on the first fragment and 299 may fail later, regardless of Python 2.7/3.5/3.9 and User-Agent settings.
yt-dlp 2024.08.06 still works, apparently. It has fancy networking that we can't easily replicate: maybe punt to curl for all requests?
@dirkf the reason why yt-dlp works is that yt-dlp has abandoned the WEB client entirely. The poToken experiment rate has reached 100% for WEB / the rollout is complete. TVHTML5_SIMPLY_EMBEDDED_PLAYER is reportedly the next victim, and I don't think we are currently able to detect the poToken requirement for that client.
I have that with every single video I try. Curiously enough, format '18' works all the time. Other formats that work are '136', '137', '248' and/or '160', but that depends on the video - it's not always the case. Still, format '18' is the most reliable.
Yeah, format 18 (the last remaining "legacy" format) is unaffected, as are the m3u8 formats
So the poToken (I agree, this is always being detected today) randomly breaks the download rather than uniformly giving 403 as with the revised n-sig "throttling"? Same itag?
WEB is fully potoken'd, TVHTML5_SIMPLY_EMBEDDED_PLAYER is reportedly partially potoken'd now via Botguard, and ANDROID is fully potoken'd via Droidguard. Also, just because it has the same itag doesn't necessarily mean it's the same format - e.g. DRC audio formats; WEB VR formats and TVHTML5 VR formats have different resolutions despite having the same itags.
Can confirm that a lot of the video-only formats are just being 403-ed in the middle of their downloads, resulting in me getting files that stop about 10-20 minutes into the video but still have full-sized audio.
By now I have written something into my scripts to just pick format 18 as long as a flag is set, because i foresee this issue happening again in the future once it is eventually fixed... >.>
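The fallback logic described above can be sketched as a small helper: given the itags that actually worked for a particular video, prefer the more efficient formats and fall back to 18. This is an illustrative sketch, not youtube-dl's own format selector; the preference order is an assumption taken from the formats reported to work in this thread, and `prefer_legacy` stands in for the commenter's flag.

```python
def pick_itag(available, prefer_legacy=False,
              preferred=("136", "137", "248", "160"), legacy="18"):
    """Pick an itag from `available`, falling back to legacy format 18.

    `preferred` mirrors the formats reported to work above; with
    `prefer_legacy` set (the commenter's flag), skip straight to 18.
    """
    if prefer_legacy and legacy in available:
        return legacy
    for itag in preferred:
        if itag in available:
            return itag
    return legacy if legacy in available else None

# Roughly equivalent youtube-dl format selector string: "136/137/248/160/18"
```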
So has anyone tried fetching in fragments of <1MB? We already had a work-around to download in fragments to avoid throttling, IIRC.
Otherwise: on detecting the poToken, use the existing "punt to API" logic with a selected unafflicted client.
Apparently the latest fix worked for not even a day; that doesn't bode well. Personally I keep getting "giving up after 0 fragment retries" in my Python stuff.
From what I read in yesterday's thread, it seems like this will just not work out with fake JS interpretation if they try to combat it in the slightest. That almost doesn't deserve the name attack vector; that's an attack landscape.
I've tested it with the http chunk size and with the pseudo-DASH fragmentation, and both resulted in 403 errors.
This change is significant. I checked old, pre-Quantum Firefox, and videos don't work any more, when 3 days ago they did.
Maybe the new player JS uses some Google JS syntax extension (aka ECMAScript 2021+) that hadn't been contemplated in those FF versions. Is there an error in the JS console?
It used to work embedded or as mobile (when using a mobile user agent). Now all of them display an error:
An error occurred. Please try again later. (Playback ID: j-bZsC_YehYVyZZ8)
Learn More (https://support.google.com/youtube/?p=player_error1&hl=en)
Loading any video at https://www.youtube.com/embed/1234567890a:
Content Security Policy: Couldn't process unknown directive 'require-trusted-types-for' <unknown>
mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create www-embed-player.js:26:77
InvalidStateError www-embed-player.js:1128:42
Then pressing play:
Error: WebGL: Error during native OpenGL init. base.js:11283:169
Error: WebGL: WebGL creation failed. base.js:11283:169
If it's of any help, despite what was said before, there are some videos that work.
First, this one doesn't, and gives the following console log:
Content Security Policy: Couldn't process unknown directive 'require-trusted-types-for' <unknown>
mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create www-embed-player.js:26:77
InvalidStateError www-embed-player.js:1128:42
Empty string passed to getElementById(). zVhcVoOEv7o line 2 > eval:795:28944
Error: WebGL: getParameter: parameter: invalid enum value <enum 0x9246> base.js:11283:254
This one does, with this log:
Content Security Policy: Couldn't process unknown directive 'require-trusted-types-for' <unknown>
mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create www-embed-player.js:26:77
InvalidStateError www-embed-player.js:1128:42
Empty string passed to getElementById(). KAR4fAX5T7Y line 2 > eval:4650:33869
So, after clicking 'play' it gives this error: Error: WebGL: getParameter: parameter: invalid enum value <enum 0x9246> base.js:11283:254
No matter what chunk size I use, I'm seeing hard 403 errors at 1Meg, as others have reported - I'm able to download as many fragments as I want up to 1Meg and then get a 403.
Have experimented with generating cpns (nonces) and adding them to the format fragment URLs, without any luck, as well as using rn (request numbers) in the URL query instead of byte ranges. Have tried sleeping between fragments to mimic video playback, also without joy.
It feels like they've added a check somewhere which fails at the 1Meg mark, but I haven't found anything yet showing where that might be.
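One way to pin down that 1Meg boundary is to walk the file in Range-request chunks and record the first offset where the server starts refusing. A hedged sketch: the `fetch` callable is an assumption standing in for a real HTTP request (e.g. urllib with a `Range: bytes=start-end` header), and whether the server keys on cumulative bytes or per-request ranges is exactly what's unknown here.

```python
def chunk_ranges(total, chunk=1 << 20):
    """Yield (start, end) byte ranges of at most `chunk` bytes covering `total`."""
    start = 0
    while start < total:
        end = min(start + chunk, total) - 1
        yield start, end
        start = end + 1

def find_403_boundary(fetch, total, chunk=64 * 1024):
    """Return the first byte offset at which `fetch(start, end)` fails.

    `fetch` should return True on HTTP 200/206 and False on 403; in real
    use it would wrap an HTTP client sending a Range header (hypothetical
    here). Returns None if every chunk succeeds.
    """
    for start, end in chunk_ranges(total, chunk):
        if not fetch(start, end):
            return start
    return None
```

With a real fetch function, a returned boundary of exactly 1048576 would support the "check at the 1Meg mark" theory.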
Checking via the browser, I can see that youtube is happily downloading /videoplayback fragment URLs above 1Meg without any issue...
But in the browser the media links have the pot parameter with its poToken challenge result, no? Which is what we can't haz.
In line with step 1 above, I'm gradually pulling stuff from the yt-dlp extractor, enough to download HLS with client ios, but plainly not yet enough to get unblocked links from tv or web_creator, e.g. with format 135. Should I be expecting that?
I'm not seeing a pot parameter in the query strings; I am seeing POST data in the /videoplayback requests which is referred to in the source as playbackCookie.
Edit: Looks like the playbackCookie / POST data is extracted from the bytes of the previous fragment response somehow.
This is the procedure that I am using in my own code.
1. Load https://www.youtube.com/embed/<id#> and find the base.js link. Do the usual to extract the sig and n-sig. Extract the signatureTimestamp for the next step.
2. Load https://www.youtube.com/youtubei/v1/player with the signatureTimestamp and TVHTML5_SIMPLY_EMBEDDED_PLAYER as the client name.
3. If the JSON response contains "formats" and/or "adaptiveFormats" then we're good. This covers most videos, including age-gated ones. The 403 problem occurs when we have to go to the next step. We can't use "www.youtube.com"; we must use "m.youtube.com" with the user agent set to something like "Mozilla/5.0 (Android 14)", which is what I'm using.
4. Load https://m.youtube.com/watch?v=<id#> and extract the JSON structures that you would otherwise have gotten from the previous step.
And that's it. The extra step is only required for videos that disallow embedding.
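A minimal sketch of the /youtubei/v1/player call in the procedure above, showing how the signatureTimestamp and client name fit into the request body. The endpoint and client name come from the procedure itself; the clientVersion string, the embedUrl context, and the overall payload layout are assumptions modelled on publicly known innertube requests, so treat them as illustrative rather than authoritative.

```python
import json

def build_player_request(video_id, signature_timestamp):
    """Build a JSON body for the /youtubei/v1/player call described above.

    The clientVersion value and thirdParty context are assumptions;
    check a current browser capture before relying on them.
    """
    return {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "TVHTML5_SIMPLY_EMBEDDED_PLAYER",
                "clientVersion": "2.0",  # assumed; may need updating
            },
            # Embedding context, reportedly needed for this client:
            "thirdParty": {"embedUrl": "https://www.youtube.com/"},
        },
        "playbackContext": {
            "contentPlaybackContext": {
                "signatureTimestamp": signature_timestamp,
            },
        },
    }

# The body would be POSTed to https://www.youtube.com/youtubei/v1/player,
# then the response checked for "formats" / "adaptiveFormats".
body = json.dumps(build_player_request("dQw4w9WgXcQ", 19950))
```

The signatureTimestamp value (19950) is a placeholder; the real value comes from the base.js extracted in step 1.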
Please don't bother to supply any "me too" reports unless the log shows some novelty that may help with rectification. Just "Like", or whatever, an existing similar report.
You can see how a poToken is being sent in POST data by the browser in the Invidious code that shows how to capture the value. But I understood from yt-dlp discussions that a pot query parameter was used in the media links associated with the pot-ified session.
@8ChanAnon's algorithm is what is currently done for age-gate videos, up to the last step with m.youtube.com, which is new and interesting. What happens if you skip straight to that step?
Step 2 will only work if TVHTML5_SIMPLY_EMBEDDED_PLAYER is not pot-ified, and that seems to be in question.
Indeed, Android 14/FF 122 at m.youtube.com didn't list the poToken experiment IDs, although yt-dlp has reported a lack of success with Android clients.
I don't think it's worth trying to get around the poToken, it will eventually be required in all clients.
I keep digging into base.js when I get some time, trying to understand how the token is created. It does seem to be extracted from the bytes of at least the first video fragment as far as I can tell, but not all fragments?...
There's a Uint8Array which appears to be the fragment response data?... manipulated several times, and then 82/84/68 bytes of that array are stored as playbackCookie, which is then sent in the POST data.
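To make that observation concrete, here is a speculative sketch of the shape of such a mechanism: take bytes from the previous fragment response, derive a fixed-size blob, and replay it as the next request's POST body. The offsets, the blob size, and the absence of any real transformation are pure guesses standing in for whatever base.js actually does; nothing here is reverse-engineered fact.

```python
def derive_playback_cookie(fragment, size=84, offset=0):
    """Speculative: slice `size` bytes from a fragment response.

    The real player appears to transform a Uint8Array several times
    before storing 82/84/68 bytes as playbackCookie; the offset and
    the lack of transformation here are placeholders only.
    """
    if len(fragment) < offset + size:
        return None
    return bytes(fragment[offset:offset + size])

def next_request_body(prev_fragment):
    """Shape of the suspected flow: the derived blob would become the
    POST body of the next /videoplayback fragment request."""
    return derive_playback_cookie(prev_fragment)
```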
At least it would be good to have a program that is not not-youtube-dl while a long term solution to the twattery is being investigated.
@dirkf
yt-dlp 2024.08.06 still works, apparently. It has fancy networking that we can't easily replicate: maybe punt to curl for all requests?
On a lot of websites I want to download from, youtube-dl and curl don't even get the correct HTML; instead of the page I would get in my browser, they receive a version that has the captcha.
Instead I have a bash script that pre-downloads the non-captcha HTML via https://github.com/lwthiker/curl-impersonate (it runs in a Docker container; I use tag 0.5.2-ff-alpine).
AFAIK lwthiker/curl-impersonate is the only HTTP client that completely impersonates an actual browser like Firefox. A lot of the problems I had with 403 errors in youtube-dl were captchas triggered by the HTTP client not being exactly like an official version of Firefox or Chrome :) (this might even be valid for fragment downloads)
Yes, but so far as captcha is generally understood (G/reCAPTCHA, hCaptcha, Cloudflare challenge aka breaks-the-Web), that is not the problem. Even if it solved the poToken issue, a dependency running under Docker would not be an acceptable solution for the main functionality of the program, though it might be a PoC for a solution.
Yeah not relevant here but I did bookmark it for other things, looks like a decent tool :)
I only said that I run it in Docker. Apparently it can be used as a library, see https://github.com/lwthiker/curl-impersonate?tab=readme-ov-file#Advanced-usage . Though I have not looked into that, as I run youtube-dl and almost any other apps via Docker anyway :)
Fair enough, but even curl-impersonate is quite a beefy dependency that would not be supportable on the same range of targets as the current yt-dl.
Fair enough.
What about this: add an option to use it as an external downloader (--external-downloader) (currently supports aria2c, avconv, axel, curl, ffmpeg, httpie, wget) and find some good presets for it (e.g. which specific browser it should impersonate)?
@PatrickJRed it still wouldn't help with this situation, but it's pretty simple to add another curl downloader; you should open another issue or make the PR yourself.
Indeed, Android 14/FF 122 at m.youtube.com didn't list the poToken experiment IDs, although yt-dlp has reported a lack of success with Android clients.
m.youtube.com / a mobile browser would be the MWEB client, not the ANDROID client (which is the YouTube Android app).
Apparently the API fallback in the YT extractor wouldn't have worked for ages, if at all, because (unlike in the age-gate fallback) no sts was being sent. Then:
$ python -m youtube_dl -v -f 135 'lLSkbZ3-EOs'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-f', u'135', u'lLSkbZ3-EOs']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: c5098961b
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1w 11 Sep 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[youtube] lLSkbZ3-EOs: Downloading webpage
WARNING: [youtube] Ignoring initial response with broken formats (poToken experiment detected)
[youtube] lLSkbZ3-EOs: Downloading player 28fd7348
[youtube] lLSkbZ3-EOs: Downloading API (WEB_CREATOR-2.20240726.00.00) JSON
[debug] [youtube] Decrypted nsig WNiuqfCxMStm3Y-S5 => zjd8WoLzKO-kpg
[debug] [youtube] Decrypted nsig W0_Kqkc3K5-gAlx82 => _9t8_-AhZvi04A
[debug] Invoking downloader on u'https://rr5---sn-cu-aigss.googlevideo.com/videoplayback?sparams=expire%2Cei%2Cip%2Cid%2Caitags%2Csource%2Crequiressl%2Cxpc%2Cbui%2Cspc%2Cvprv%2Csvpuc%2Cmime%2Cns%2Crqh%2Cgir%2Cclen%2Cdur%2Clmt&ei=SpK4Zs79DaqMp-oP2rj0-Qs&ip=46.208.6.25&clen=13106623&spc=Mv1m9rGN544RnSCiFwx6ZtiSyBYnBV85XTHGPMke3rPeUihI_jsi&id=o-AAzf1sFMtrrt-18dNqW2pVet7n-fVaH_H64E4djwP--H&txp=5535434&svpuc=1&aitags=133%2C134%2C135%2C136%2C160%2C242%2C243%2C244%2C247%2C278%2C298%2C299%2C302%2C303%2C394%2C395%2C396%2C397%2C398%2C399&gir=yes&xpc=EgVo2aDSNQ%3D%3D&requiressl=yes&keepalive=yes&source=youtube&mv=m&sig=AJfQdSswRQIhALKafKC8aHa08g0RPY6BpWA3m1oYlGDfGaHRvQhr_5UuAiAN533PHmqD5BpJaEfqhH4DlmLUg-b1zL4-Sin1oCuEnw%3D%3D&pcm2cms=yes&dur=850.966&ns=qhp8TEY1pHqt6L7-RadKU7MQ&initcwndbps=1367500&vprv=1&lsig=AGtxev0wRQIhAJ-ZhbXFfnM7SBC5y4SDngAzL5uMdn_hcmeqhSu3fInIAiAHnzyPjeapdCqUpj7Nr2l3GwYhTBrA4q4N85HNsl1i4g%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpcm2cms%2Cpl%2Cinitcwndbps&lmt=1723149328860953&c=WEB_CREATOR&sefc=1&bui=AQmm2eywWXlCZZ-0GdPj3i0R2tAb5WWt4NMpDpw7oUPM3vAdsNyDt7MuInx45YsZj7Ekmcd6Cy-NCGb1&mime=video%2Fmp4&fvip=4&rqh=1&itag=135&mm=31%2C29&mn=sn-cu-aigss%2Csn-cu-c9id&mh=jQ&n=_9t8_-AhZvi04A&mt=1723371671&expire=1723393706&pl=25&ms=au%2Crdu&mvi=5'
[dashsegments] Total fragments: 2
[download] Destination: 全球金融大动荡,日本加息背刺美国,中国躺赢?【汤山老王】-lLSkbZ3-EOs.mp4
[download] 100% of 12.50MiB in 00:10
$
I'm also experiencing this error, but when I changed the user agent to:
"Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.103 Mobile Safari/537.36" (mobile user agent of the latest chrome version) it worked. Here is the exact line:
youtube-dl --verbose -x --user-agent "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.103 Mobile Safari/537.36"
can confirm this works for me (youtube-dl proxy through tor-browser-bundle exit=de)
Yes, because specifying a mobile UA redirects to m.youtube.com as above. Mozilla/5.0 (Android 14; Mobile; rv:115.0) Gecko/115.0 Firefox/115.0 is a shorter UA that has the same effect.
However the ytInitialData in the mobile page is stringified JSON rather than actual JSON as in the desktop page, and this causes an additional API call (I suspect that no valid data is returned).
thx for the shorter version
Which API - to the YT server, or an internal JSON thingy, or where? Can you point me to the appropriate section of code? I may be able to help. (I wrote some extractors for my servers, though they are on my private repo as they contain logins/secrets for my servers, so I won't share those, of course.) (I also modified some of your functions/lib files, which I link in Docker for specific calls, so I can dig around and try stuff :) )
Decoding the stringified JSON is no problem:
def _extract_yt_initial_variable(self, webpage, regex, video_id, name):
    result = self._search_json(
        regex, webpage, name, video_id, default={},
        contains_pattern=r'(?:\{[\s\S]+}|(?P<_q>"|\')(?:(?!(?P=_q))[\s\S])+(?P=_q))',
        end_pattern=r';\s*%s' % (self._YT_INITIAL_BOUNDARY_RE,),
        transform_source=lambda s: self._parse_json(
            s, video_id, transform_source=js_to_json,
            fatal=False) if s[:1] in '\'"' else s)
    return result
I'll put up a PR once I've checked the playlist extraction.
I would like to mention that --match-filter for duration suddenly does not do its job anymore
even with -v I only get this:
[youtube] ySV2v5RNu4Q: Downloading webpage
[youtube] ySV2v5RNu4Q: Downloading player 53afa3ce
before it goes to the next playlist element. It does not even tell me that it rejected the video for not matching the filter; it just skips over it, meaning whatever it partially downloads to determine the video's length is broken to some extent. I mention it here because I believe it is related to the same stupid issue of only legacy formats being available for weeks now.
The video itself IS downloadable with format 18 when the duration filter is not added.
Edit: I will open a dedicated issue with more in-depth details should it be deemed necessary.
The mobile UA work-around could cause metadata to be lost. Even if that doesn't seem to be relevant, please check again with the real-soon-now PR and post any appropriate diagnostics there.
Will this issue ever be fixed or is this the end of youtube-dl? I'm asking because I need to know if I should switch to another tool or if I should wait. Thanks.
This project seems to have slowed down a lot. @dirkf is overwhelmed by issues and a lot of features and fixes have yet to be backported.
I think if you are not a legacy-system user, it might be better to use yt-dlp instead. Otherwise you will have to wait.
Unfortunately, the state of yt-dlp is worse than that of this project: any 'left-right' step and you have broken code. The same goes for support. So, no.
yt-dlp is a fork of youtube-dl. They are already very similar, but if you don't like some of the modernizations, you can revert them with --compat-options youtube-dl.
I have no idea what you mean by "broken code". Generally, yt-dlp is more reliable than youtube-dl, due to the thousands of fixes and improvements committed over the past three years that are not part of youtube-dl.
Though, this is getting off-topic here. If you have any issues or questions regarding yt-dlp, you should open an issue there.
Will this issue ever be fixed or is this the end of youtube-dl? ...
At the moment, the work-arounds suggested above have to be used.
RSN, though, this will be in a PR:
$ pytest -k test_Youtube
============================= test session starts ==============================
platform linux2 -- Python 2.7.18, pytest-4.6.11, py-1.11.0, pluggy-0.13.1
rootdir: /home/df/Documents/src/youtube-dl
collected 3962 items / 3853 deselected / 109 selected
test/test_YoutubeDL.py ........................... [ 24%]
test/test_YoutubeDLCookieJar.py ..... [ 29%]
test/test_download.py ..s..........s...s.....s..............s...sssss.s. [ 75%]
..s.ss.s..s......s.....s... [100%]
=========== 91 passed, 18 skipped, 3853 deselected in 576.45 seconds ===========
$
The workaround with changing user agent unfortunately doesn't work for me anymore
Is there any other evidence that MWEB is broken? Verbose log, please.
Built from source at c5098961b04ce83f4615f2a846c84f803b072639
Your log is not using a mobile UA. Also, passing cookies seems to be deprecated (see the yt-dlp links).
This just worked for me:
$ python3 -m youtube_dl -f mp4 --verbose -x --user-agent "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.103 Mobile Safari/537.36" 'somevideo'
Sorry - built from source at c5098961b04ce83f4615f2a846c84f803b072639
Just confirming: my script still works and downloads videos without needing to change things up.
It still uses the legacy video formats, which kills my mobile-based home data plan though, so I really hope there is going to be a fix so I can use 240p on the longer VODs again. (Not rushing you, because I know how complicated this stuff can be, but I'm annoyed at YouTube itself for causing this in the first place.)
@3052 It looks like your code uses OAuth? For now, ANDROID still works with authentication (either auth or po_token is required). It's not known how much longer it will work this way, though. The PO token has been a gradual rollout, and I'd expect it to be required for all clients eventually.