ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Twitter Updates #32392

Open vidavie opened 1 year ago

vidavie commented 1 year ago

Verbose log


REDACTED username ~ % youtube-dl -g https://twitter.com/1500tasvir/status/1577533879283585025 -v
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-g', 'https://twitter.com/1500tasvir/status/1577533879283585025', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.11.3 (CPython) - macOS-13.4.1-x86_64-i386-64bit
[debug] exe versions: none
[debug] Proxy map: {}
ERROR: Unable to download JSON metadata: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/lib/python3.11/site-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

## Grabbing URLs from Twitter is not working. I assume this is due to their new crackdown on web scrapers?

dirkf commented 1 year ago

https://github.com/yt-dlp/yt-dlp/issues/7473 says you now have to authenticate, which I guess means passing cookies from your logged-in browser session.

Vangelis66 commented 1 year ago

> says you now have to authenticate, which I guess means passing cookies from your logged-in browser session.

... Thanks to bashonly 👍 , "downstream" do have a plan B 😜 in the shape of https://github.com/yt-dlp/yt-dlp/pull/7476, which, having been tested here, delivers what it promises (unlike politicians 😞 ) :

python yt-dlp -vF "https://twitter.com/1500tasvir/status/1577533879283585025" => 

[debug] Command-line config: ['-vF', 'https://twitter.com/1500tasvir/status/1577533879283585025']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version nightly@2023.07.02.193114 [8776349ef] (zip)
[debug] Python 3.7.16 (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 1.1.1s  1 Nov 2022)
[debug] exe versions: none
[debug] Optional libraries: sqlite3-2.6.0
[debug] Proxy map: {}
[debug] Loaded 1858 extractors
[twitter] Extracting URL: https://twitter.com/1500tasvir/status/1577533879283585025
[twitter] 1577533879283585025: Downloading syndication JSON
[twitter] Some metadata is missing without authentication. Use --cookies, --cookies-from-browser, --username and --password, --netrc-cmd, or --netrc (twitter) to provide account credentials
[debug] [twitter] Extracting from video info: 1577533837206241280
[twitter] 1577533879283585025: Downloading m3u8 information
[debug] Sort order given by extractor: res, br, size, proto
[debug] Formats sorted by: hasvid, ie_pref, res, br, size, proto, lang, quality, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, asr, vext, aext, hasaud, source, id
[info] Available formats for 1577533837206241280:
ID       EXT RESOLUTION |   FILESIZE  TBR PROTO | VCODEC      ACODEC
-----------------------------------------------------------------------
hls-256  mp4 480x270    | ~520.10KiB 256k m3u8  | avc1.4d001e mp4a.40.2
http-256 mp4 480x270    | ≈520.10KiB 256k https | unknown     unknown
hls-832  mp4 640x360    | ~  1.65MiB 832k m3u8  | avc1.4d001f mp4a.40.2
http-832 mp4 640x360    | ≈  1.65MiB 832k https | unknown     unknown

dirkf commented 1 year ago

As bashonly suggests in the yt-dlp PR, we should wait to see how the Muskified Twitter turns out before committing a new extractor.

8chanAnon commented 1 year ago

This will return a JSON file containing the video links:

https://cdn.syndication.twimg.com/tweet-result?id=1577533879283585025

faelana commented 1 year ago

> This will return a JSON file containing the video links:
>
> https://cdn.syndication.twimg.com/tweet-result?id=1577533879283585025

Doesn't seem to work atm.

8chanAnon commented 1 year ago

> Doesn't seem to work atm.

Yeah. A token is now required. Try this:

https://cdn.syndication.twimg.com/tweet-result?id=1577533879283585025&token=422kr3t824n

dirkf commented 1 year ago

Is that a random value, or is there an algorithm for generating it?

BTW, the current yt-dlp code adds a User-Agent header with value Googlebot. I presume that's supposed to bypass this.

ghost commented 1 year ago

LOL, just realized the token can be any string:

https://cdn.syndication.twimg.com/tweet-result?id=1577533879283585025&token=!
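
A minimal Python sketch of the request discussed above, using only the standard library. The endpoint and the `id` and `token` query parameters come from the links posted in this thread; the function names are made up for illustration, and none of this is taken from the youtube-dl or yt-dlp extractor code. The optional `user_agent` argument is there only because of the Googlebot remark earlier; whether it makes any difference is an assumption. The layout of the returned JSON is not described in this thread, so the sketch just hands back the decoded object.

```python
import json
import re
import urllib.parse
import urllib.request

# Endpoint as posted in this thread.
SYNDICATION_ENDPOINT = "https://cdn.syndication.twimg.com/tweet-result"


def tweet_id_from_url(url):
    """Pull the numeric status ID out of a tweet URL (hypothetical helper)."""
    match = re.search(r"/status/(\d+)", url)
    if match is None:
        raise ValueError("no tweet ID found in %r" % url)
    return match.group(1)


def syndication_url(tweet_id, token="!"):
    """Build the tweet-result URL; per the thread, the token can be any string."""
    query = urllib.parse.urlencode({"id": tweet_id, "token": token})
    return SYNDICATION_ENDPOINT + "?" + query


def fetch_tweet_json(tweet_url, token="!", user_agent=None, timeout=10):
    """Fetch and decode the JSON for a tweet. Performs a network request."""
    request = urllib.request.Request(
        syndication_url(tweet_id_from_url(tweet_url), token))
    if user_agent is not None:
        # Assumption: a custom User-Agent may or may not be needed.
        request.add_header("User-Agent", user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `syndication_url("1577533879283585025", "422kr3t824n")` reproduces the link posted earlier. With the default token, `urlencode` percent-encodes the `!` as `%21`, which is the standard URL-encoded form of the same character.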

8chanAnon commented 1 year ago

> LOL, just realized the token can be any string:

LOL again. Why didn't I think of that? By the way, I see that you deleted your previous comment. Think before you jump, as they say.

8chanAnon commented 1 year ago

@1268

I didn't figure anything out. I got the link from the developer tool after running this link to get the publicly accessible version of the tweet:

https://platform.twitter.com/embed/Tweet.html?id=1577533879283585025

I posted what I found for others to try out. Further investigation would have been premature. As you now know. What did you expect from me? An explanation that I used developer tools to look at a tweet?

8chanAnon commented 1 year ago

@1268

That pathway is not needed to obtain the video. That's why I didn't mention it. It might be useful for the extraction of other information but I don't know what youtube-dl would want from it.

Looking through the current iteration of the extractor code, I see that most of it is basically junk now due to the changes that Musk has wrought. Getting a tweet video is easy with the pathway that I have given. Broadcasts still work with the guest token (for now). However, Audio Spaces won't work without login cookies. Should just focus on getting youtube-dl working with regular tweets. I've told you what you need to know so what are we arguing about?

8chanAnon commented 1 year ago

As if we haven't been relying on "magic links" all along. The extractors are full of magic links with no indication whatsoever of how they were uncovered in the first place. The previous magic link was rendered obsolete by Twitter. Now we have another magic link. If that stops working then somebody will find yet another magic link. This will happen without extensive explanation. Yeah, this is what-about-ism but you're complaining about something that has always been true. I don't see how that can change. Nobody ever explains anything in those extractors. That's why I rarely look at them. It's easier to just forge my own path. No, I don't contribute to youtube-dl (except for the occasional comment). I have my own tools.

8chanAnon commented 1 year ago

Are you on a mission? That's fine. I hope that this discussion proves useful to the community. Are you doing anything with the information that I've provided? Since nobody else is offering a solution then maybe I deserve a bit of thanks instead of a thrashing. LOL.

ghost commented 1 year ago

> Are you on a mission?

It's a simple request to provide context when you have it. You can either honor the request or ignore it; up to you.

> Are you doing anything with the information that I've provided? Since nobody else is offering a solution then maybe I deserve a bit of thanks instead of a thrashing.

I reversed the Android client this month:

https://github.com/1268/media/tree/v1.5.6/twitter

However, it seems that since the change from Twitter to X, a sign-in is required, even on the Android app, so my method no longer works. But to be fair, I did discover in this thread that the token can be hard-coded, so I think that's worth something. I will probably implement your method as a replacement for my own, so thanks for that.

8chanAnon commented 1 year ago

You could try doing what I do. Make an account on Twitter. All you need is an email (use a throwaway). Log in to the account and never log out. Extract the cookies called "ct0" and "auth_token". Both cookies need to be sent along with the usual Bearer token. The request must also contain a header called "x-csrf-token" set to the exact same value as the "ct0" cookie. For proof that this works:

https://8chananon.github.io/app/launch-8kun_media.htm?tweet##1695105179593658703

Of course, it is possible that somebody might steal your cookies and mess around with the account. In which case, you can either make a new one or have an option for users to provide their own cookies.
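
The header set described above can be sketched in Python roughly as follows. Only the cookie names (`ct0`, `auth_token`), the `x-csrf-token` header and the use of a Bearer token come from the comment; the function name is hypothetical, the Bearer token value is not given in this thread and has to be supplied by the caller, and the target URL is deliberately left out because the comment does not name one.

```python
def build_auth_headers(ct0, auth_token, bearer_token):
    """Assemble request headers for a logged-in session (illustrative sketch).

    ct0 and auth_token are the cookie values copied from a logged-in
    browser session; bearer_token is a placeholder supplied by the caller.
    """
    if not ct0 or not auth_token:
        raise ValueError("both ct0 and auth_token cookies are required")
    return {
        "Authorization": "Bearer " + bearer_token,
        # Both cookies are sent together in a single Cookie header.
        "Cookie": "ct0=%s; auth_token=%s" % (ct0, auth_token),
        # Must be exactly the same value as the ct0 cookie.
        "x-csrf-token": ct0,
    }
```

The resulting dictionary can be passed as the `headers` argument of `urllib.request.Request`. Keeping the cookie values in a user-supplied file or option, rather than hard-coding them, matches the suggestion of letting users provide their own cookies.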

dirkf commented 1 year ago

I'm back-porting the current yt-dlp Twitter/X extractor ATM. I suggest anyone who wants to contribute should hold back until this becomes a PR and then propose any improvements.

hundfred commented 1 year ago

I can confirm the 404 with youtube-dl version 2021.12.17.

ksylvan commented 11 months ago

I get the same 404 error. I can also confirm that the method outlined by @8chanAnon above works to get the metadata JSON file.