yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader
The Unlicense

[Broken] nebula extractor doesn't support cookies #496

Closed TpmKranz closed 3 years ago

TpmKranz commented 3 years ago

Verbose log

[debug] Command-line config: ['--cookies', 'cookies.txt', '-v', '-N8', 'https://nebula.app/videos/lindsay-ellis-could-blazing-saddles-be-made-today']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] yt-dlp version 2021.07.07 
[debug] Python version 3.9.6 (CPython 64bit) - Linux-5.12.14.s0ix-8+-x86_64-with-glibc2.33
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[debug] [Nebula] Extracting URL: https://nebula.app/videos/lindsay-ellis-could-blazing-saddles-be-made-today
ERROR: This video is only available for registered users. Use --cookies, --username and --password or --netrc to provide account credentials
Traceback (most recent call last):
  File "/home/tpm/.local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1142, in wrapper
    return func(self, *args, **kwargs)
  File "/home/tpm/.local/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 1167, in __extract_info
    ie_result = ie.extract(url)
  File "/home/tpm/.local/lib/python3.9/site-packages/yt_dlp/extractor/common.py", line 569, in extract
    ie_result = self._real_extract(url)
  File "/home/tpm/.local/lib/python3.9/site-packages/yt_dlp/extractor/nebula.py", line 167, in _real_extract
    nebula_token = self._retrieve_nebula_auth(display_id)
  File "/home/tpm/.local/lib/python3.9/site-packages/yt_dlp/extractor/nebula.py", line 88, in _retrieve_nebula_auth
    self.raise_login_required()
  File "/home/tpm/.local/lib/python3.9/site-packages/yt_dlp/extractor/common.py", line 1043, in raise_login_required
    raise ExtractorError(msg, expected=True)
yt_dlp.utils.ExtractorError: This video is only available for registered users. Use --cookies, --username and --password or --netrc to provide account credentials

Description

When I supply a cookies.txt file, the extractor complains that I have to be a registered user. Indeed, cookie support was cut before merging, but I can't see why. The relevant commit just says "Removed obsolete/irrelevant/misused authentication methods: environment variable, cookie jar and videopassword.", even though I can't find any complaints about the cookie method, and there was even a brief improvement suggestion for it. Does @hheimbuerger perhaps remember why cookie support was cut? I feel a little uncomfortable leaving my credentials lying around in a file, whereas cookies are at least somewhat ephemeral and can't be used to alter my account, afaict.

pukkandan commented 3 years ago

It was probably an oversight. Could you test whether adding back https://github.com/yt-dlp/yt-dlp/pull/122/commits/13154498ce0d94325e05346596a7d79b83bb6df5#diff-6fff743502081039f289ebbc5e851557d66676e18eeb1e20c6b62c7387604c7dL125-L133 makes authentication with cookies work correctly?

hheimbuerger commented 3 years ago

That's weird indeed.

I want to say that it couldn't possibly happen to me, that I would remove this by accident. Yet I cannot remember a specific conversation we had about removing it either. So I guess it did. 😉

My apologies! I'll create a PR for it tomorrow.

hheimbuerger commented 3 years ago

Here's my branch: https://github.com/hheimbuerger/yt-dlp/tree/issue496-nebula-add-cookie-auth-support Tested locally with my own credentials.

@TpmKranz Are you able to check out and test this branch? Note that it's only supporting the current nebula domain (nebula.app), so in case you're using a very old cookie jar file, you might need to re-login to Nebula once (or just edit the domain in your cookie jar).

@pukkandan What's the correct approach for the interaction between credential auth (--netrc or --password) and cookie jar authentication? The approach I'm following here is that if you've provided login credentials, potential cookies will be ignored and it will always attempt to authenticate with those credentials. (If they're wrong, even if the cookie token would be fine, authentication will fail.) Only if there are no credentials will it use cookies. Is that a valid approach? Or should cookies take precedence?
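A minimal sketch of the precedence described above, assuming a hypothetical helper `pick_auth_method` (it is not part of yt-dlp or of the branch). Credentials, when given, always win; cookies are only consulted when no credentials were supplied.

```python
from typing import Optional, Tuple


def pick_auth_method(credentials: Optional[Tuple[str, str]],
                     cookie_token: Optional[str]) -> str:
    """Return the auth path that would be taken (hypothetical helper)."""
    if credentials is not None:
        # Credentials take priority, even if a usable cookie token exists.
        # If they are wrong, authentication fails; there is no cookie fallback.
        return "credentials"
    if cookie_token:
        return "cookies"
    # Neither source is available; a real extractor would raise here.
    return "login_required"
```

With both a username/password pair and a cookie token present, this returns "credentials", which is the case the question is about.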

I'll open a PR once you approve my general approach. 😀

pukkandan commented 3 years ago

What's the correct approach for the interaction between credential auth (--netrc or --password) and cookie jar authentication?

I am not sure either. There aren't many extractors that support both login and cookies explicitly (for most cookie-based logins, the presence of the cookie is sufficient for auth).

The approach I'm following here is that if you've provided login credentials, potential cookies will be ignored and it will always attempt to authenticate with those credentials. (If they're wrong, even if the cookie token would be fine, authentication will fail.) Only if there are no credentials will it use cookies. Is that a valid approach?

I think that is good enough and is in line with how other extractors behave.

TpmKranz commented 3 years ago

@TpmKranz Are you able to check out and test this branch? Note that it's only supporting the current nebula domain (nebula.app), so in case you're using a very old cookie jar file, you might need to re-login to Nebula once (or just edit the domain in your cookie jar).

Works just as expected, thank you very much. 👍🏻

The approach I'm following here is that if you've provided login credentials, potential cookies will be ignored and it will always attempt to authenticate with those credentials. (If they're wrong, even if the cookie token would be fine, authentication will fail.) Only if there are no credentials will it use cookies. Is that a valid approach? Or should cookies take precedence?

If I could make a suggestion: why not try cookie auth first and fall back to credential-based auth on failure? On success, this would save an API call for an apiToken that's only used once. Currently, you request a new apiToken every time and ignore the one in the cookies file, which wouldn't be there if one didn't want to use it multiple times in the first place. Once that's in place, you could go a step further and save the fresh apiToken to the given cookies file so that the apiToken could be reused and future API calls avoided. It's a bit more complicated, but I also can't think of any other reason to provide both a cookie file and credentials at the same time. This way, I wouldn't have to mess around with my browser every time the cookie expires and the cookie file could even be empty before executing yt-dlp and afterwards, it would contain a fresh apiToken to be used again and again until it expires!
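To make the "save the fresh token back to the cookies file" idea concrete, here is a rough sketch using only Python's standard `http.cookiejar`. The helper name `store_cookie` and the one-hour default lifetime are assumptions for illustration; this is not how yt-dlp itself persists cookies.

```python
import http.cookiejar
import time


def store_cookie(cookie_file, domain, name, value, ttl_seconds=3600):
    """Add or replace one cookie in a Netscape-format cookies file."""
    jar = http.cookiejar.MozillaCookieJar(cookie_file)
    try:
        # Keep whatever cookies the file already holds.
        jar.load(ignore_discard=True, ignore_expires=True)
    except (FileNotFoundError, http.cookiejar.LoadError):
        pass  # Missing or empty file: start with an empty jar.
    jar.set_cookie(http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True,
        expires=int(time.time()) + ttl_seconds,  # assumed lifetime
        discard=False, comment=None, comment_url=None, rest={},
    ))
    jar.save(ignore_discard=True, ignore_expires=True)
```

Called after a successful credential login, something like `store_cookie('cookies.txt', 'nebula.app', 'nebula-auth', encoded_token)` would leave a reusable token in the file, where `encoded_token` stands for whatever value the site expects in that cookie.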

Just my 2c; the way it currently works is functional as well, just not as comfortable. Thanks again!

pukkandan commented 3 years ago

This way, I wouldn't have to mess around with my browser every time the cookie expires and the cookie file could even be empty before executing yt-dlp and afterwards, it would contain a fresh apiToken to be used again and again until it expires!

This already works like this

Once that's in place, you could go a step further and save the fresh apiToken to the given cookies file so that the apiToken could be reused and future API calls avoided.

This is how extractors should normally work, but not Nebula. I can't remember if @hheimbuerger had any reason to fetch the auth token in _real_extract instead of _real_initialize

TpmKranz commented 3 years ago

I'm sorry if I'm not following you correctly, but to me that seems like a contradiction:

This already works like this … This is how extractors should normally work, but not Nebula.

After calling yt-dlp with a non-existent cookie jar and credentials, the cookie jar contains csrftoken and sessionid for api.watchnebula.com and uuid for player.zype.com. Apparently, these are not enough to authenticate to Nebula.

What's missing is a) the preference for cookie auth and b) something like:

self._set_cookie(
  'nebula.app',
  'nebula-auth',
  compat_urllib_parse_quote(
    f'{{"apiToken":"{response["key"]}","isLoggingIn":false,"isLoggingOut":false}}'
  )
)
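The cookie value in the snippet above is URL-quoted JSON. A hedged sketch of building and reading that value with `json` instead of an f-string, so that a token containing quotes or backslashes is escaped correctly. The helper names are hypothetical, and the payload shape is taken from the snippet, not from any documented Nebula API.

```python
import json
from urllib.parse import quote, unquote


def encode_nebula_auth(api_token):
    """Build the URL-quoted JSON value for the nebula-auth cookie."""
    payload = {"apiToken": api_token, "isLoggingIn": False, "isLoggingOut": False}
    # Compact separators reproduce the spacing of the f-string version.
    return quote(json.dumps(payload, separators=(",", ":")))


def decode_nebula_auth(cookie_value):
    """Return the apiToken stored in the cookie value, or None if unreadable."""
    try:
        return json.loads(unquote(cookie_value)).get("apiToken")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return None
```

The decode half is what a cookie-first lookup would need in order to recover the token from an existing cookies file.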

hheimbuerger commented 3 years ago

Happy to hear the proof-of-concept works for you, @TpmKranz!

I'm following up on your suggestion of changing the order of authentication methods. Give me a few days to review the details, because the login endpoint was a bit odd.

I'll create a PR then.

hheimbuerger commented 3 years ago

@pukkandan: I can't remember if @hheimbuerger had any reason to fetch the auth token in _real_extract instead of _real_initialize

Nah, I was just ignorant and had never heard of _real_initialize() before. 😉 I've switched to it now.
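For context, a toy stand-in (not yt-dlp's actual `InfoExtractor`) showing why the move matters. It assumes the base class runs `_real_initialize()` once per extractor instance and `_real_extract()` once per URL; the class name, counter, and token strings are made up for illustration.

```python
class SketchExtractor:
    """Stand-in mimicking an initialize-once / extract-many lifecycle."""

    def __init__(self):
        self._ready = False
        self._token = None
        self.login_calls = 0  # counts simulated login requests

    def initialize(self):
        # Guard so that the expensive setup runs only once per instance.
        if not self._ready:
            self._real_initialize()
            self._ready = True

    def extract(self, url):
        self.initialize()
        return self._real_extract(url)

    def _real_initialize(self):
        # Authentication lives here, so it is shared by all later extractions.
        self.login_calls += 1
        self._token = "token-%d" % self.login_calls

    def _real_extract(self, url):
        # Per-URL work only; the token is already available.
        return {"url": url, "token": self._token}
```

Extracting two URLs with one instance performs a single simulated login, whereas authenticating inside `_real_extract` would log in once per URL.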

@TpmKranz: This way, I wouldn't have to mess around with my browser every time the cookie expires and the cookie file could even be empty before executing yt-dlp and afterwards, it would contain a fresh apiToken to be used again and again until it expires!

I mean... yes, but that only applies if you're now willing to store your login credentials e.g. in .netrc. My understanding was that this is exactly what you don't want, and that's why we even started bringing back the cookie auth.

And if you're okay with storing that, then what are you really gaining from this solution? I would argue the only thing you gain is one less HTTP request (in most cases). From a developer perspective, that sounds nice and clean and like the right thing to do, but given that a video download already means about 5 HTTP requests for metadata retrieval and then another couple hundred to fetch all the DASH segments... meh!

Even though I'm kinda arguing against it, I implemented it nevertheless. It now tries to fetch the token from the cookie jar first, and only if no (non-expired) cookie is found, it runs the credential auth.
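The order described here can be sketched roughly as follows. `resolve_token` and its parameters are hypothetical names, and treating a cookie without an expiry timestamp as still valid is an assumption of this sketch, not something stated in the thread.

```python
import time


def resolve_token(cookie_token, cookie_expires, login, now=None):
    """Prefer a non-expired cookie token; otherwise fall back to credentials.

    cookie_expires is a Unix timestamp, or None for a session cookie.
    login is a callable that performs credential auth, or None if no
    credentials were supplied.
    """
    now = time.time() if now is None else now
    cookie_valid = bool(cookie_token) and (
        cookie_expires is None or cookie_expires > now)
    if cookie_valid:
        return cookie_token, "cookie"
    if login is not None:
        # Only now is the extra login request made.
        return login(), "credentials"
    raise PermissionError("no usable cookie and no credentials")
```

The second element of the returned tuple is only there to make the chosen path visible when experimenting.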

I've also implemented the obvious next step: checking for an invalid token response (HTTP 401) and then reauthenticating. This would be for the case that the server invalidates the token. I'm hesitant whether to leave this in. This just feels like too much: I'm not sure whether trying to be overly clever is a good idea for something like yt_dlp, where you're always working against undocumented APIs you're not really supposed to use, and where long-term stability might be more important than smart functionality.
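A small sketch of what such a reauthentication step could look like, under stated assumptions: `Unauthorized` is a stand-in exception for an HTTP 401 response, and `request` and `login` are hypothetical callables. It retries exactly once, which keeps the "clever" part bounded.

```python
class Unauthorized(Exception):
    """Stand-in for an HTTP 401 response from the API."""


def call_with_reauth(request, login, token):
    """Run request(token); on a 401, log in again and retry once.

    Returns (result, token_in_use). A second 401 is not caught, so a
    genuinely broken login still surfaces as an error.
    """
    try:
        return request(token), token
    except Unauthorized:
        fresh = login()
        return request(fresh), fresh
```

Limiting the retry to a single attempt avoids login loops if the endpoint changes behavior, which seems to be the kind of long-term stability risk raised above.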