Open GiovanH opened 2 months ago
@mikf Still seeing this. Looks like:
[cache][info] Deleted 2 entries from 'C:\Users\<USER>\AppData\Roaming\gallery-dl\cache.sqlite3'
> py -3.11 -m gallery_dl https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p --verbose --print-traffic --netrc
[gallery-dl][debug] Version 1.28.0-dev - Git HEAD: 061b27f3
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p'
[bluesky][debug] Using BlueskyPostExtractor for 'https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p'
[bluesky][info] Logging in as im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 64\r\nAuthorization: Basic <TOKEN>\r\n\r\n'
send: b'{"identifier":"im.giovanh.com","password":"<APP PASSWORD>"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 30
header: RateLimit-Remaining: 28
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 30;w=300
header: ETag: <NOT SURE IF SENSITIVE>
header: Vary: Accept-Encoding
header: Content-Encoding: gzip
[urllib3.connectionpool][debug] https://bsky.social:443 "POST /xrpc/com.atproto.server.createSession HTTP/1.1" 200 None
send: b'GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1\r\n
Host: bsky.social\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\n
Accept: application/json\r\n
Accept-Language: en-US,en;q=0.5\r\n
Accept-Encoding: gzip, deflate, br\r\n
Referer: https://bsky.app/\r\n
Authorization: Basic <TOKEN>\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 42
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 3000
header: RateLimit-Remaining: 2997
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 3000;w=300
header: ETag: <NOT SURE IF SENSITIVE>
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1" 200 42
send: b'GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1\r\n
Host: bsky.social\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\n
Accept: application/json\r\n
Accept-Language: en-US,en;q=0.5\r\n
Accept-Encoding: gzip, deflate, br\r\n
Referer: https://bsky.app/\r\n
Authorization: Basic <TOKEN>\r\n\r\n'
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 66
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 3000
header: RateLimit-Remaining: 2996
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 3000;w=300
header: ETag: <NOT SURE IF SENSITIVE>
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')
It is clearly sending a basic auth token, and it's the same one that worked in the resolveHandle call. But it's also the token from createSession. Is it the wrong one?
I can also confirm this isn't a netrc issue, as passing an explicit --username and --password gives the same result.
A little debugging:
diff --git a/gallery_dl/extractor/bluesky.py b/gallery_dl/extractor/bluesky.py
index de5d0c6f..7147df0a 100644
--- a/gallery_dl/extractor/bluesky.py
+++ b/gallery_dl/extractor/bluesky.py
@@ -446,6 +446,7 @@ class BlueskyAPI():
     def authenticate(self):
         self.headers["Authorization"] = self._authenticate_impl(self.username)
+        self.log.info("Implicit authorization value is %s", self.headers["Authorization"])
     @cache(maxage=3600, keyarg=1)
     def _authenticate_impl(self, username):
@@ -483,9 +484,12 @@ class BlueskyAPI():
         while True:
             self.authenticate()
+            self.log.info("Calling request with authorization %s", self.headers["Authorization"])
             response = self.extractor.request(
                 url, params=params, headers=self.headers, fatal=None)
+            self.log.info("The actual headers sent were %s", response.request.headers)
+
             if response.status_code < 400:
                 return response.json()
             if response.status_code == 429:
With this patch applied, the problem is apparent:
[bluesky][info] Implicit authorization value is Bearer <long token>
[bluesky][info] Calling request with authorization Bearer <long token>
send: b'GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: application/json\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nAuthorization: Basic <token>\r\n\r\n'
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Date: Tue, 12 Nov 2024 02:57:05 GMT
header: Content-Type: application/json; charset=utf-8
...
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][info] The actual headers sent were {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': 'application/json', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/', 'Authorization': 'Basic <token>'}
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')
Note that the "authenticated" call is sent with the Basic authorization token, not the explicitly passed Bearer auth header! Something is wrong with Extractor's request implementation.
OK. I spent my whole day digging into this, but I've found a way to fix this. https://github.com/mikf/gallery-dl/pull/6455
Your version of gallery-dl is sending an Authorization: Basic <TOKEN> header during the login process.
[bluesky][info] Logging in as im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 64\r\nAuthorization: Basic <TOKEN>\r\n\r\n'
send: b'{"identifier":"im.giovanh.com","password":"<APP PASSWORD>"}'
This does not happen with vanilla settings or code. The only way this and all the other Authorization: Basic <TOKEN> headers would be possible is if the session's auth parameter were set, which gallery-dl only does for danbooru and pixeldrain. I strongly suspect you've made some changes to your gallery-dl code that somewhere set self.session.auth to a username-password tuple, which requests transforms into a Basic authorization header for every request.
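For reference, here is a minimal sketch of how a username-password pair turns into a Basic header value, following the scheme in RFC 7617. The helper name `basic_auth_header` and the `foo`/`bar` credentials are only illustrative, and the UTF-8 encoding is an assumption; this is not requests' own code.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617 scheme)."""
    # Credentials are joined with a colon and base64-encoded.
    # Assumption: UTF-8 is used for the byte encoding in this sketch.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("foo", "bar")
print(header)
```

A header built this way replaces whatever Authorization value the request already carried, which would explain a Bearer token going missing.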
> Your version of gallery-dl is sending an Authorization: Basic <TOKEN> header during the login process.
I agree that this is what's happening, but by every measure I know, gallery-dl itself is doing this. I am invoking the local module with py -3.11 -m gallery_dl, and you can see my exact worktree in https://github.com/mikf/gallery-dl/pull/6455. The only thing I have in my global config.json file is:
{
"extractor": {
"twitter": {
"cookies": "<path>",
"logout": true
}
}
}
gallery-dl starts passing a basic token to requests as early as
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession
I can try to identify exactly why this is happening, but everything I see tells me this is vanilla behavior.
There is no Authorization header during login with default settings.
$ gallery-dl --print-traffic -u foo -p bar -v --config-ignore https://bsky.app/profile/bsky.app
[gallery-dl][debug] Version 1.28.0-dev - Git HEAD: cd6d6ea8
[gallery-dl][debug] Python 3.12.7 - Linux-6.11.6-arch1-1-x86_64-with-glibc2.40
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/bsky.app'
[bluesky][debug] Using BlueskyUserExtractor for 'https://bsky.app/profile/bsky.app'
[bluesky][debug] Using BlueskyMediaExtractor for 'https://bsky.app/profile/bsky.app/media'
[bluesky][info] Logging in as foo
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 37\r\n\r\n'
send: b'{"identifier":"foo","password":"bar"}'
Requests is doing it because there's a netrc value set for bsky.social.
Tracing the initial login request:
https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/common.py#L172
    kwargs['headers'] is {'Content-Type': 'application/json'}
https://github.com/psf/requests/blob/main/src/requests/sessions.py#L575
    req.headers is {'Content-Type': 'application/json'}
    auth is None
https://github.com/psf/requests/blob/main/src/requests/sessions.py#L478-L481
    This is called; requests sets auth from netrc even though gallery-dl did not explicitly request it
https://github.com/psf/requests/blob/main/src/requests/sessions.py#L498
    self.auth is None
    auth is a username/password tuple
    request.headers is {'Content-Type': 'application/json'}
    self.headers is {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/'}
    p.headers is {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/', 'Content-Type': 'application/json', 'Content-Length': '64', 'Authorization': 'Basic TOKEN'}
Temporarily removing my bsky.social netrc entry works around the problem, even on origin/master. I hate this, but I still think gallery-dl should handle this case somehow.
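The trace above can be condensed into a small stand-alone model. This is a simplified sketch of the fallback as I understand it, not requests' actual code: the function names `lookup_netrc`, `effective_auth` and `final_authorization`, the `example.handle` login and the placeholder header strings are all hypothetical. Only the standard-library `netrc` module is real.

```python
import netrc
import os
import tempfile


def lookup_netrc(path, host):
    """Return (login, password) for `host` from a netrc file, or None."""
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return (login, password)


def effective_auth(request_auth, session_auth, trust_env, netrc_auth):
    """Simplified model of the auth fallback traced above."""
    auth = request_auth
    # The netrc fallback only applies when nothing else supplied credentials.
    if trust_env and not auth and not session_auth:
        auth = netrc_auth
    return auth or session_auth


def final_authorization(header_value, auth):
    """Applied auth rewrites the Authorization header; otherwise it is kept."""
    if auth:
        return "Basic <token derived from auth>"
    return header_value


# Write a throwaway netrc file with a hypothetical bsky.social entry.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as fh:
    fh.write("machine bsky.social login example.handle password app-password\n")
    netrc_path = fh.name
try:
    creds = lookup_netrc(netrc_path, "bsky.social")
finally:
    os.unlink(netrc_path)

bearer = "Bearer <long token>"
with_env = final_authorization(bearer, effective_auth(None, None, True, creds))
without_env = final_authorization(bearer, effective_auth(None, None, False, creds))
print(with_env)
print(without_env)
```

In this model the Bearer header survives only when the environment lookup is skipped or some other auth is already set, which matches what removing the netrc entry did for me.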
Wow, now that's something I would have never expected. Good find.
I'll probably add an option to disable requests' trust_env and/or use a "noop" session auth so requests doesn't try to interfere.
Should be fixed with https://github.com/mikf/gallery-dl/commit/0a72a5009c8d7bc46ac7f82a38e0c885107f8d26 by preventing requests from gathering .netrc auth and overwriting Authorization headers with it.
I'm experiencing authentication issues in bluesky very similar to https://github.com/mikf/gallery-dl/issues/5780, except that clearing the cache doesn't resolve the problem.
Authentication is defined in netrc.