mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Bluesky authentication error: InvalidToken: Unexpected authorization type #6134

Open GiovanH opened 2 months ago

GiovanH commented 2 months ago

I'm experiencing authentication issues with Bluesky very similar to https://github.com/mikf/gallery-dl/issues/5780, except that clearing the cache doesn't resolve the problem.

$ py -3.11 -m gallery_dl 'https://bsky.app/profile/im.giovanh.com/post/3l2srgwskgt2n' -c gallery.conf --netrc --verbose
[gallery-dl][debug] Version 1.27.3 - Git HEAD: c5147527
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json', 'gallery.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[bluesky][debug] Using BlueskyPostExtractor for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[bluesky][info] Refreshing access token for im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
[urllib3.connectionpool][debug] https://bsky.social:443 "POST /xrpc/com.atproto.server.refreshSession HTTP/1.1" 401 59
[bluesky][debug] Server response: {'error': 'AuthMissing', 'message': 'Authentication Required'}
[bluesky][error] AuthenticationError: "AuthMissing: Authentication Required"

$ py -3.11 -m gallery_dl 'https://bsky.app/profile/im.giovanh.com/post/3l2srgwskgt2n' -c gallery.conf --netrc --verbose --clear-cache bluesky
[gallery-dl][debug] Version 1.27.3 - Git HEAD: c5147527
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json', 'gallery.conf']
[cache][info] Deleted 2 entries from 'C:\Users\Seth\AppData\Roaming\gallery-dl\cache.sqlite3'

$ py -3.11 -m gallery_dl 'https://bsky.app/profile/im.giovanh.com/post/3l2srgwskgt2n' -c gallery.conf --netrc --verbose
[gallery-dl][debug] Version 1.27.4-dev - Git HEAD: 35957216
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json', 'gallery.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[bluesky][debug] Using BlueskyPostExtractor for 'https://bsky.app/profile/im.giovanh.com/post/3l37hsm3uud2j'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1" 200 42
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Awfmgbxfwfqbm7qbokusgg7gr%2Fapp.bsky.feed.post%2F3l37hsm3uud2j&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')

Authentication credentials are defined in .netrc.

GiovanH commented 3 days ago

@mikf Still seeing this. Looks like:

[cache][info] Deleted 2 entries from 'C:\Users\<USER>\AppData\Roaming\gallery-dl\cache.sqlite3'

> py -3.11 -m gallery_dl https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p --verbose --print-traffic --netrc
[gallery-dl][debug] Version 1.28.0-dev - Git HEAD: 061b27f3
[gallery-dl][debug] Python 3.11.5 - Windows-10-10.0.19045-SP0
[gallery-dl][debug] requests 2.32.3 - urllib3 1.26.18
[gallery-dl][debug] Configuration Files ['%APPDATA%\\gallery-dl\\config.json']
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p'
[bluesky][debug] Using BlueskyPostExtractor for 'https://bsky.app/profile/im.giovanh.com/post/3lahzthcmff2p'
[bluesky][info] Logging in as im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 64\r\nAuthorization: Basic <TOKEN>\r\n\r\n'
send: b'{"identifier":"im.giovanh.com","password":"<APP PASSWORD>"}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 30
header: RateLimit-Remaining: 28
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 30;w=300
header: ETag: <NOT SURE IF SENSITIVE>    
header: Vary: Accept-Encoding
header: Content-Encoding: gzip
[urllib3.connectionpool][debug] https://bsky.social:443 "POST /xrpc/com.atproto.server.createSession HTTP/1.1" 200 None
send: b'GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1\r\n
  Host: bsky.social\r\n
  User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\n
  Accept: application/json\r\n
  Accept-Language: en-US,en;q=0.5\r\n
  Accept-Encoding: gzip, deflate, br\r\n
  Referer: https://bsky.app/\r\n
  Authorization: Basic <TOKEN>\r\n\r\n'     
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 42
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 3000
header: RateLimit-Remaining: 2997
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 3000;w=300
header: ETag: <NOT SURE IF SENSITIVE>
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/com.atproto.identity.resolveHandle?handle=im.giovanh.com HTTP/1.1" 200 42
send: b'GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1\r\n
  Host: bsky.social\r\n
  User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\n
  Accept: application/json\r\n
  Accept-Language: en-US,en;q=0.5\r\n
  Accept-Encoding: gzip, deflate, br\r\n
  Referer: https://bsky.app/\r\n
  Authorization: Basic <TOKEN>\r\n\r\n'
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Date: Tue, 12 Nov 2024 02:31:27 GMT
header: Content-Type: application/json; charset=utf-8
header: Content-Length: 66
header: Connection: keep-alive
header: X-Powered-By: Express
header: Access-Control-Allow-Origin: *
header: RateLimit-Limit: 3000
header: RateLimit-Remaining: 2996
header: RateLimit-Reset: 1731378963
header: RateLimit-Policy: 3000;w=300
header: ETag: <NOT SURE IF SENSITIVE>
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')

It is clearly sending a basic auth token, and it's the same one that worked in the resolveHandle call. But it's also the token from createSession. Is it the wrong one?

I can also confirm this isn't a netrc issue, as passing an explicit --username and --password gives the same result.

GiovanH commented 3 days ago

A little debugging:

diff --git a/gallery_dl/extractor/bluesky.py b/gallery_dl/extractor/bluesky.py
index de5d0c6f..7147df0a 100644
--- a/gallery_dl/extractor/bluesky.py
+++ b/gallery_dl/extractor/bluesky.py
@@ -446,6 +446,7 @@ class BlueskyAPI():

     def authenticate(self):
         self.headers["Authorization"] = self._authenticate_impl(self.username)
+        self.log.info("Implicit authorization value is %s", self.headers["Authorization"])

     @cache(maxage=3600, keyarg=1)
     def _authenticate_impl(self, username):
@@ -483,9 +484,12 @@ class BlueskyAPI():

         while True:
             self.authenticate()
+            self.log.info("Calling request with authorization %s", self.headers["Authorization"])
             response = self.extractor.request(
                 url, params=params, headers=self.headers, fatal=None)

+            self.log.info("The actual headers sent were %s", response.request.headers)
+
             if response.status_code < 400:
                 return response.json()
             if response.status_code == 429:

With this patch applied, the problem is apparent:

[bluesky][info] Implicit authorization value is Bearer <long token>
[bluesky][info] Calling request with authorization Bearer <long token>
send: b'GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: application/json\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nAuthorization: Basic <token>\r\n\r\n'
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Date: Tue, 12 Nov 2024 02:57:05 GMT
header: Content-Type: application/json; charset=utf-8
...
header: Vary: Accept-Encoding
[urllib3.connectionpool][debug] https://bsky.social:443 "GET /xrpc/app.bsky.feed.getPostThread?uri=at%3A%2F%2Fdid%3Aplc%3Akjx6y3groxh3sy5tkfyji6sy%2Fapp.bsky.feed.post%2F3lahzthcmff2p&depth=0&parentHeight=0 HTTP/1.1" 400 66
[bluesky][info] The actual headers sent were {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': 'application/json', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/', 'Authorization': 'Basic <token>'}
[bluesky][debug] Server response: {"error":"InvalidToken","message":"Unexpected authorization type"}   
[bluesky][error] API request failed ('InvalidToken: Unexpected authorization type')

Note that the "authenticated" call is using the Basic authorization token, not the explicitly passed Bearer auth header! Something is wrong with Extractor's request implementation.

GiovanH commented 3 days ago

OK. I spent my whole day digging into this, but I've found a way to fix this. https://github.com/mikf/gallery-dl/pull/6455

mikf commented 3 days ago

Your version of gallery-dl is sending an Authorization: Basic <TOKEN> header during the login process.

[bluesky][info] Logging in as im.giovanh.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 64\r\nAuthorization: Basic <TOKEN>\r\n\r\n'
send: b'{"identifier":"im.giovanh.com","password":"<APP PASSWORD>"}'

This does not happen with vanilla settings or code. The only way this and all the other Authorization: Basic <TOKEN> headers could appear is if the session's auth attribute were set, which gallery-dl only does for danbooru and pixeldrain. I strongly suspect you've made some change to your gallery-dl code that sets self.session.auth to a username/password tuple somewhere, which requests then turns into a Basic Authorization header for every request.
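For illustration, here is a minimal sketch of the session.auth behavior described above, using only public requests API (Session.auth, Session.prepare_request). The credentials and URLs are placeholders, and basic_header is a hypothetical helper for comparison. A two-element tuple on session.auth is treated as HTTP Basic auth, so every request the session prepares carries the same Authorization: Basic header, built from base64("user:password").

```python
import base64

import requests


def basic_header(user: str, password: str) -> str:
    """Hypothetical helper: build an HTTP Basic header value by hand."""
    raw = f"{user}:{password}".encode("latin1")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials; any (username, password) tuple behaves the same way.
session = requests.Session()
session.auth = ("example-user", "example-pass")

headers_seen = []
for url in (
    "https://bsky.social/xrpc/com.atproto.server.createSession",
    "https://bsky.social/xrpc/com.atproto.identity.resolveHandle",
):
    # prepare_request() merges session-level settings (headers, auth, ...)
    # into the request without sending anything over the network.
    prepared = session.prepare_request(requests.Request("GET", url))
    headers_seen.append(prepared.headers["Authorization"])

print(headers_seen)
```

Both prepared requests end up with the identical Basic header, which matches the pattern of every request in the traffic log carrying the same token.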

GiovanH commented 3 days ago

Your version of gallery-dl is sending an Authorization: Basic <TOKEN> header during the login process.

I agree that this is what's happening, but by every measure I know of, gallery-dl is the one doing it. I am invoking the local module with py -3.11 -m gallery_dl, and you can see my exact worktree in https://github.com/mikf/gallery-dl/pull/6455. The only thing in my global config.json file is

{
  "extractor": {
    "twitter": {
      "cookies": "<path>",
      "logout": true
    }
  }
}

gallery-dl starts passing a basic token to requests as early as

[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession

I can try to identify exactly why this is happening, but everything I see tells me this is vanilla behavior.

mikf commented 3 days ago

There is no Authorization header during login with default settings.

$ gallery-dl --print-traffic -u foo -p bar -v --config-ignore https://bsky.app/profile/bsky.app
[gallery-dl][debug] Version 1.28.0-dev - Git HEAD: cd6d6ea8
[gallery-dl][debug] Python 3.12.7 - Linux-6.11.6-arch1-1-x86_64-with-glibc2.40
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://bsky.app/profile/bsky.app'
[bluesky][debug] Using BlueskyUserExtractor for 'https://bsky.app/profile/bsky.app'
[bluesky][debug] Using BlueskyMediaExtractor for 'https://bsky.app/profile/bsky.app/media'
[bluesky][info] Logging in as foo
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): bsky.social:443
send: b'POST /xrpc/com.atproto.server.createSession HTTP/1.1\r\nHost: bsky.social\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0\r\nAccept: */*\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate, br\r\nReferer: https://bsky.app/\r\nContent-Type: application/json\r\nContent-Length: 37\r\n\r\n'
send: b'{"identifier":"foo","password":"bar"}'

GiovanH commented 3 days ago

Requests is doing it because there's a .netrc entry for bsky.social.
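To make the lookup concrete, the sketch below is a simplified stand-in (not the real function) for what requests does in requests.utils.get_netrc_auth: it takes the host of the request URL and asks Python's standard netrc module for a matching machine entry. The .netrc contents, credentials, and the netrc_auth_for name are hypothetical. Because the lookup is keyed on the API host, an entry for bsky.social matches every XRPC call, while an unrelated host does not.

```python
import netrc
import os
import tempfile
from urllib.parse import urlsplit


def netrc_auth_for(url, netrc_path):
    """Return (login, password) for the URL's host, or None if there is no entry.

    Hypothetical, simplified version of requests' own .netrc lookup.
    """
    host = urlsplit(url).hostname
    entry = netrc.netrc(netrc_path).authenticators(host)
    if not entry:
        return None
    login, account, password = entry
    # Like requests, fall back to the "account" field if "login" is empty.
    return (login or account, password)


# Hypothetical .netrc with an entry for the Bluesky API host.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write("machine bsky.social login example-user password example-pass\n")
    path = f.name

try:
    api_auth = netrc_auth_for(
        "https://bsky.social/xrpc/com.atproto.server.createSession", path)
    web_auth = netrc_auth_for("https://bsky.app/profile/example", path)
finally:
    os.remove(path)

print(api_auth)  # ('example-user', 'example-pass')
print(web_auth)  # None
```

The real requests function additionally honors the NETRC environment variable and otherwise looks for ~/.netrc and ~/_netrc.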

Tracing the initial login request:

At https://github.com/mikf/gallery-dl/blob/master/gallery_dl/extractor/common.py#L172, kwargs['headers'] is {'Content-Type': 'application/json'}

At https://github.com/psf/requests/blob/main/src/requests/sessions.py#L575, req.headers is {'Content-Type': 'application/json'} and auth is None

At https://github.com/psf/requests/blob/main/src/requests/sessions.py#L478-L481, this code runs: requests sets auth from .netrc even though gallery-dl did not explicitly request it

At https://github.com/psf/requests/blob/main/src/requests/sessions.py#L498, self.auth is None, auth is a username/password tuple, request.headers is {'Content-Type': 'application/json'}, and self.headers is

{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/'}

p.headers is

{'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0', 'Accept': '*/*', 'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate, br', 'Referer': 'https://bsky.app/', 'Content-Type': 'application/json', 'Content-Length': '64', 'Authorization': 'Basic TOKEN'}
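The trace above can be reproduced offline. This is a minimal sketch, assuming requests 2.25 or later (which honors the NETRC environment variable) and using placeholder credentials and a placeholder Bearer token. Session.prepare_request only falls back to .netrc when neither the request nor the session has auth set, and the prepared request applies auth after headers, so the Basic credentials from .netrc replace the explicit Bearer header.

```python
import os
import tempfile

import requests

URL = "https://bsky.social/xrpc/app.bsky.feed.getPostThread"

# Hypothetical .netrc entry for the API host, like the one in the report.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write("machine bsky.social login example-user password example-pass\n")
    netrc_path = f.name

old_netrc = os.environ.get("NETRC")
os.environ["NETRC"] = netrc_path  # requests checks this before ~/.netrc
try:
    session = requests.Session()  # trust_env defaults to True
    request = requests.Request(
        "GET", URL, headers={"Authorization": "Bearer example-access-jwt"})
    # No auth on the request or the session, so requests falls back to
    # .netrc, and the resulting Basic auth overwrites the explicit header.
    prepared = session.prepare_request(request)
    sent_auth = prepared.headers["Authorization"]
finally:
    if old_netrc is None:
        del os.environ["NETRC"]
    else:
        os.environ["NETRC"] = old_netrc
    os.remove(netrc_path)

print(sent_auth)  # starts with "Basic ", not "Bearer "
```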

Temporarily removing my bsky.social netrc entry works around the problem, even on origin/master. I hate this, but I still think gallery-dl should handle this case somehow.

mikf commented 3 days ago

Wow, now that's something I would have never expected. Good find.

I'll probably add an option to disable requests' trust_env and/or use a "noop" session auth so requests doesn't try to interfere.
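Both ideas can be sketched with public requests features. This is an illustration only, not the code gallery-dl ended up using; it assumes requests 2.25 or later (for the NETRC environment variable) and uses placeholder credentials, and NoopAuth and sent_authorization are hypothetical names. Setting Session.trust_env = False stops requests from reading .netrc at all, while a truthy auth object that returns the request unchanged makes requests skip the .netrc fallback, because that fallback only runs when no session auth is set.

```python
import os
import tempfile

import requests
from requests.auth import AuthBase

URL = "https://bsky.social/xrpc/app.bsky.feed.getPostThread"
BEARER = "Bearer example-access-jwt"  # placeholder token


class NoopAuth(AuthBase):
    """Hypothetical "noop" auth: leaves the prepared request untouched."""

    def __call__(self, r):
        return r


def sent_authorization(session):
    """Return the Authorization header the session would send for URL."""
    request = requests.Request("GET", URL, headers={"Authorization": BEARER})
    return session.prepare_request(request).headers["Authorization"]


# Hypothetical .netrc entry for the API host.
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write("machine bsky.social login example-user password example-pass\n")
    netrc_path = f.name

old_netrc = os.environ.get("NETRC")
os.environ["NETRC"] = netrc_path
try:
    default = requests.Session()

    no_env = requests.Session()
    no_env.trust_env = False  # option 1: ignore .netrc (and other env settings)

    noop = requests.Session()
    noop.auth = NoopAuth()  # option 2: session auth is set, so no .netrc fallback

    results = {
        "default": sent_authorization(default),
        "trust_env=False": sent_authorization(no_env),
        "noop auth": sent_authorization(noop),
    }
finally:
    if old_netrc is None:
        del os.environ["NETRC"]
    else:
        os.environ["NETRC"] = old_netrc
    os.remove(netrc_path)

for name, value in results.items():
    print(f"{name}: {value}")
```

trust_env = False also disables proxy and CA-bundle settings taken from environment variables, so the noop auth object is the narrower of the two changes.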

mikf commented 6 hours ago

Should be fixed with https://github.com/mikf/gallery-dl/commit/0a72a5009c8d7bc46ac7f82a38e0c885107f8d26 by preventing requests from gathering .netrc auth and overwriting Authorization headers with it.