upbit / pixivpy

Pixiv API for Python
https://pypi.org/project/PixivPy3/#files
The Unlicense

Cloudflare version 2 captcha on auth request #259

Open zzAIMoo opened 1 year ago

zzAIMoo commented 1 year ago

When trying to call any function from the package, there's an error because Pixiv changed their captcha to version 2, which is not supported in the free version of cloudscraper:

cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

During handling of the above exception, another exception occurred:

pixivpy3.utils.PixivError: requests POST https://oauth.secure.pixiv.net/auth/token error: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

Edit: as a workaround that makes everything work, use api.set_auth(access_token, refresh_token) instead of api.auth.

SakiSakiSakiSakiSaki commented 1 year ago

I tried doing what you mentioned, but then user.name wasn't printing.

Here's what I have:

import configparser
from pixivpy3 import *

api = AppPixivAPI()

config = configparser.ConfigParser(interpolation=None)

config.read('config.ini')

USERNAME = config['pixiv']['username']
PASSWORD = config['pixiv']['password']
REFRESH_TOKEN = config['pixiv']['refresh_token']

# Authenticate with the refresh token (the username and password read above are not used)
api.auth(refresh_token=REFRESH_TOKEN)

# Get the user detail for the currently logged in user
user_detail = api.user_detail(api.user_id)
user = user_detail.user

# Print the user's username and user ID
print("Username: ", user.name)
print("User ID: ", user.id)

This works once in a blue moon; every other attempt returns the error: requests POST https://oauth.secure.pixiv.net/auth/token error: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

Is the full command api.set_auth(access_token, refresh_token)? I ask because I tried api.set_auth(access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN) and print("Username: ", user.name) gave me this error:

Exception has occurred: AttributeError 'NoneType' object has no attribute 'name'

upbit commented 1 year ago

Pixiv changing the captcha to version 2 is bad news. Maybe we can use ZenRows instead?

Regarding the difference between set_auth() and auth(): set_auth() does not make a request to Pixiv. It only records the access_token and refresh_token for subsequent requests. You can refer to this code snippet for more details:

    def set_auth(self, access_token: str, refresh_token: str | None = None) -> None:
        self.access_token = access_token
        self.refresh_token = refresh_token

    def auth(
        self,
        username: str | None = None,
        password: str | None = None,
        refresh_token: str | None = None,
        headers: ParamDict = None,
    ) -> ParsedJson:
        """Login with password, or use the refresh_token to acquire a new bearer token"""

        r = self.requests_call("POST", url, headers=headers_, data=data)

SakiSakiSakiSakiSaki commented 1 year ago

How can I use it in conjunction with auth? Is it like this?

api.set_auth(ACCESS_TOKEN, REFRESH_TOKEN)
api.auth(refresh_token=REFRESH_TOKEN)

upbit commented 1 year ago

cloudscraper is configured here, this can be commented out and changed to requests.Session():

        self.requests = requests.Session()
        # self.requests = cloudscraper.create_scraper()  # fix due to #140

ZenRows is accessed through a proxy (I haven't tried it myself). You can refer to demo.py to configure the proxy for requests:

# If a special network environment is met, please configure requests as you need.
# Otherwise, just keep it empty.
zen_proxy = "http://APIKEY:@proxy.zenrows.com:8001"
_REQUESTS_KWARGS = {
    # 'proxies' must be a plain dict mapping each scheme to a proxy URL
    'proxies': {"http": zen_proxy, "https": zen_proxy},
    'verify': False,       # PAPI uses HTTPS; an easy way is to disable requests SSL verification
}

api = AppPixivAPI(**_REQUESTS_KWARGS)

I'm happy to get your feedback.

zzAIMoo commented 1 year ago

My workflow right now is something like this:

  1. Set up working auth. By working I mean that when I execute the script with the refresh parameter, it uses the old refresh_token I had (which you can get using the login parameter instead) and saves the access_token and refresh_token in a file.
  2. Since I didn't want to waste much time and the project is a simple private Discord bot, I made it so that when I run a command that uses the pixiv functions, I refresh the access_token first, like this (so I have no problems with the access_token expiring):
    Popen(["python3", "pixiv_auth.py", "refresh", refreshTokenFile[0].strip()])
  3. At this point you can log in via the API by doing something like this:
    api.set_auth(access_token=accessToken, refresh_token=refreshToken)

    (the accessToken and refreshToken variables are the values I just saved in the file when running refresh)

Notice how I used api.set_auth instead of api.auth and passed both the access_token and the refresh_token. For some reason this works without calling api.auth after setting it. I don't know why, to be honest; I haven't looked into it that much. When I have time I'll check in more detail.

This works for me 100% of the time (or maybe I've been lucky, I don't know ahahahah). Hope this helps in any way.

SakiSakiSakiSakiSaki commented 1 year ago

Notice how i used api.set_auth instead of api.auth and passed both the access_token and the refresh_token, for some reason this works without doing api.auth after setting it

I vaguely understand what you have done. Are you doing all of this (initial auth, refreshing the access_token to a separate file, api.set_auth) in one file? Maybe my issue was that I forgot to renew my access_token. To be fair, it's a pain to do manually because of how short the window is to paste the code query parameter into the console (unless you have automated this too; I would love to see that as an example, since I know there was a Selenium method).

Would you mind sharing your code with all of these elements in together just so I can get a tangible reference? Barring your credentials of course.

Because when I tried doing what you did, with a brand new access_token, I get an error:

import configparser
from pixivpy3 import *

api = AppPixivAPI()

config = configparser.ConfigParser(interpolation=None)

config.read('config.ini')

USERNAME = config['pixiv']['username']
PASSWORD = config['pixiv']['password']
ACCESS_TOKEN = config['pixiv']['access_token']
REFRESH_TOKEN = config['pixiv']['refresh_token']

# Set the tokens directly; set_auth() does not make a request to Pixiv
api.set_auth(access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN)

# Get the user detail for the currently logged in user
user_detail = api.user_detail(api.user_id)
user = user_detail.user

# Print the user's username and user ID
print("Username:",user.name)
print("User ID:",user.id)
print(api.user_id)

api.user_id returns 0.

zzAIMoo commented 1 year ago

Would you mind sharing your code with all of these elements in together just so I can get a tangible reference? Barring your credentials of course.

No problem :) (don't mind the awful code, I tried to patch the errors as fast as I could)

File 1: pixiv_auth.py. This is the normal Pixiv OAuth workflow; I just added the 2 lines of code to save my access_token and refresh_token to 2 different files.

def print_auth_token_response(response):
    data = response.json()

    try:
        access_token = data["access_token"]
        refresh_token = data["refresh_token"]
    except KeyError:
        print("error:")
        pprint(data)
        exit(1)

    # Down below is what I added: save each token to its own file
    # ***************************************
    with open("access_token.txt", "w") as accessTokenFile:
        accessTokenFile.write(access_token)
    with open("refresh_token.txt", "w") as refreshTokenFile:
        refreshTokenFile.write(refresh_token)
    # ***************************************

File 2: pixiv_rand.py. In this file I use the illust_recommended function to get a sort of biased random image based on the ones I liked before, but that's not the point. The OAuth part comes right before that function call (I do this in every file where I need to call the Pixiv API through pixivpy).

from subprocess import Popen
from pixivpy3 import AppPixivAPI

api = AppPixivAPI()
refreshTokenFile = open("refresh_token.txt").readlines()
# This runs "File 1" with the refresh parameter, so you need to pass your refresh_token;
# .wait() blocks until the script has finished rewriting the token files
Popen(["python3", "pixiv_auth.py", "refresh", refreshTokenFile[0].strip()]).wait()
accessTokenFile = open("access_token.txt").readlines()
# Pass both the refresh token and the newly acquired access_token, which "File 1" just saved to access_token.txt
api.set_auth(access_token=accessTokenFile[0].strip(), refresh_token=refreshTokenFile[0].strip())
json_result = api.illust_recommended(content_type=arg_content_type or "illust")

Doing it this way, I don't need to call the api.auth function and everything seems to work (or at least gathering illustrations randomly or by tags, and liking posts).

I just tried calling the function you use to get user_details: api.user_id returns 0 and api.user_detail gives an error for me too, so maybe doing it this way doesn't give you any way to get the user_id (?)
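
One possible reading of the api.user_id result, based only on the set_auth() excerpt quoted earlier in this thread: set_auth() stores the two tokens and nothing else, so the user ID keeps whatever default it had. The sketch below uses a stand-in class to show that behaviour. FakeApi is hypothetical and is not the real AppPixivAPI, and the default of 0 is an assumption taken from the value printed in this thread.

```python
class FakeApi:
    """Stand-in that mirrors the quoted set_auth() excerpt; not the real pixivpy3 class."""

    def __init__(self):
        self.user_id = 0  # assumed default, matching the 0 reported in this thread
        self.access_token = None
        self.refresh_token = None

    def set_auth(self, access_token, refresh_token=None):
        # Only records the tokens; no request is made and user_id is left untouched
        self.access_token = access_token
        self.refresh_token = refresh_token


api = FakeApi()
api.set_auth("ACCESS", "REFRESH")
print(api.user_id)  # the default is still in place after set_auth()
```

If that reading is right, the user ID has to come from somewhere else, for example from a successful api.auth() call.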

SakiSakiSakiSakiSaki commented 1 year ago

File 1: pixiv_auth.py, This is the normal pixiv oauth workflow, i just added the 2 lines of code to save my access_token and refresh_token to 2 different files

How do you ensure your access_token remains up to date? Do you just manually run python pixiv_auth.py login and go through all that hassle? While your .txt additions to that func make it so you don't have to copy and paste the tokens anymore, you still have to do all the annoying stuff beforehand. Have you found a way to automate it? Because an access_token only lasts 3600 seconds.

I tried just now calling the function you use to get user_details and it return 0 for the api.user_id and gives an error for the api.user_detail function to me too, so maybe doing it this way doesn't give you any way to get the user_id (?)

I suppose I haven't done anything wrong then. But in that case, how is one supposed to access their own information using your workaround. Weird...

zzAIMoo commented 1 year ago

Have you found a way to automate it? Because an access_token only lasts 3600 seconds.

Right now I refresh it every time I need to call a function, so that I don't run into any problems.

To refresh it, I execute the pixiv_auth script using Popen:

Popen(["python3", "pixiv_auth.py", "refresh", refresh_token])

I also made it save the tokens to the txt files so I don't have to copy and paste everything.
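
Refreshing before every call is simple but spends one token request per command. A minimal sketch of an alternative, assuming the 3600-second lifetime mentioned in this thread: store the time the tokens were saved and refresh only when they are close to expiry. The file name tokens.json, the 60-second margin, and the helper names save_tokens and needs_refresh are all hypothetical.

```python
import json
import time
from pathlib import Path

TOKEN_LIFETIME = 3600  # seconds, the figure mentioned in this thread
SAFETY_MARGIN = 60     # hypothetical margin so the refresh happens slightly early


def save_tokens(path, access_token, refresh_token, now=None):
    """Store both tokens together with the time they were obtained."""
    record = {
        "access_token": access_token,
        "refresh_token": refresh_token,
        "obtained_at": time.time() if now is None else now,
    }
    Path(path).write_text(json.dumps(record))


def needs_refresh(path, now=None):
    """Return True if no cached tokens exist or the access token is nearly expired."""
    token_file = Path(path)
    if not token_file.exists():
        return True
    record = json.loads(token_file.read_text())
    now = time.time() if now is None else now
    return now - record["obtained_at"] >= TOKEN_LIFETIME - SAFETY_MARGIN
```

A caller would run the refresh script only when needs_refresh("tokens.json") is true, then load the stored tokens and pass them to api.set_auth as before.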

Edit: I found out something really cool that might help you. If you call api.set_auth and then call api.auth right after, it seems to work and also enables the api.user_id and api.user_detail stuff. The workflow is the same as before; I just added this after the api.set_auth function call:

api.auth(refresh_token=refreshTokenFile[0].strip())

SakiSakiSakiSakiSaki commented 1 year ago

What I have

import configparser, json
from subprocess import Popen
from pixivpy3 import *

api = AppPixivAPI()

config = configparser.ConfigParser(interpolation=None)

config.read('config.ini')

USERNAME = config['pixiv']['username']
PASSWORD = config['pixiv']['password']
ACCESS_TOKEN = config['pixiv']['access_token']
REFRESH_TOKEN = config['pixiv']['refresh_token']

print(Popen(["python3", "pixiv_auth.py", "refresh", REFRESH_TOKEN]))

print(Popen(["python3", "pixiv_auth.py", "refresh", REFRESH_TOKEN])) returns the error Exception has occurred: FileNotFoundError [WinError 2] The system cannot find the file specified

I even replaced the filename with the full path to the file. It's in the same directory as my script. I might have to do some scrappy changeDir workaround.

edit: I found out something really cool that might help you, if you do api.set_auth and just then do api.auth it seems to work and also enables the api.user_id and api.user_detail stuff. the workflow is the same as before, i just added this after the api.set_auth function call:

api.auth(refresh_token=refreshTokenFile[0].strip())

I actually asked about this earlier; I tried it and it didn't work. Is that what you mean here?

zzAIMoo commented 1 year ago

print(Popen(["python3", "pixiv_auth.py", "refresh", REFRESH_TOKEN])) returns the error Exception has occurred: FileNotFoundError [WinError 2] The system cannot find the file specified

Mh, it's strange that it's not finding the file for you. Could it be because I'm using Linux, where I run Python with the python3 command, and on Windows it's different, like py or python? (I haven't used Python on Windows in a while, so I'm not sure.)

[Image: the folder structure I have set up right now]

I actually did ask of this https://github.com/upbit/pixivpy/issues/259#issuecomment-1457395328, I tried it and it didn't work. Was that what you mean here?

Ah yes, my bad, I didn't see that comment. Strange, I don't know why it works for me and not for you, to be honest. Could it be that it didn't work because you wrote the call like this: api.set_auth(ACCESS_TOKEN, REFRESH_TOKEN), and not like this: api.set_auth(access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN)? Maybe it's some Python magic I'm unaware of.

SakiSakiSakiSakiSaki commented 1 year ago

Mh, it's strange it's not finding the file for you, could it be because i'm using linux and to run python i use the python3 command and maybe on windows it's different? like py or python? (haven't used python on windows in a while so i'm not sure)

That was the issue. "python" by itself worked. Returned <Popen: returncode: None args: ['python', 'pixiv_auth.py', 'refresh', 'ABCDE...>
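
A hedged sketch of a way around the python/python3 naming difference: sys.executable holds the path of the interpreter running the current script, so the child process uses the same Python on Windows and Linux. The helper names build_refresh_command and refresh_tokens are hypothetical; subprocess.run with check=True waits for the script to finish and raises if it exits with a non-zero code.

```python
import subprocess
import sys


def build_refresh_command(refresh_token, script="pixiv_auth.py"):
    """Build the command line using the interpreter that runs this script."""
    return [sys.executable, script, "refresh", refresh_token]


def refresh_tokens(refresh_token, script="pixiv_auth.py"):
    """Run the refresh script and block until it has exited."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit code
    subprocess.run(build_refresh_command(refresh_token, script), check=True)
```

Unlike a bare Popen call, this does not return until the script has finished, so any files it writes are complete before they are read back.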

I assume if I add similar edits to the pixiv_auth.py file, running that Popen command could potentially update my config file with new access_tokens and refresh_tokens.

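import configparser

# interpolation=None matches the reader script and keeps any '%' in a token literal
config = configparser.ConfigParser(interpolation=None)
config.read('config.ini')
config.set('pixiv', 'access_token', access_token)
config.set('pixiv', 'refresh_token', refresh_token)
with open('config.ini', 'w') as configfile:
    config.write(configfile)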

Ah yes, mb didn't see that comment, strange, i don't know why it works for me and doesn't for you tbh, could it be that it didn't work because you wrote the function like this api.set_auth(ACCESS_TOKEN, REFRESH_TOKEN) and not like this api.set_auth(access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN)? maybe it's some python magic i'm unaware of

So I tried

api.set_auth(access_token=ACCESS_TOKEN, refresh_token=REFRESH_TOKEN)
api.auth(refresh_token=REFRESH_TOKEN)

and got the error

Exception has occurred: PixivError
requests POST https://oauth.secure.pixiv.net/auth/token error: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 Captcha challenge, This feature is not available in the opensource (free) version.

During handling of the above exception, another exception occurred:

raised from the api.auth(refresh_token=REFRESH_TOKEN) call.

Xdynix commented 1 year ago

It's up to the server to decide whether a captcha will be required, which seems pretty random from our side. Whether or not you use set_auth() doesn't help much with this. You need to try setting up a proxy to change your IP, or something similar.

zzAIMoo commented 1 year ago

and got the error

At this point it's faster to set up a proxy or something to bypass Cloudflare. I think FlareSolverr could work, but I've never tried anything like that so I can't promise anything. When I have more time I'll check out some options.

SakiSakiSakiSakiSaki commented 1 year ago

I'll wait around and see if a solution gets implemented into the API, since I still need to get a working program for whatever it is I need first. Thanks for staying up with me with this.

There's a lot of vagueness in the documentation. Once I get my own ball rolling, I'd like to contribute to cleaning that up and adding a full EN version, since this is the only working Python wrapper for Pixiv that I'm aware of.

zzAIMoo commented 1 year ago

Thanks for staying up with me with this.

No problem :)

ClosedPort22 commented 1 year ago

I think this is likely because Cloudflare's bot management system, which Pixiv uses, somehow detects cloudscraper (which shouldn't be used in the first place; why would anyone request an API endpoint using a headless browser?).

If anyone has the official app installed, try capturing the traffic and find out the TLS fingerprint generated by the app. Cloudflare allows customers to whitelist JA3 fingerprints and I think it's possible that Pixiv has done so to make sure their official apps don't get broken.

I have no idea why the Pixiv staff think it's a good idea to put API endpoints behind Cloudflare's bot management system, since the CAPTCHA is intended for web browsers rather than apps.

nautics889 commented 1 year ago

Hello, @zzAIMoo! Have you tried logging in through a proxy yet? I presume that if it worked, we would be able to close the issue.

zzAIMoo commented 1 year ago

Hi, sorry, I haven't tried yet because I've had almost no free time, and I don't know when I'll be able to try logging in using a proxy. As soon as I can, I'll try and update here. (If anyone wants to do it before me so we can close the issue, don't hesitate :D)

katresars commented 1 year ago

Same problem. When I change the proxy connection, the captcha problem disappears, so maybe it's a proxy quality issue.