upbit / pixivpy

Pixiv API for Python
https://pypi.org/project/PixivPy3/#files
The Unlicense

Pixiv rate limiting pixivpy #199

Closed biggestsonicfan closed 2 years ago

biggestsonicfan commented 2 years ago

Greetings,

I am attempting to clean up some database issues caused by PixivUtil2. To do that, I am using pixivpy to parse the filenames of local images that have no associated database entries, turn those names into illustration IDs, and look up the user for each ID.

My problem is that pixivpy seems to just return None after some number of successful attempts. I see there is a workaround for this in #194. However, when calling api.illust_detail(image_id), I need to know whether the user['id'] is actually None, so I can verify and log that those images have been purged from Pixiv's live hosting.

My "workaround" currently is to detect if I get 5 None returns in a row and exit. I am giving sufficient sleep() times in between queries: 90 + randint(7,36) seconds per query, and 300 + randint(12,98) seconds if my loop count ends in 0.
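For reference, the throttling and early-exit scheme described above could be sketched roughly as follows. The names query_delay and NoneStreak are hypothetical and not part of pixivpy; only the numbers come from the description.

```python
import random


def query_delay(loop_count: int) -> int:
    """Seconds to sleep before the next query (hypothetical helper)."""
    if loop_count % 10 == 0:
        # Longer pause whenever the loop count ends in 0.
        return 300 + random.randint(12, 98)
    return 90 + random.randint(7, 36)


class NoneStreak:
    """Counts consecutive None results (hypothetical helper)."""

    def __init__(self, limit: int = 5) -> None:
        self.limit = limit
        self.count = 0

    def record(self, result) -> bool:
        """Return True once `limit` None results have arrived in a row."""
        # Any non-None result resets the streak.
        self.count = self.count + 1 if result is None else 0
        return self.count >= self.limit
```

In a loop, this would be used as time.sleep(query_delay(i)) between queries, exiting when streak.record(result) returns True.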

I bang on Pixiv nearly 24/7 with PixivUtil2 for archiving and don't ever run into rate limiting like this.

Is there an alternative method I can use to check if an illustration ID is live before querying its associated user ID?

Xdynix commented 2 years ago

Can you please provide some illustration IDs and responses about the None? I don't quite catch it.

I haven't seen a situation where the user['id'] is None. For a deleted illustration (e.g. 67368890), the user ID in api.illust_detail() is 0.

biggestsonicfan commented 2 years ago

Ah, so I misremembered my code a bit and wrote up the issue from memory without looking at it. As it turns out, it's not the user['id'] that is None. It's api.illust_detail(image_id).illust itself that is None, and you get an error trying to retrieve user['id'] from None.

The following illustration ids should produce these results: 75565112,78643068,83539790,83747590,82884224,81888528,82208804,82229901,81373109,79516594,63741298,76003609,65565435,64341400,68020512,70480933,78496706,78496740,74215489,74273791,76365036,75852073,75909010,73702476,73648727,70190855,73279197,71799957,73285458,63581551,71154801,69746725,62204112,23061092,69444612,69032613,60407169,49441927,67331367,66982486,67442959,46869780,67398155,65248932,66070726,66247388,66247395,65249012,64384346,64458025,59156948,61837119,55628008,63581882,45754292,61693332

Also, it looks like the api = AppPixivAPI() instance runs out of steam when it's created outside of the loop. I've moved the code that creates the api instance and assigns the refresh_token so that a fresh api instance is created on each loop (each check of an image), and I am seemingly no longer running into the rate limit?

Xdynix commented 2 years ago

Check the response and you can see that there is an error field in it.

from pprint import pprint

# `api` is an already authenticated AppPixivAPI instance
json_result = api.illust_detail('60407169')
pprint(json_result)
# Output
{'error': {'message': '',
           'reason': '',
           'user_message': 'Work has been deleted or the ID does not exist.',
           'user_message_details': {}}}

So you may need to check whether error exists, or use more robust code like

user_id = json_result.get('illust', {}).get('user', {}).get('id', 0)
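A slightly more explicit variant of that check is sketched below. It keeps error responses separate from responses that carry an illust, so an error is not silently logged as a user ID of 0. The function name and the status labels are hypothetical, and the response shapes are assumed to match the ones shown in this thread.

```python
from typing import Any, Mapping, Optional, Tuple


def classify_illust_response(
    response: Mapping[str, Any],
) -> Tuple[str, Optional[int], str]:
    """Return (status, user_id, detail) for an illust_detail-style response."""
    error = response.get("error")
    if error:
        # Keep the server's message so the caller can log why the lookup failed.
        detail = error.get("user_message") or error.get("message") or ""
        return "error", None, detail
    illust = response.get("illust")
    if not illust:
        # Neither an error nor an illust: treat as an unexpected empty response.
        return "empty", None, ""
    user_id = (illust.get("user") or {}).get("id")
    if not user_id:
        # Covers a missing user as well as the user ID of 0 mentioned earlier.
        return "no_user", None, ""
    return "live", user_id, ""
```

Logging the detail string for every "error" status makes it possible to tell afterwards which IDs were reported as deleted and which failed for some other reason.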

The AppPixivAPI has nothing to do with stream except downloading. What are you referring to? I also remember that the expiration time of the access token obtained by api.auth(refresh_token=TOKEN) is one hour, so if you need to run it for a long time, you need to call auth() every once in a while to get a new access token.
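One way to re-authenticate periodically without calling auth() on every query is sketched below. TokenRefresher is a hypothetical helper, not part of pixivpy, and the 3000-second default is an assumed safety margin under the one-hour lifetime recalled above.

```python
import time
from typing import Callable, Optional


class TokenRefresher:
    """Calls `authenticate` again once the last call is older than `max_age`."""

    def __init__(
        self,
        authenticate: Callable[[], object],
        max_age: float = 3000.0,  # assumption: 50 minutes, below the ~1 hour expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._authenticate = authenticate
        self._max_age = max_age
        self._clock = clock
        self._last_auth: Optional[float] = None

    def ensure_fresh(self) -> bool:
        """Re-authenticate if needed; return True when a call was made."""
        now = self._clock()
        if self._last_auth is None or now - self._last_auth >= self._max_age:
            self._authenticate()
            self._last_auth = now
            return True
        return False
```

It could be created once as TokenRefresher(lambda: api.auth(refresh_token=TOKEN)) and ensure_fresh() called before each query, so a single api instance can stay in use for long runs.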

biggestsonicfan commented 2 years ago

I said "steam", not "stream"; it was more of a metaphor for the application just giving up. But if the token is only valid for an hour, that would explain why putting auth() in my loop resolved it.

However, as you can see now, illust_detail() does not return a user ID of 0 when the illustration does not exist; the illust comes back as None.

I feel the 1 hour time limit should be better documented, as I had not seen that anywhere when running into this issue.

Xdynix commented 2 years ago

Okay, I read the wrong word and didn't get the metaphor.

This is an unofficial Pixiv API, not even supported by Pixiv. All these behaviors are undocumented and subject to change on the server side, so you need to probe them yourself.

The expiration time and expiration hint of the access token can be seen in the auth() response and error response.