Search Limits - Githubissues

Ristellise commented 5 years ago

Wondered why my collections only reach about 20k... so I decided to test a bit.

Public API Search limit is ~20K

App API Search limit: Offset 5000

In any case, how does the Pixiv website work in a browser, since it can clearly bypass the limits imposed by the API?

To test APPAPI Limits:

srchapp = APPAPI.search_illust("SwordArtOnline OR SAO OR ソードアート・オンライン")
c = 0
while len(srchapp.illusts) > 0:
    # Count the current page first, so the initial page is included in the total.
    c += len(srchapp.illusts)
    print(f"Loaded {c}")
    if not srchapp.next_url:
        break  # no further pages
    parsed = APPAPI.parse_qs(srchapp.next_url)
    srchapp = APPAPI.search_illust(word=parsed.get("word").replace("+", " "), offset=parsed.get("offset"))

Where APPAPI is an AppPixivAPI instance.
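For anyone who wants to see how an offset cap shows up in a counting loop like the one above without hitting the real API, here is a minimal offline sketch. `fake_search`, `count_results`, `PAGE_SIZE` and the 20000 total are all hypothetical stand-ins, and it assumes the capped API simply returns an empty page past the limit, which may not be how the real service responds.

```python
from typing import Callable, List

PAGE_SIZE = 30      # assumed page size, for the simulation only
OFFSET_CAP = 5000   # the offset limit reported in this thread


def fake_search(offset: int, total_available: int = 20000) -> List[int]:
    """Hypothetical stand-in for a search call; returns nothing at or past the cap."""
    if offset >= OFFSET_CAP or offset >= total_available:
        return []
    end = min(offset + PAGE_SIZE, total_available)
    return list(range(offset, end))


def count_results(fetch: Callable[[int], List[int]]) -> int:
    """Page through `fetch` by offset until an empty page comes back."""
    count = 0
    offset = 0
    while True:
        page = fetch(offset)
        if not page:
            return count
        count += len(page)
        offset += len(page)


if __name__ == "__main__":
    print(f"Loaded {count_results(fake_search)}")
```

The count stops shortly after the cap even though the simulated search has far more results available, which matches the kind of plateau described above.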

Public API:

srch = PAPI.search_works("SwordArtOnline OR SAO OR ソードアート・オンライン",mode='tag',per_page=1000)

Result: srch.pagination.pages × per_page != total

EDIT: This might be because non-premium accounts can only access 1000 pages, since the error message from PAPI is: '1000ページまでしか取得できません。' ("Only up to 1000 pages can be retrieved.")
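The pages × per_page mismatch can be checked with a couple of lines of arithmetic. This is only a sketch: `expected_pages` and `is_truncated` are hypothetical helper names, and the values you pass in would be whatever the pagination object in your own response reports.

```python
import math


def expected_pages(total: int, per_page: int) -> int:
    """Number of pages needed to hold `total` results at `per_page` each."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total / per_page)


def is_truncated(total: int, per_page: int, reported_pages: int) -> bool:
    """True when the reported page count cannot cover the reported total."""
    return reported_pages * per_page < total
```

If `is_truncated` returns True for the numbers in a response, the API is reporting more results than it will let you page through.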

Xdynix commented 5 years ago

You are right. There are some limits on both PAPI and AAPI. I think this should be because both APIs are for mobile apps, and the web version doesn't use these APIs.

Ristellise commented 5 years ago

Decided to do an experiment... seems like I can scrape Pixiv directly [Via their website] and reach over the normal public limit of 20K illusts.

ThePreviousOne commented 5 years ago

@Ristellise do you still need this API to authenticate? I'm curious about your implementation as I'm facing the same problem, though in my case I need to log in. In fact, I just learnt Python a few days ago in order to use this API.

Ristellise commented 5 years ago

EDIT: As of 1/11/2019, it doesn't work anymore. You can still hack your way through using a pythonic web browser though.

You still need to authenticate/sign in to unlock all the images (as in, a regular user who searches on Pixiv without signing in will not be able to see some content).

But at this point you don't need PixivPy to actually search for all the images.
Below is the Python script for authenticating as a signed-in user, stripped down to its bare essentials.

It does not support a reCAPTCHA response, so if you're forced through a reCAPTCHA... sorry.

import requests
from bs4 import BeautifulSoup


class loginManager:
    def __init__(self, **kwargs):
        self.username = kwargs.get("username")
        self.password = kwargs.get("passw")
        self.logintoken = None
        self.session = None

    def doLogin(self):
        loginsession = requests.Session()
        # Fetch the login page; the first <input> on it supplies post_key.
        login = loginsession.get("https://accounts.pixiv.net/login")
        loginhtml = BeautifulSoup(login.text, "html5lib")
        data = {'pixiv_id': self.username, 'password': self.password, 'captcha': '', 'g_recaptcha_response': '',
                'return_to': 'https://www.pixiv.net', 'lang': 'en', 'post_key': loginhtml.input['value'],
                'source': "accounts", 'ref': ''}
        url = "https://accounts.pixiv.net/api/login?lang=en"
        response = loginsession.post(url, data=data)
        print(response.text)
        respj = response.json()
        # Keep the session cookie only when the response reports a successful login.
        if not respj['error']:
            if respj['body'] == {'success': {'return_to': 'https://www.pixiv.net'}}:
                self.logintoken = response.cookies.get('PHPSESSID')
        self.session = loginsession

    def getSession(self):
        # Log in lazily on first use.
        if self.session is None:
            self.doLogin()
        return self.session

    def getLogin(self):
        if self.logintoken is None:
            self.doLogin()
        return self.logintoken
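The script above takes `post_key` from whatever the first input element on the login page happens to be. A slightly more defensive sketch is to look the field up by name. This version uses only the standard library so it runs without extra dependencies; `InputValueFinder` and `find_input_value` are hypothetical names, and the assumption that the hidden field is actually named `post_key` comes only from the form data key used above, not from any confirmed page markup.

```python
from html.parser import HTMLParser
from typing import Optional


class InputValueFinder(HTMLParser):
    """Records the value of the first <input> whose name matches `field`."""

    def __init__(self, field: str) -> None:
        super().__init__()
        self.field = field
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-input tags and anything after the first match.
        if tag != "input" or self.value is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.field:
            self.value = attributes.get("value")


def find_input_value(html: str, field: str) -> Optional[str]:
    """Return the value of the named input, or None if it is not present."""
    finder = InputValueFinder(field)
    finder.feed(html)
    return finder.value
```

Getting `None` back is a cheap signal that the page layout has changed, which is easier to debug than posting the wrong value.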

ThePreviousOne commented 5 years ago

Thanks

Ristellise commented 5 years ago

The latest Pixiv website update breaks the above code; it now requires reCAPTCHA for everyone.

upbit / pixivpy

Search Limits #69