mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Pixiv downloading "Work cannot be displayed" image #4327

Open 501stRookie opened 1 year ago

501stRookie commented 1 year ago

Starting today, when I try to download an image from Pixiv, it downloads a placeholder image instead, with Japanese text that says "This work cannot be displayed".

It seems to only happen on recently posted works; images that were uploaded yesterday or earlier download fine.

thatfuckingbird commented 1 year ago

Can confirm, seeing this too. Refreshing the login token doesn't help, and it doesn't seem to be related to whether the image is R-18 or not. The URL I get for this image is https://s.pximg.net/common/images/limit_sanity_level_360.png, and the metadata JSON seems to be mostly fine, except that it also contains a "sanity_level": 4 entry.

Update: Tried PixivUtil2, it doesn't have this problem.

Update2: Tried the official Pixiv app; these images do not show up there either. The last displayed images (if I sort by "newest") are from yesterday, except ugoira, which show up for some reason.

Also might be related issue here: https://github.com/upbit/pixivpy/issues/275

Update3:

If you check https://www.pixiv.net/info.php?cid=1&lang=en there are announcements about the suspension and reinstatement of their mobile apps in app stores, related to content in the apps. I think they might be doing some kind of semi-manual filtering now, which causes this lag between the mobile app API and the website. This might mean we can no longer use that API, at least for downloading the images themselves. Also, for the future, it might be a good idea to detect that limit_sanity_level placeholder image and raise an error on it.

Update4:

The metadata is incomplete too for these images (no tags).

Slider-Whistle commented 1 year ago

Doesn't appear to be happening on my end.

kattjevfel commented 1 year ago

I ran into this but I can no longer reproduce it, so must've been something temporary.

thatfuckingbird commented 1 year ago

It is still happening (as I'm writing this), but the lag between what is available on the mobile app API and what's visible on the site has decreased. Currently I see about an 8-10 minute lag until an image shows up on the mobile app (looking at the posting times). You can reproduce this if you go to the site, search for some very common tag like "illustration" and try to download the newest entry chronologically (check that it was posted in the last few minutes).

The other question is, is there any content that won't be available on the mobile API at all? I haven't encountered anything like that yet, but since this whole thing might be because the mobile app does some additional filtering due to app store requirements, it can't be ruled out.

For now I think a temporary solution would be to catch the cases where an invalid image is returned (easy to tell from the URL) and either raise an error or wait 5, 10, or 15 minutes and retry, as is done for rate limits. If the lag between the site and image availability in the API remains low, this might be enough, maybe along with an informational message in these cases.
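A minimal sketch of that detect-and-wait idea, written in JavaScript like the other snippets in this thread. It is not gallery-dl code: `isSanityPlaceholder`, `resolveWithRetry`, and the `resolveUrl` callback are hypothetical names, and the only thing taken from the thread is the `limit_sanity_level` part of the placeholder URL.

```javascript
// Marker found in the placeholder URL reported in this thread
// (https://s.pximg.net/common/images/limit_sanity_level_360.png).
const SANITY_PLACEHOLDER = "limit_sanity_level";

// True if the URL points at the "This work cannot be displayed" placeholder.
function isSanityPlaceholder(url) {
    return new URL(url).pathname.includes(SANITY_PLACEHOLDER);
}

function defaultSleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: `resolveUrl` is any async function that returns the
// image URL for a work. It is called until it stops returning the
// placeholder, waiting `delayMs` between attempts; after `attempts` tries
// an error is thrown instead of silently saving the placeholder.
async function resolveWithRetry(resolveUrl, options = {}) {
    const { attempts = 3, delayMs = 5 * 60 * 1000, sleep = defaultSleep } = options;
    for (let i = 0; i < attempts; i++) {
        const url = await resolveUrl();
        if (!isSanityPlaceholder(url)) {
            return url;
        }
        if (i < attempts - 1) {
            console.warn(`placeholder returned, retrying in ${delayMs} ms`);
            await sleep(delayMs);
        }
    }
    throw new Error("work is still hidden behind the limit_sanity_level placeholder");
}
```

The default 5 minute delay is only a guess based on the lag reported above; whether waiting helps at all depends on the work eventually becoming visible in the mobile API.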

Ultimately if the time lag between the mobile API and the site keeps randomly increasing/decreasing or the mobile API becomes filtered in some other way then a switch to the non-mobile API (the one the website uses) might be needed.

Dartkun commented 1 year ago

Still happening on my end. Also only for basically brand new pictures.

alleneko commented 1 year ago

Happens for me when I'm downloading from my bookmarks, but doesn't happen when I use the search page or individual posts.

mikf commented 1 year ago

https://s.pximg.net/common/images/limit_sanity_level_360.png images now get ignored (https://github.com/mikf/gallery-dl/commit/a45a17ddb7504541907772ac330ff278a0f20878 (yes, that's the wrong issue number ...))

To manually ignore them, enable url-metadata and --filter them that way.

> The other question is, is there any content that won't be available on the mobile API at all?

I've noticed that search results, and only those, do not include R-18G works.

thatfuckingbird commented 1 year ago

> I've noticed that search results, and only those, do not include R-18G works.

Works for me, it might be your account settings (there is a separate toggle for r18g iirc)

mikf commented 1 year ago

These settings are enabled for all of my accounts. It is working again, but it definitely wasn't when I posted https://github.com/mikf/gallery-dl/issues/4327#issuecomment-1646604638.

Are these "Work cannot be displayed" images still a thing or did Pixiv somehow fix whatever these were meant for?

(I've never encountered one of these or a "Skipping 'sanity_level' warning" logging message myself)

thatfuckingbird commented 1 year ago

There's still at least a few minutes of lag before images displayed on the website also appear in the app, so if you happen to download very recent image URLs, those will still produce the sanity_level image. I think we will just have to live with this for the time being, since it is probably not worth a rewrite to switch to the API that the website uses.

mikf commented 1 year ago

Yeah, I'd really want to avoid using the website API if at all possible. It is a lot slower, requires an extra request for each individual post, and, more importantly, would need exported cookies for authentication, which expire in a month or so.

I did try to rewrite the current extractor back when auth with username and password got disabled and it wasn't a "pleasant" experience, to say the least.

AlttiRi commented 1 year ago

Just a short summary of https://github.com/mikf/gallery-dl/issues/4421#issuecomment-1689864602

In Pixiv's Android application, and therefore in gallery-dl too:

AlttiRi commented 1 year ago

It seems the caption "bug" is "fixed". But some images still have "visible": False, and gallery-dl does not see them when it downloads a profile's images.


Upd 2023.10.08: The "bug" has returned.

thatfuckingbird commented 1 year ago

Encountered another image that won't download (giving the skip sanity_level warning in the log), NSFW warning: https://www.pixiv.net/en/artworks/109487939 . Interesting, because none of the artist's other works seem to be affected, and by Pixiv standards it's rather tame too.

Sherman-Liu commented 11 months ago

Same issue here: https://github.com/mikf/gallery-dl/issues/4760#issuecomment-1869663862

akinokonomi commented 10 months ago

Seeing the "Skipping 'sanity_level' warning" message too. Not NSFW: https://www.pixiv.net/en/artworks/102932581

gallery-dl seems to silently skip it; maybe add a more explicit error/warning?

I only noticed it was being skipped after passing the --verbose argument.

thatfuckingbird commented 10 months ago

Might be a good idea to add these post URLs to the output of --write-unsupported.

espressoelf commented 9 months ago

The best solution would be falling back to a secondary extractor that doesn't use Pixiv's mobile API. It's like @thatfuckingbird pointed out: Pixiv is taking measures to keep their mobile apps in the stores. Unfortunately, the automatic flagging is rather trigger-happy, producing many false positives. From what I saw, there also seems to be no publicly visible indicator and no way to appeal the flag, so finding a way around it is very important for every data hoarder.

AlttiRi commented 9 months ago

I think it makes sense to add support for using the web API in addition to the Android app's API.

Since the mobile API does not return shadow banned artworks, it would require an extra call to get all artwork IDs from the site's API:

Object.keys((await (await fetch("https://www.pixiv.net/ajax/user/1657441/profile/all?lang=en")).json()).body.illusts)

That way you can find the missing artworks.

To get the info for them:

(await (await fetch("https://www.pixiv.net/ajax/illust/113897896?lang=en")).json()).body

For ugoira, also:

(await (await fetch("https://www.pixiv.net/ajax/illust/113897896/ugoira_meta?lang=en")).json()).body

However, it does not seem possible to tell (in the app API) whether a caption was removed due to a soft shadow ban or the author simply did not add one.

For example, https://www.pixiv.net/en/artworks/103983466 is visible, but it has no caption. "Soft shadow banned".

Whereas https://www.pixiv.net/en/artworks/102932581 and https://www.pixiv.net/en/artworks/109211067 are additionally hidden from the profiles. They can't be downloaded with gallery-dl now (the API returns a response with visible: False). "Shadow banned".

So the site's API has to be used every time the caption is empty, even when the artwork is not shadow banned, if you need the description for metadata files.
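A rough sketch of that decision logic, assuming the ID list from the site's `profile/all` endpoint and the work objects from the app API have already been fetched. `findMissedIds`, `needsWebLookup`, and the `wantCaptions` option are hypothetical names; only the `visible` and `caption` fields come from the app API responses described in this thread.

```javascript
// IDs the website lists for a profile but the app API did not return.
// Both inputs are arrays of IDs (strings or numbers).
function findMissedIds(webIds, appIds) {
    const seen = new Set(appIds.map(String));
    return webIds.map(String).filter(id => !seen.has(id));
}

// Decide whether a work returned by the app API needs an extra request
// to the site's API.
function needsWebLookup(work, { wantCaptions = false } = {}) {
    // "Shadow banned": the app API marks the work as not visible.
    if (work.visible === false) {
        return true;
    }
    // An empty caption is ambiguous: it may be a "soft shadow ban" or the
    // author may simply not have written one. Only check when the caller
    // actually wants descriptions, since each check costs one request.
    const captionEmpty = !work.caption || work.caption.trim() === "";
    return wantCaptions && captionEmpty;
}

// Example with made-up data shaped like the fields discussed above.
const missed = findMissedIds(["101", "102", "103"], [101, 103]);
const lookups = [
    { id: 101, visible: true, caption: "hello" },
    { id: 103, visible: true, caption: "" },
].filter(work => needsWebLookup(work, { wantCaptions: true })).map(work => work.id);
console.log(missed, lookups);
```

Making the caption check opt-in keeps the number of extra web requests down for users who do not need descriptions.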


~Also, the site's API returns the description with links wrapped into <a href="/jump.php?....~ There is extraData.meta.twitter.description.


JS code to collect all info from a https://www.pixiv.net/en/users/123456 page:


const headers = {
    // "user-agent": `...`,
    // "cookie": `...`,
};

const profileId = document.location.pathname.match(/(?<=users\/)\d+/)[0]; // https://www.pixiv.net/en/users/7386235

const ids = Object.keys((await (await fetch(`https://www.pixiv.net/ajax/user/${profileId}/profile/all?lang=en`, {
    headers: {
        "referer": `https://www.pixiv.net/en/users/${profileId}`,
        ...headers
    }
})).json()).body.illusts);

const json = {};
for (const id of ids) {
    const body = (await (await fetch(`https://www.pixiv.net/ajax/illust/${id}?lang=en`, {
        headers: {
            "referer": `https://www.pixiv.net/en/artworks/${id}`,
            ...headers,
        }
    })).json()).body;
    json[id] = body;
}

downloadBlob(new Blob([JSON.stringify(json, null, " ")]), `[pixiv][json] ${profileId}—${json[ids[0]]?.userName} (${ids.length}).json`, document.location);

function downloadBlob(blob, name, url) {
    const anchor = document.createElement("a");
    anchor.setAttribute("download", name || "");
    const blobUrl = URL.createObjectURL(blob);
    anchor.href = blobUrl + (url ? ("#" + url) : "");
    anchor.click();
    setTimeout(() => URL.revokeObjectURL(blobUrl), 3000);
}

AlttiRi commented 9 months ago

> It is a lot slower, requires an extra request for each individual post.

Optional mixed mode:

> and, more importantly, would need exported cookies for authentication, which expire in a month or so.

It's a lesser problem than the missing images/descriptions (which may contain useful links).


Using other API endpoints seems very simple; however, as far as I can see, the JSON data they return is formatted a bit differently.

ChromeGames923 commented 3 months ago

@AlttiRi Is there a way for us to manually implement this in the meantime? I think what you propose makes the most sense, which is to keep the current default behavior, but if anything is missing or an error is thrown (e.g. the 'sanity_level' warning), then the web API should take over. I've found Pixiv to be extremely inconsistent in how it applies 'sanity_level' labels, and it would be of great use not to be obstructed by them. I'm not sure how difficult an addition this would be, or whether there are any other tools out there that avoid it, but until it is bypassed I wonder what can be done temporarily to preserve the functionality.
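The "app API first, web API as fallback" behavior described above could be sketched like this. It is only an illustration of the control flow, not how gallery-dl implements it: `fetchWork`, `fetchFromApp`, `fetchFromWeb`, and the `url` field are hypothetical, and the two fetch functions are supplied by the caller.

```javascript
// True if the URL is the "work cannot be displayed" placeholder.
function isPlaceholder(url) {
    return typeof url === "string" && url.includes("limit_sanity_level");
}

// Hypothetical mixed mode: keep the app API as the default and only use
// the web API when the app API fails or returns a hidden or placeholder
// result. `fetchFromApp` and `fetchFromWeb` are async functions
// (id) => work object; `work.url` is an assumed field name.
async function fetchWork(id, fetchFromApp, fetchFromWeb) {
    let work = null;
    try {
        work = await fetchFromApp(id);
    } catch (err) {
        console.warn(`app API failed for ${id}: ${err.message}`);
    }
    if (work && work.visible !== false && !isPlaceholder(work.url)) {
        return { source: "app", work };
    }
    // Costs one extra request, but only for the affected works.
    return { source: "web", work: await fetchFromWeb(id) };
}
```

Because the fallback only runs for affected works, the extra request per post that makes the web API slow is paid only where the app API result is unusable.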

AlttiRi commented 3 months ago

I only explained how it should be implemented in Python code in pixiv.py. It requires someone who knows Python to spend some hours to implement it.

mikf commented 1 month ago

The first step towards a complete workaround is done: https://github.com/mikf/gallery-dl/commit/c5be50fdaad5209eb193111d8a4caf897ebb28d0. Now it is at least possible to download limit_sanity_level works via https://www.pixiv.net/en/artworks/12345 URLs.

mikf commented 1 month ago

@AlttiRi https://github.com/mikf/gallery-dl/commit/e05b9b101e31f3d2f3a9b46aec1e12131913c5cb

mikf commented 1 month ago

Detecting limit_sanity_level works for /users/ID/artworks results is now possible: https://github.com/mikf/gallery-dl/commit/75674944f0f030faf0cb61ee2d49a957570edbce. It needs a PHPSESSID cookie to be able to detect R-18 works, though.

@AlttiRi With https://github.com/mikf/gallery-dl/commit/33161da1210ac885177aa6b04f29a53127001f5a, you now need to enable captions to check empty captions via web API.