Open 501stRookie opened 1 year ago
Can confirm, seeing this too. Refreshing the login token doesn't help and it doesn't seem to be related to whether the image is r18 or not. The URL I get for this image is https://s.pximg.net/common/images/limit_sanity_level_360.png The metadata json seems to be mostly fine, but it also contains a "sanity_level": 4, entry.
Update: Tried PixivUtil2, it doesn't have this problem.
Update2: Tried the official Pixiv app, these images also do not show up there. The last displayed images (if I sort by "newest") are from yesterday. Except ugoira, those show for some reason.
Also might be related issue here: https://github.com/upbit/pixivpy/issues/275
Update3:
If you check https://www.pixiv.net/info.php?cid=1&lang=en there are announcements about the suspension and reinstatement of their mobile apps in app stores, related to content in the apps. I think they might be doing some kind of semi-manual filtering now, which causes this lag between the mobile app API and the website. This might mean we can no longer use that API, at least for downloading the images themselves. Also for the future, it might be a good idea to detect that limit_sanity_level placeholder image and error on it.
Update4:
The metadata is incomplete too for these images (no tags).
Doesn't appear to be happening on my end.
I ran into this but I can no longer reproduce it, so must've been something temporary.
It is still happening (as I'm writing this), but the lag between what is available on the mobile app API and what's visible on the site has decreased. Currently I see about an 8-10 minute lag until an image shows up on the mobile app (looking at the posting times). You can reproduce this if you go to the site, search for some very common tag like "illustration" and try to download the newest entry chronologically (check if it was posted in the last few mins).
The other question is, is there any content that won't be available on the mobile API at all? I haven't encountered anything like that yet, but since this whole thing might be because the mobile app does some additional filtering due to app store requirements, it can't be ruled out.
For now I think a temporary solution would be to catch these cases when an invalid image is returned (easy from the URL) and either error or try to wait 5-10-15 mins like in the case of rate limits. If the lag between the site and image availability in the API remains low then this might be enough, maybe along with some informational message in these cases.
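A minimal sketch of that idea, not gallery-dl's actual code: detect the placeholder by its file name and retry after a pause. The names `is_placeholder`, `resolve_image_url`, `get_url`, and the default retry count and delay are all assumptions made up for illustration; only the `limit_sanity_level` marker comes from the URL quoted in this thread.

```python
import time
from typing import Callable, Optional

# Substring of the placeholder file name quoted earlier in this thread.
PLACEHOLDER_MARKER = "limit_sanity_level"


def is_placeholder(url: str) -> bool:
    """Return True if the URL points at the 'work cannot be displayed' image."""
    return PLACEHOLDER_MARKER in url.rsplit("/", 1)[-1]


def resolve_image_url(
    get_url: Callable[[], str],          # hypothetical: re-queries the API for the URL
    retries: int = 3,                    # assumed default, tune as needed
    delay: float = 300.0,                # seconds between attempts (assumed)
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call get_url() until it returns something other than the placeholder.

    Returns None if every attempt still yields the placeholder.
    """
    for attempt in range(retries + 1):
        url = get_url()
        if not is_placeholder(url):
            return url
        if attempt < retries:
            sleep(delay)  # wait before asking the API again
    return None
```

Passing `sleep` as a parameter keeps the waiting logic testable without actually pausing; a caller could log an informational message whenever `None` comes back.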
Ultimately if the time lag between the mobile API and the site keeps randomly increasing/decreasing or the mobile API becomes filtered in some other way then a switch to the non-mobile API (the one the website uses) might be needed.
Still happening on my end. Also only for basically brand new pictures.
Happens for me when I'm downloading from my bookmarks, but doesn't happen when I use the search page or individual posts.
https://s.pximg.net/common/images/limit_sanity_level_360.png images now get ignored (https://github.com/mikf/gallery-dl/commit/a45a17ddb7504541907772ac330ff278a0f20878 (yes, that's the wrong issue number ...))
To manually ignore them, enable url-metadata and --filter them that way.
The other question is, is there any content that won't be available on the mobile API at all?
I've noticed that search results, and only those, do not include R-18G works.
I've noticed that search results, and only those, do not include R-18G works.
Works for me, it might be your account settings (there is a separate toggle for r18g iirc)
These settings are enabled for all of my accounts. It is working again, but it definitely wasn't when I posted https://github.com/mikf/gallery-dl/issues/4327#issuecomment-1646604638.
Are these "Work cannot be displayed" images still a thing or did Pixiv somehow fix whatever these were meant for?
(I've never encountered one of these or a "Skipping 'sanity_level' warning" logging message myself)
There's still at least a few minutes of lag before images displayed on the website also appear in the app, so if you happen to download very recent image URLs those will still produce the sanity_level image. I think we will just have to live with this for the time being, since it's probably not worth a rewrite to switch to the API that the website uses.
Yeah, I'd really want to avoid using the website API if at all possible. It is a lot slower, requires an extra request for each individual post, and, more importantly, would need exported cookies for authentication, which expire in a month or so.
I did try to rewrite the current extractor back when auth with username and password got disabled and it wasn't a "pleasant" experience, to say the least.
Just a short summary of https://github.com/mikf/gallery-dl/issues/4421#issuecomment-1689864602
In Pixiv's Android application, and therefore in gallery-dl too:
Some artworks are not shown in the user's profile (/v1/user/illusts?user_id= just does not return any information for them), but they are still listed in bookmarks (/v1/user/bookmarks/illust?user_id=) with a dummy thumbnail in the app. They can't be downloaded even with an artwork URL (/v1/illust/detail?illust_id=). The endpoint returns a trimmed response with "visible": false.
Some artworks have an empty .caption (description).
It seems the caption "bug" is "fixed". But some images still have "visible": False, and gallery-dl does not see them when it downloads a profile's images.
Upd 2023.10.08: The "bug" has returned.
Encountered another image that won't download (giving skip sanity_level warning in the log), (NSFW warning) https://www.pixiv.net/en/artworks/109487939 . Interesting because none of the artist's other works seem to be affected and by pixiv standards it's rather tame too.
Seeing Skipping 'sanity_level' warning too.
Not nsfw https://www.pixiv.net/en/artworks/102932581
gallery-dl seems to silently skip it, maybe add more explicit error/warning?
I only noticed it was being skipped, after passing --verbose argument.
Might be a good idea to add these post URLs to the output of --write-unsupported.
The best solution would be falling back to a secondary extractor that doesn't use Pixiv's mobile API. It's like @thatfuckingbird pointed out: Pixiv is taking measures to keep their mobile apps in the stores. Unfortunately, the automatic flagging is rather trigger-happy, producing many false positives. There also seems to be no publicly visible indicator or any way to appeal the flag from what I saw, so finding a way around is very important for every data hoarder.
I think it makes sense to add support for using the web API in addition to the Android app's API.
Since the mobile API does not return shadow banned artworks, it would require an extra call to get all artwork IDs with the site's API:
Object.keys((await (await fetch("https://www.pixiv.net/ajax/user/1657441/profile/all?lang=en")).json()).body.illusts)
So, you can find the missed artworks.
To get the info for them:
(await (await fetch("https://www.pixiv.net/ajax/illust/113897896?lang=en")).json()).body
For ugoira, also:
(await (await fetch("https://www.pixiv.net/ajax/illust/113897896/ugoira_meta?lang=en")).json()).body
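On the Python side, finding the missed artworks could look roughly like the sketch below. It assumes you already have the parsed `body` of the `profile/all` response (whose `illusts` member is keyed by artwork ID, as the `Object.keys(...)` call above implies) and the list of IDs the app API returned; the function name `missing_illust_ids` and its parameters are hypothetical.

```python
from typing import Iterable, List


def missing_illust_ids(profile_body: dict, app_ids: Iterable[int]) -> List[int]:
    """Return IDs listed by the web profile endpoint but absent from the app API.

    profile_body: parsed "body" of /ajax/user/<id>/profile/all
    app_ids:      artwork IDs the mobile API returned for the same user
    """
    # "illusts" is keyed by artwork ID; treat a missing or empty value as no works.
    web_ids = {int(key) for key in (profile_body.get("illusts") or {})}
    known = {int(i) for i in app_ids}
    # Newest (highest ID) first.
    return sorted(web_ids - known, reverse=True)
```

Each ID this returns would then need its own `/ajax/illust/<id>` request to get the details.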
However, it seems it's not possible to tell whether the caption was removed (in the app API) due to a soft shadow ban, or the author just did not add one.
For example: https://www.pixiv.net/en/artworks/103983466 is visible, but it has no caption. "Soft shadow banned".
While these https://www.pixiv.net/en/artworks/102932581, https://www.pixiv.net/en/artworks/109211067 are additionally hidden from the profiles. They can't be downloaded with gallery-dl now (it returns a response with visible: False). "Shadow banned".
So, it needs to use the site's API each time caption is empty, even if the artwork is not shadow banned, if you need the description for meta files.
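To make the two cases concrete, here is a rough sketch of how an app API result could be classified. The `visible` and `caption` keys are the ones mentioned in this thread; the status labels, the function names and the `want_captions` switch are made up for illustration and are not gallery-dl's implementation.

```python
from typing import Literal

Status = Literal["hidden", "no_caption", "ok"]


def classify_app_illust(illust: dict) -> Status:
    """Classify one artwork dict as returned by the mobile app API."""
    if illust.get("visible") is False:
        return "hidden"        # "shadow banned": trimmed response, no files
    if not (illust.get("caption") or "").strip():
        return "no_caption"    # "soft shadow banned" OR the author wrote nothing
    return "ok"


def needs_web_api(illust: dict, want_captions: bool = False) -> bool:
    """Decide whether a follow-up request to the site's API is worth it."""
    status = classify_app_illust(illust)
    return status == "hidden" or (status == "no_caption" and want_captions)
```

Since an empty caption is ambiguous, the extra request is only made when the caller explicitly asks for descriptions.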
~Also, site's API returns description with links wrapped into <a href="/jump.php?... .~
There is extraData.meta.twitter.description.
JS code to collect all infos from https://www.pixiv.net/en/users/123456 page:
// Run in the browser console on a profile page (uses top-level await).
const headers = {
    // "user-agent": `...`,
    // "cookie": `...`,
};
const profileId = document.location.pathname.match(/(?<=users\/)\d+/)[0]; // https://www.pixiv.net/en/users/7386235
// All artwork IDs of the profile, as reported by the site's API.
const ids = Object.keys((await (await fetch(`https://www.pixiv.net/ajax/user/${profileId}/profile/all?lang=en`, {
    headers: {
        "referer": `https://www.pixiv.net/en/users/${profileId}`,
        ...headers
    }
})).json()).body.illusts);
const json = {};
// One extra request per artwork to get its full info.
for (const id of ids) {
    const body = (await (await fetch(`https://www.pixiv.net/ajax/illust/${id}?lang=en`, {
        headers: {
            "referer": `https://www.pixiv.net/en/artworks/${id}`,
            ...headers,
        }
    })).json()).body;
    json[id] = body;
}
downloadBlob(new Blob([JSON.stringify(json, null, " ")]), `[pixiv][json] ${profileId}—${json[ids[0]]?.userName} (${ids.length}).json`, document.location);
// Saves the blob as a file via a temporary anchor element.
function downloadBlob(blob, name, url) {
    const anchor = document.createElement("a");
    anchor.setAttribute("download", name || "");
    const blobUrl = URL.createObjectURL(blob);
    anchor.href = blobUrl + (url ? ("#" + url) : "");
    anchor.click();
    setTimeout(() => URL.revokeObjectURL(blobUrl), 3000);
}
It is a lot slower, requires an extra request for each individual post.
Optional mixed mode, using the web API only:
for profiles (/users/) to check the existence of missed artworks and download them if they exist,
for artworks (/artworks/) when visible: False,
when caption is empty.
and, more importantly, would need exported cookies for authentication, which expire in a month or so.
That is less of a problem than the missed images/descriptions (which may contain useful links).
Using other API endpoints seems very simple; however, as far as I can see, they return JSON data that is formatted a bit differently.
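A rough outline of such a mixed mode in Python, purely as a sketch under assumptions: `app_detail` and `web_detail` stand in for whatever functions query the app API and the site's API (both hypothetical here, passed in as callables), and the differently formatted web JSON is returned untouched rather than converted.

```python
from typing import Callable, Iterable, Iterator, Tuple


def mixed_mode(
    app_ids: Iterable[int],
    web_ids: Iterable[int],
    app_detail: Callable[[int], dict],   # hypothetical app API lookup
    web_detail: Callable[[int], dict],   # hypothetical web API lookup
    want_captions: bool = False,
) -> Iterator[Tuple[int, str, dict]]:
    """Yield (illust_id, source, data), preferring the app API.

    The web API is used only when the app result has visible == False,
    when captions are wanted but empty, or when the app API did not
    list the artwork at all.
    """
    app_ids = list(app_ids)
    seen = set(app_ids)
    for illust_id in app_ids:
        data = app_detail(illust_id)
        hidden = data.get("visible") is False
        empty_caption = not (data.get("caption") or "").strip()
        if hidden or (want_captions and empty_caption):
            yield illust_id, "web", web_detail(illust_id)
        else:
            yield illust_id, "app", data
    # Artworks that only the site's profile listing knows about.
    for illust_id in web_ids:
        if illust_id not in seen:
            yield illust_id, "web", web_detail(illust_id)
```

This keeps the number of extra web requests proportional to the number of affected artworks instead of one per post.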
@AlttiRi Is there a way for us to manually implement this in the meantime? I think what you propose makes the most sense, which is to keep the current default behavior, but if anything is missing or if there is an error thrown (eg 'sanity_level' warning) then the web API should take over. I've found pixiv to be extremely inconsistent with their application of 'sanity_level' labels and it would be of great use to not be obstructed by it. I'm not sure how difficult of an addition this would be, or if there are any other tools out there that avoid it, but until it is bypassed I wonder what can be done temporarily to preserve the functionality.
I only explained how it should be implemented in Python code in pixiv.py. It requires someone who knows Python to spend some hours to implement it.
The first step towards a complete workaround is done: https://github.com/mikf/gallery-dl/commit/c5be50fdaad5209eb193111d8a4caf897ebb28d0. Now it is at least possible to download limit_sanity_level works via https://www.pixiv.net/en/artworks/12345 URLs.
Detecting limit_sanity_level works for /users/ID/artworks results is now possible: https://github.com/mikf/gallery-dl/commit/75674944f0f030faf0cb61ee2d49a957570edbce. It needs a PHPSESSID cookie to be able to detect R-18 works, though.
@AlttiRi With https://github.com/mikf/gallery-dl/commit/33161da1210ac885177aa6b04f29a53127001f5a, you now need to enable captions to check empty captions via web API.
Starting today, when I try to download an image from Pixiv, instead of the image itself it downloads this image with Japanese text that says "This work cannot be displayed".
It seems to only happen on posts that were recently posted, as images that were uploaded yesterday and older download fine.