Well, I also haven't managed to find anything yet, except another issue when downloading from that user's timeline:
$ gallery-dl https://bcy.net/u/109282764041
[downloader.http][warning] '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281'
[download][error] Failed to download 6780546160802143236 35432115.part
^C
KeyboardInterrupt
That's not exactly the same problem as mentioned before, but the image URLs and metadata from the API endpoint are different from the ones embedded in /item/detail/ web pages, and rather incomplete:
https://bcy.net/apiv3/user/selfPosts?uid=109282764041
{
"h": 4032,
"mid": 35432115,
"origin": "",
"original_path": "",
"path": "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281",
"ratio": 0.6666666666666666,
"type": "image",
"visible_level": "",
"w": 2688
}
https://bcy.net/item/detail/6780546160802143236
{
"h": 4032,
"mid": 35432115,
"origin": "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image?sig=XOCQEWBAelmBFHEPfxA8dD5dX2g%3D",
"original_path": "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image",
"path": "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-w650.image",
"ratio": 0,
"type": "image",
"visible_level": "",
"w": 2688
}
This can probably be solved by adding the watermark or "noop" filter to the path value, but it just feels bad doing that.
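A minimal sketch of that idea, for illustration only: it takes the path value from the API response and appends a filter suffix. The host p1-bcy.byteimg.com and the "~noop.image" suffix are copied from the /item/detail/ example above; the function name with_filter and the assumption that every API path maps onto that host this way are mine and untested against other posts.

```python
from urllib.parse import urlsplit

# Assumption: this host and prefix are taken from the /item/detail/ example,
# not from any documented bcy.net API.
CDN_ROOT = "https://p1-bcy.byteimg.com/img/banciyuan/"


def with_filter(api_path: str, image_filter: str = "noop") -> str:
    """Build a filtered image URL from the 'path' value of the API response."""
    path = urlsplit(api_path).path.lstrip("/")
    # The API path already starts with 'banciyuan/'; drop it so it is not doubled.
    if path.startswith("banciyuan/"):
        path = path[len("banciyuan/"):]
    return f"{CDN_ROOT}{path}~{image_filter}.image"


api_path = "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281"
print(with_filter(api_path))
```

For the example image this reproduces the original_path value shown in the /item/detail/ metadata. The watermarked variant cannot be built this way, since it also needs a sig parameter.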
The '404 Not Found' errors should be fixed with https://github.com/mikf/gallery-dl/commit/8fbbaa54ff3c6c8be46387a31ade010fd64b0fa6, but it's still only capable of downloading watermarked images for these kinds of posts - or the noop version when enabling the noop option.
This commit also adds a filter metadata field which is either empty "" for original images, "watermark", or "noop", depending on the filter used by bcy.net. You can't use it for directory names, but adding a -watermark or -noop to filenames is possible with {filter:?-//}.
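To illustrate what {filter:?-//} does to a filename, here is a small Python emulation of the "?<start>/<end>/" format specifier: the value is wrapped in the given prefix and suffix only when it is non-empty. This is not gallery-dl's own code; the helper name optional_affix and the .jpg extension are assumptions made for the example.

```python
def optional_affix(value: str, start: str = "", end: str = "") -> str:
    """Mimic '?<start>/<end>/': wrap the value only if it is non-empty."""
    return f"{start}{value}{end}" if value else ""


for image_filter in ("", "watermark", "noop"):
    # '{filter:?-//}' corresponds to start='-' and an empty end.
    name = "6780546160802143236 35432115" + optional_affix(image_filter, "-")
    print(name + ".jpg")  # the extension is assumed for illustration
```

An empty filter leaves the filename unchanged, while "watermark" and "noop" append "-watermark" and "-noop" respectively.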
Some posts aren't downloading properly because their URLs are different. So it doesn't download the "original" watermarked version or the "noop" version, which is higher quality than what it grabbed. Here's an example:
https://bcy.net/item/detail/6721286314647355660
When I use '-g' it shows this URL, which doesn't even work when put into a browser: "https://img.bcy-qn.pstatp.com/user/1381845/item/c0rbo/809d946f10eb471989edf8cafc3bb9ea.jpg"
It should point at this for the "noop" version (which I compared; it is a higher-quality image that is 200 KB larger with fewer compression artifacts): "https://p3-bcy.byteimg.com/img/banciyuan/user/1381845/item/c0rbo/809d946f10eb471989edf8cafc3bb9ea.jpg~noop.image"
And here is the "original" that isn't being detected at all by gallery-dl right now: "https://p3-bcy.byteimg.com/img/banciyuan/user/1381845/item/c0rbo/809d946f10eb471989edf8cafc3bb9ea.jpg~tplv-banciyuan-logo-v3:wqnnjovlrZBzYW1l5piv6ICB5aS0ZXIK5Y2K5qyh5YWDIC0gQUNH54ix5aW96ICF56S-5Yy6.image?sig=tB44v3eJxdb-dY9Jl9Ge8A6xIjo%3D"
A few others have "c0qxx" or "c0r67" instead of "c0rbo". The first 3 of this user's posts download normally, with both "noop" and "watermarked" detected. The last 4 posts do not.
Someone appears to have found a working solution. I've tested it somewhat myself. Personally I don't code, but it's generating a signature that matches the unwatermarked, original images.
Unfortunately I don't actually understand what's being done, but it'd be great if you could take a look at it and see whether it can be integrated into gallery-dl.
Should be fixed with 46b64251 (v1.23.4)
@pxssy This script only links to the low-quality noop versions of images (original_path)
Thank you so much for implementing and supporting the site! It was a real hassle to use that site honestly and your downloader really improved the experience.
That being said, I've been getting quite a few errors and I believe it's because they changed the format some time ago. Old posts still use the old format I mentioned in the other post, but it seems they have changed it for new ones.
A recent example https://bcy.net/item/detail/6780546160802143236
The display "thumbnail": https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-w650.image
The "original" with watermark: https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image?sig=XOCQEWBAelmBFHEPfxA8dD5dX2g%3D
It seems the string "~tplv-banciyuan-logo-v3:wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==.image" gives the original image, but with a watermark. Surprisingly, "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug" is actually base64 for the Chinese characters on the watermark itself. The watermark includes the poster's name, which makes me believe this is NOT a coincidence. There is a real headache of a catch, though.
The characters on the watermark are "©露兒大魔王_ 半次元 - ACE爱好者社区".
They actually map to (in base64, UTF-8) "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDReeIseWlveiAheekvuWMugo=", while what's used in the link above is "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug==". They are almost exactly the same, except that a repeated "e" is replaced with a "-", which is very strange indeed. Replacing the original with the "correct" string "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDReeIseWlveiAheekvuWMugo=" doesn't work. It seems like it's some kind of obfuscation or human error.
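One possible explanation for the "-", offered as an observation rather than anything confirmed by bcy.net: "-" is what the URL-safe base64 alphabet uses in place of "+", so the token in the link may simply be URL-safe base64. The sketch below decodes it with Python's standard library. If this reading is right, the watermark text says "ACG" rather than "ACE", and the hand-encoded string also differs by a trailing newline. It does not explain how the sig parameter is computed.

```python
import base64

# Token copied from the watermarked URL above.
token = "wqnpnLLlhZLlpKfprZTnjotfCuWNiuasoeWFgyAtIEFDR-eIseWlveiAheekvuWMug=="

# urlsafe_b64decode treats '-' as '+' and '_' as '/'.
text = base64.urlsafe_b64decode(token).decode("utf-8")
print(repr(text))  # -> '©露兒大魔王_\n半次元 - ACG爱好者社区'

# Re-encoding the decoded text with the URL-safe alphabet gives back the token.
reencoded = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
print(reencoded == token)  # -> True
```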
I tried replacing the whole string with the base64 encoding of a space, i.e. "ICAg==" or "IA==". That doesn't work. At the moment I'm stuck.
The only other template I managed to find is "https://p1-bcy.byteimg.com/img/banciyuan/3ccdff22479c4060aadc86718209b281~noop.image". It's unwatermarked, but it seems to be compressed quite a bit; it's not exactly the original.
I think we just need to get the template right: the correct "~xxxx" tag for the original unwatermarked image.
The downloader gives this output when trying to download said profile.
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281'
download: Failed to download 6780546160802143236 35432115.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/481a06423e3e4969bf129319541c4ab5'
download: Failed to download 6780546160802143236 35432116.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/bc46a12d7d5b4f838506c63cdc5a126f'
download: Failed to download 6780546160802143236 35432117.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/51936a46c02c49a09dfee28d495eea1c'
download: Failed to download 6780546160802143236 35432118.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/a6a61bce98b448abbb1e12e9deb6cb6b'
download: Failed to download 6780546160802143236 35432119.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/14dbc38e5bff48688716119d17639520'
download: Failed to download 6780546160802143236 35432120.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/a19a6e8fc59c49d28e04b753fb5cb102'
download: Failed to download 6780546160802143236 35432121.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/1f52f033ebb74293a244067f975e095c'
download: Failed to download 6780546160802143236 35432122.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/a337da495119443aad11145aa1db7d90'
download: Failed to download 6778693005793565699 35037961.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/1eb566a4a3854a19beb4cff899cd00a1'
download: Failed to download 6778693005793565699 35037962.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/bdb5dae63fa6477fa478b927cbac3236'
download: Failed to download 6778693005793565699 35037963.part
downloader.http: '404 Not Found' for 'https://img-bcy-qn.pstatp.com/banciyuan/2170770b8fcb4308b3367e31d441e62b'
download: Failed to download 6778693005793565699 35037964.part
These most likely come from the downloader using the old technique to handle the new links, which I've tried and which does not work.
The current roundabout way to handle this, IMHO, is to check whether the link has an image extension (.jpg/.png) and, if it does, use the old method. If it doesn't, just grab the watermarked originals as well as the "~noop" version mentioned above (until we find a method to remove the watermark from originals), perhaps placing them in separate folders until a final solution is found. In the meantime I'll manually use the noop version to crop out the watermark from the original.
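A rough sketch of that extension check, just to make the proposal concrete. The function name pick_strategy and the strategy labels are made up for illustration; this is not how gallery-dl decides anything, and the old method may still fail for some posts whose URLs do carry an extension.

```python
import posixpath
from urllib.parse import urlsplit

# Extensions named in the proposal above.
IMAGE_EXTENSIONS = {".jpg", ".png"}


def pick_strategy(url: str) -> str:
    """Return 'old' for URLs ending in an image extension, else 'watermark+noop'."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return "old" if ext in IMAGE_EXTENSIONS else "watermark+noop"


old_style = "https://img.bcy-qn.pstatp.com/user/1381845/item/c0rbo/809d946f10eb471989edf8cafc3bb9ea.jpg"
new_style = "https://img-bcy-qn.pstatp.com/banciyuan/3ccdff22479c4060aadc86718209b281"

print(pick_strategy(old_style))  # -> old
print(pick_strategy(new_style))  # -> watermark+noop
```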