mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[kemono.party] "404 NOT FOUND" error #1514

Closed Vishvamitra closed 3 years ago

Vishvamitra commented 3 years ago

For the first time, I tried downloading all the posts of a user from kemono party today. I exported the cookies, I placed them in my gallery-dl conf file, but I got the error. You can see the first three lines of the command output below:

gallery-dl "kemono party user profile"

[downloader.http][warning] '404 NOT FOUND' for 'file name'

[download][error] Failed to download 

Do you have any idea on the reason why this could happen? I'm downloading other stuff using gallery-dl in the background. Might this be the reason why the command isn't working?

Vishvamitra commented 3 years ago

I've tried downloading the pics from kemono party even after finishing the former job, but I still get the same error.

kattjevfel commented 3 years ago

Works here, please provide a problematic URL.

Cunabo commented 3 years ago

Links to files on kemono slightly changed: old https://kemono.party/files/..... new https://data.kemono.party/files/..... Sorry for bad English.
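The change described above amounts to a host swap for file URLs. A minimal sketch in Python, assuming that only paths under /files/ moved to the data. subdomain; the helper name rewrite_file_url is hypothetical and is not part of gallery-dl:

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_file_url(url: str) -> str:
    """Move a kemono.party file URL to the data. subdomain.

    Assumption: only paths under /files/ moved; anything else is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc == "kemono.party" and parts.path.startswith("/files/"):
        parts = parts._replace(netloc="data.kemono.party")
    return urlunsplit(parts)


old = "https://kemono.party/files/6549841/48931656/Cheerleaders_023.png"
print(rewrite_file_url(old))
```

Profile and API URLs keep the original host in this sketch, which matches the observation that adding "data." to the profile URL stops the extractor from being recognized.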

zenosiege commented 3 years ago

Links to files on kemono slightly changed: old https://kemono.party/files/..... new https://data.kemono.party/files/..... Sorry for bad English.

when a fix update? please

Vishvamitra commented 3 years ago

@kattjevfel Hello! Here's the verbose output resulting from providing the same link I tried this morning:

[gallery-dl][debug] Version 1.17.3
[gallery-dl][debug] Python 3.6.9 - Linux-5.4.0-72-generic-x86_64-with-Ubuntu-18.04-bionic
[gallery-dl][debug] requests 2.25.1 - urllib3 1.26.4
[gallery-dl][debug] Starting DownloadJob for 'https://kemono.party/patreon/user/6549841'
[kemonoparty][debug] Using KemonopartyUserExtractor for 'https://kemono.party/patreon/user/6549841'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): kemono.party:443
[urllib3.connectionpool][debug] https://kemono.party:443 "GET /api/patreon/user/6549841?o=0 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://kemono.party:443 "GET /files/6549841/48931656/Cheerleaders_023.png HTTP/1.1" 404 None
[downloader.http][warning] '404 NOT FOUND' for 'https://kemono.party/files/6549841/48931656/Cheerleaders_023.png'

@Cunabo Editing the URL by adding data doesn't help, since now the download doesn't even start as the program doesn't recognize the extractor.

Cunabo commented 3 years ago

I don't know how, but I fixed it.

Also, I don’t know how github works and therefore I don’t know how to show my solution to the developer.

And also I'm a noob in python.)

Sorry for bad English.

mikf commented 3 years ago

Fixed in https://github.com/mikf/gallery-dl/commit/4b65ebf6529870d7eb03feaf3e593e53e0f82efe. Get the executable from here or install from source to use this commit.

edit: this change seems to have broken (older?) inline images on the site itself, e.g. https://kemono.party/fanbox/user/7356311/post/802343

zenosiege commented 3 years ago

Fixed in 4b65ebf. Get the executable from here or install from source to use this commit.

edit: this change seems to have broken (older?) inline images on the site itself, e.g. https://kemono.party/fanbox/user/7356311/post/802343

is it me or the executable is broken? Can't unzip it

Vishvamitra commented 3 years ago

@mikf Thank you for the fix! I've just updated to the latest dev version, but how am I supposed to use it instead of the regular version? The gallery-dl --version command still calls the 1.17.3 version.

@zenosiege I was able to extract the Ubuntu zip file with no problems, but I couldn't launch it since an additional python library was missing.

zenosiege commented 3 years ago

I was able to extract the Ubuntu zip file with no problems, but I couldn't launch it since an additional python library was missing.

@Vishvamitra Dunno, I use Windows and I've tried to unzip it with 7-Zip and Explorer. No results. It says there is an error in the zip file

roweger commented 3 years ago

I was able to unzip and get to the .exe, but it didn't seem to install. It displays this, and then closes:

error

Installing from source says 1.17.4.dev0 installed successfully, but the version readout still says 1.17.3, and downloading from Kemono still returns the 404 NOT FOUND errors.

pip install

Any ideas?

kattjevfel commented 3 years ago

You don't "install" gallery-dl.exe, you run it directly in a command prompt. Anyway if installing via pip didn't help either then I don't know.

Vishvamitra commented 3 years ago

Anyway if installing via pip didn't help either then I don't know.

@kattjevfel I can confirm that the dev version gets installed but the program keeps using the latest stable version. Screenshot_20210430_072642

zenosiege commented 3 years ago

I was able to unzip and get to the .exe, but it didn't seem to install. It displays this, and then closes:

@roweger How?

TestPolygon commented 3 years ago

At least for Fanbox it works, with a bug: it can download a preview instead of the original image if the preview and the attachment have the same name.

https://data.kemono.party/files/fanbox/{user}/{id}/Untitled.jpe            downloaded (200 KB preview)
https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled.jpe      NOT downloaded (2 MB file)
https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled_1.jpe    downloaded (2 MB file)
https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled_2.jpe    downloaded (2 MB file)

TestPolygon commented 3 years ago

Is there no option to not overwrite files with the same name? (And keep both files file.png, file (1).png)

As a workaround I can use --no-skip to overwrite the files that are downloaded first with the attachments that follow later in the post.

thatfuckingbird commented 3 years ago

Is there no option to not overwrite files with the same name? (And keep both files file.png, file (1).png)

You can also set skip to "enumerate" to get numbered filenames.
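In the config file this might look like the following. The snippet builds the JSON with Python so it can be checked for validity; placing the option directly under "extractor" (so that it applies to every site) is an assumption about where you want it.

```python
import json

# Assumption: the option sits at the general "extractor" level so it applies
# to every site; it could instead go under a site-specific section.
config = {
    "extractor": {
        "skip": "enumerate",
    }
}

print(json.dumps(config, indent=4))
```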

TestPolygon commented 3 years ago

Yeah, it even was already noted in other issues. https://github.com/mikf/gallery-dl/issues/1436#issuecomment-815261156.

I think it should be enabled by default. Currently the program behaves unsafely out of the box. It would be an unpleasant surprise for people to find out that some of the downloaded files are previews.


But I suspect "skip": "enumerate" is not compatible with the --download-archive option (I did not test it), so it's not the best solution.

I think it makes sense to add the additional field type: attachment, file. And short aliases (type-alias): a and f to use them in filename.

For example: "{id}_{title}_{type-alias}_{filename}.{extension}" or "{id}_{title}_{filename}_{type-alias}.{extension}"


But what if the title is too long? UPD: It looks OK, it will be trimmed. UPD2: No, it's just already trimmed by the site down to 60 chars + ...

Twi-Hard commented 3 years ago

It's come up several times now that it skips a lot of files because of duplicate names. I really think the default name should be set to "{id}_{title}_{num:>02}_{filename}.{extension}" or "{id}_{title}_{num}_{filename}.{extension}" instead of "{id}_{title}_{filename}.{extension}". A huge amount of the files I try to download have duplicate names. Edit: I'm just mentioning this because it could be helpful for people in the future. I already have it in my config. Edit2: I personally use "{id}_{title}_{num:>02}_{filename}.{extension}" because there's a lot of posts with 10 or more images.
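To see why the {num} field avoids the collisions, here is a minimal sketch. It assumes the fields can be filled in with Python's plain str.format (a simplification of gallery-dl's own formatter), and the metadata values are made up for illustration.

```python
# Two files from the same post that share a name (hypothetical metadata).
files = [
    {"id": 123, "title": "Post", "num": 1, "filename": "Untitled", "extension": "jpe"},
    {"id": 123, "title": "Post", "num": 2, "filename": "Untitled", "extension": "jpe"},
]

old_format = "{id}_{title}_{filename}.{extension}"
new_format = "{id}_{title}_{num:>02}_{filename}.{extension}"

old_names = [old_format.format(**f) for f in files]
new_names = [new_format.format(**f) for f in files]

print(old_names)  # both names are identical, so the second file looks like a duplicate
print(new_names)  # names differ by the zero-padded index
```

The ">02" part right-aligns the index in a field of width 2 padded with zeros, so files sort correctly once a post has 10 or more images.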

TestPolygon commented 3 years ago

I think it makes sense to add the additional field type: attachment, file. And short aliases (type-alias): a and f to use them in filename.

And the same change should be done with --download-archive. The DB entries should now look like this: {type-alias}_{the_current_row_format}.

So, if an entry has no type-alias prefix (a row that was created before this proposed update), it should be considered a preview f (file), since previews are placed before the original file with type a (attachment).

With this change people that used --download-archive can just easily download only the missing files. No need to redownload all files.

RJFAC commented 3 years ago

1488

Vishvamitra commented 3 years ago

@TestPolygon

At least for Fanbox it works

May I ask how you made it work? The Linux executable version doesn't launch because a python library is missing and installing from source works but for some reason the program ignores the dev version. Am I being dumb or am I missing anything crucial here?

zenosiege commented 3 years ago

image Damn, even online extractors don't help, WTF with the windows ZIP archive?

TestPolygon commented 3 years ago

And another question:

Is it possible to not skip posts without media, which contain only text? In order to save text metadata with --write-metadata or with postprocessors like I described here: https://github.com/mikf/gallery-dl/issues/1505#issuecomment-827106357.

Probably the reason for the problem is that the filename of the metadata file relies on media content properties. UPD: No, even using only post properties ({user}, {id}, {date:%Y.%m.%d}, {title}) for "filename" does not help; neither do --no-skip and --no-download.


May I ask how you made it work?

pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

Vishvamitra commented 3 years ago

UPDATE I think I found the reason why gallery-dl doesn't automatically use the dev version on Linux: it looks like the procedure to install gallery-dl from source places gallery-dl inside a folder which normally isn't located in PATH. The folder is /home/user/.local/bin. Unless you add the former folder in your PATH, you need to write the whole file path to execute the dev version. See here: Screenshot_2021-05-01_11-25-08
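A quick way to check this from Python is sketched below. The helper name dir_on_path is hypothetical; shutil.which is the standard-library call that reports which executable a bare command name resolves to.

```python
import os
import shutil


def dir_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(p))
        for p in path_value.split(os.pathsep)
        if p
    ]
    return wanted in entries


user_bin = "~/.local/bin"
print("on PATH:", dir_on_path(user_bin, os.environ.get("PATH", "")))
# Which gallery-dl would a bare command run? (None if it is not found at all.)
print("resolved:", shutil.which("gallery-dl"))
```

If the first line prints False, the folder needs to be added to PATH, or the executable has to be called by its full path.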

A question is arising in my mind now: does the dev version require a different config file? Or will it read the same config file as the stable version, which has been installed through pip with the python3 -m pip install -U gallery-dl command?

TestPolygon commented 3 years ago

On Windows you only need to install Python with the "Add Python to PATH" checkbox ticked: image And after that, run

in a console (CMD or Git Bash).

That's all. You can use it in a console now. Type gallery-dl --version to check it.

Do not forget to open the console in the download folder:

There is only one default config for all program instances.

zenosiege commented 3 years ago

btw, just tried the dev python version image it's still saying "404 NOT FOUND"

UPD. Maybe I'm not a super pythonist, but I just checked the kemonoparty.py file in the "extractor" folder. Even with "data." it doesn't work.

UPD. 2 XD LOL I'M SO DUMB. Gonna make a note: never keep a prepared batch file and the executable in the same folder. It works fine, thanks!

Vishvamitra commented 3 years ago

I think that I need your help again. I really can't install the dev version on my system.

Case 1: if I execute the command

python3 -m pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

as @mikf suggests, the installation of the dev package is smooth but the dev version is nowhere to be found in my system. I even checked the .local/bin folder, but I already have the 1.17.3 version of gallery-dl there, so where am I supposed to look for the dev version?

Case 2: if I execute the command

pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

as @TestPolygon says, I get the following error message:

DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.

In this latter case, the installation obviously fails.

Please, could anyone give me a hint on the way out of this?

TestPolygon commented 3 years ago

I get the following error message:

DEPRECATION: Python 2.7 reached the end of its life Please upgrade your Python

image
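The deprecation notice suggests that the bare pip command is bound to Python 2 on that system. One way to avoid the ambiguity is to call pip through the interpreter you actually want, as in this sketch; the helper name pip_install_command is hypothetical, and the script only prints the command rather than running it.

```python
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "-U", package]


# Stop early on Python 2, which is what the bare `pip` appears to use above.
if sys.version_info < (3,):
    raise SystemExit("This interpreter is Python 2; run the script with python3.")

cmd = pip_install_command("https://github.com/mikf/gallery-dl/archive/master.tar.gz")
print(" ".join(cmd))
```

Run with python3, this prints a command that uses the same python3 binary for pip, so the package ends up in that interpreter's environment.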

Vishvamitra commented 3 years ago

After upgrading to Kubuntu 20.04 I was finally able to install and use the 1.17.4-dev version of gallery-dl, but I got a new error when I tried to download the same user profile from Kemono party. I'll upload a screenshot of my terminal below: Screenshot_20210502_215622

TestPolygon commented 3 years ago

The server wants to relax.

Vishvamitra commented 3 years ago

Am I the only one experiencing issues again to download files from kemono party? I keep getting these errors and the download of every file repeatedly fails. Screenshot_20210503_142329

TestPolygon commented 3 years ago

If the program can't do something that you can't do manually it's not the program's problem.


Check whether it works in a browser. Do not confuse previews with the original files, by the way.

Don't forget about cookies.

The official chat: https://t.me/kemonoparty

Vishvamitra commented 3 years ago

@TestPolygon Did I imply that I was blaming the program? It would be nice of you if you could give me a hint of the solution.

TestPolygon commented 3 years ago

The server wants to relax.

If the program can't do something that you can't do manually it's not the program's problem.

Okay, one more time: image

The data server does not work. - C.O.

Read the chat or board related to the service.

Vishvamitra commented 3 years ago

@TestPolygon Ok thank you for the clarification.

Diadial commented 3 years ago

@TestPolygon Ok thank you for the clarification.

In regards to connection issues, it's not just you. The owner of KemonoParty recently upgraded the servers and implemented a DDOS protection. I imagine they or the sysadmin are configuring the site still. That or they've gone over the bandwidth limit.

Terrails commented 3 years ago

It's come up several times now that it skips a lot of files because of duplicate names. I really think the default name should be set to "{id}_{title}_{num:>02}_{filename}.{extension}" or "{id}_{title}_{num}_{filename}.{extension}" instead of "{id}_{title}_{filename}.{extension}". A huge amount of the files I try to download have duplicate names. Edit: I'm just mentioning this because it could be helpful for people in the future. I already have it in my config. Edit2: I personally use "{id}_{title}_{num:>02}_{filename}.{extension}" because there's a lot of posts with 10 or more images.

I've been using a filename containing "{num}" since I started using gallery-dl for kemono, but it still stands that gallery-dl downloads the file located at "/files/{service}/{user}/{id}/FileName.{extension}" and then skips the file that has exactly the same name located at "/attachments/{service}/{user}/{id}/FileName.{extension}" instead of downloading both and giving them the proper {num}. My downloaded gallery is currently missing the majority of files with {num}: 2 because the files with {num}: 1 have the same name as {num}: 2 which results in "duplicates" being skipped.

The solution that @TestPolygon suggested should hopefully give us a simple solution, as the root of the issue seems to be the archive because all files, including the so called "duplicates", download normally when not using archive. Can anyone also confirm this?

I think it makes sense to add the additional field type: attachment, file. And short aliases (type-alias): a and f to use them in filename.

And the same change should be done with --download-archive. The DB entries should now look like this: {type-alias}_{the_current_row_format}.

So, if an entry has no type-alias prefix (a row that was created before this proposed update), it should be considered a preview f (file), since previews are placed before the original file with type a (attachment).

With this change people that used --download-archive can just easily download only the missing files. No need to redownload all files.

TestPolygon commented 3 years ago

Since you use {num}/{num:>02}, I think it can be fixed by also changing archive-format, whose default is: https://github.com/mikf/gallery-dl/blob/bc868e7bb8ad82d9d2bef7a609aa2deb50fca647/gallery_dl/extractor/kemonoparty.py#L24

to, for example:

 "archive-format": "{service}_{user}_{id}_{num:>02}_{filename}.{extension}"

But this change only makes sense if you have not downloaded anything yet, because old entries with the different format will be ignored after this change.
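The effect of the archive format on skipping can be illustrated with a small simulation that uses a Python set in place of the archive database. The metadata is made up, and the "old" format shown here is an assumption inferred from the suggested format with {num} removed.

```python
# Hypothetical metadata: a preview and the original attachment that share
# the same name within one post.
files = [
    {"service": "fanbox", "user": "1", "id": "2", "num": 1, "filename": "Untitled", "extension": "jpe"},
    {"service": "fanbox", "user": "1", "id": "2", "num": 2, "filename": "Untitled", "extension": "jpe"},
]


def download_all(archive_format: str) -> list:
    """Return the `num` of each file that would be downloaded, using a set as the archive."""
    archive = set()
    downloaded = []
    for f in files:
        key = archive_format.format(**f)
        if key in archive:
            continue  # treated as already downloaded
        archive.add(key)
        downloaded.append(f["num"])
    return downloaded


# Assumed old default (the suggested format without {num}).
print(download_all("{service}_{user}_{id}_{filename}.{extension}"))
# Suggested format with the enumeration index.
print(download_all("{service}_{user}_{id}_{num:>02}_{filename}.{extension}"))
```

With the assumed old format the second file produces the same archive key and is skipped; with {num} in the key both files are downloaded.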


I have created a separate issue for it: https://github.com/mikf/gallery-dl/issues/1556

TestPolygon commented 3 years ago

Kemono 403 Forbidden

Use the search.

https://github.com/mikf/gallery-dl/issues/1370

mikf commented 3 years ago

I've (finally) changed the default filenames and archive IDs to avoid duplicate names by using an enumeration index ({num}): https://github.com/mikf/gallery-dl/commit/e9ab97396fb0bf9159644ae5278e2866c0321816. As the commit message says, this breaks backwards compatibility in the sense that any previous generated archive IDs or filenames won't get recognized as "already downloaded" anymore. It is possible to keep using the old names and IDs (see https://github.com/mikf/gallery-dl/commit/e9ab97396fb0bf9159644ae5278e2866c0321816), but that obviously has the problem of potentially skipping downloads. You could also use the {type} field from #1556 instead of {num} if that suits you better.

TestPolygon commented 3 years ago

It's of course better than it was. But using {num} will lead to downloading more unnecessary duplicates, since some artists have a notable number of posts with duplicated URLs, where the same URL is counted twice within one post. With {type} or {type[0]} you will download only the unique attachment URLs within a post (but not inline ones; those will still include duplicates). (Although this applies only to a few artists. For example, the Patreon of incognit has 1403 attachment URLs, of which only 949 are unique.) It's also more convenient for understanding which type a media file is: inline, file, or attachment.
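The difference can be sketched with a made-up post in which the same attachment URL is listed twice. The sketch again assumes that plain str.format behaves like gallery-dl's formatter for these fields, and the shortened key formats are for illustration only.

```python
# Hypothetical post in which the same attachment is listed twice.
entries = [
    {"type": "file", "num": 1, "filename": "Untitled"},
    {"type": "attachment", "num": 2, "filename": "Untitled"},
    {"type": "attachment", "num": 3, "filename": "Untitled"},  # repeated URL
]

# Keys built with the enumeration index: every entry is unique.
by_num = {"{num}_{filename}".format(**e) for e in entries}
# Keys built with the first letter of the type: the repeated attachment collapses.
by_type = {"{type[0]}_{filename}".format(**e) for e in entries}

print(sorted(by_num))
print(sorted(by_type))
```

Here {num} yields three distinct keys, so the repeated attachment is downloaded twice, while {type[0]} yields two keys and still keeps the preview (f) apart from the attachment (a).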

left1000 commented 3 years ago

Uh, so I just grabbed https://github.com/mikf/gallery-dl/releases/tag/v1.18.0 do I need to update my gallery-dl.conf file? Or just download this new exe?

edit: In case it's not clear, I want to use the new default naming options, but I know naming options are usually controlled in the gallery-dl.conf file?

TestPolygon commented 3 years ago

If you want to use the new default format: https://github.com/mikf/gallery-dl/blob/ee1064a2b29d8097d09b8d178ccb61583acf7f80/gallery_dl/extractor/kemonoparty.py#L23-L24 just remove "filename" and "archive-format" from the config file if you have added them.


The alternative format with the first letter of {type}:

"kemonoparty": {
  "filename": "{id}_{title}_{type[0]}_{filename}.{extension}",
  "archive-format": "{service}_{user}_{id}_{type[0]}_{filename}.{extension}"
}