mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[kemono] Add the additional field `type`: `attachment`, `file` (`a`, `f`) #1556

Closed TestPolygon closed 3 years ago

TestPolygon commented 3 years ago

I have already posted about this problem here: https://github.com/mikf/gallery-dl/issues/1514#issuecomment-830406051

Summary

Currently the program's default behavior is unsafe: a file can be skipped when another file in the same post has the same name.

For example:

https://data.kemono.party/files/fanbox/{user}/{id}/Untitled.jpe            downloaded (200 KB preview)
https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled.jpe      NOT downloaded (2 MB file)
https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled_1.jpe    downloaded (2 MB file)
https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled_2.jpe    downloaded (2 MB file)
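A minimal sketch of why the second URL is lost, assuming the target name is derived only from the last path component. That naming rule is a simplification for illustration, not gallery-dl's actual code, and `simulate` is a hypothetical helper.

```python
from posixpath import basename

# URLs copied from the example above; {user} and {id} stay as literal placeholders.
URLS = [
    "https://data.kemono.party/files/fanbox/{user}/{id}/Untitled.jpe",
    "https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled.jpe",
    "https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled_1.jpe",
    "https://data.kemono.party/attachments/fanbox/{user}/{id}/Untitled_2.jpe",
]


def simulate(urls):
    """Return (url, status) pairs when the target name is only the URL basename."""
    seen = set()
    results = []
    for url in urls:
        name = basename(url)  # assumed naming rule: no type, no index
        if name in seen:
            results.append((url, "skipped"))
        else:
            seen.add(name)
            results.append((url, "downloaded"))
    return results


for url, status in simulate(URLS):
    print(f"{status:<10} {url}")
```

The preview under /files/ claims the name first, so the full-size attachment with the same name is treated as already present.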

The problem also occurs with the --download-archive option, even if you use {num} as a workaround for the example above: https://github.com/mikf/gallery-dl/issues/1514#issuecomment-841392622

Proposed solution

Add an additional field type, which can be attachment or file.


I think it makes sense to add an additional field type with the values attachment and file, plus short aliases (type-alias) a and f for use in filenames.

For example: "{id}_{title}_{type-alias}_{filename}.{extension}" or "{id}_{title}_{filename}_{type-alias}.{extension}"


The same change should be made for --download-archive. The DB entries should now look like this: "{type-alias}_{the_current_row_format}".

So, if an entry has no type-alias prefix (that is, the row was created before this proposed update), it should be treated as a preview f (file), since previews are placed before the original file with type a (attachment).

With this change, people who already use --download-archive can download only the missing files, without redownloading everything.
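The backward-compatibility rule could be sketched like this. The archive is modeled as a plain set of strings and the legacy key layout is invented for illustration. `normalize_entry` and `is_archived` are hypothetical helpers, not part of gallery-dl.

```python
KNOWN_ALIASES = ("a", "f")


def normalize_entry(entry):
    """Give legacy entries (no alias prefix) the prefix 'f', as proposed above."""
    prefix, sep, _rest = entry.partition("_")
    if sep and prefix in KNOWN_ALIASES:
        return entry          # already in the new format
    return "f_" + entry       # legacy row: assume it recorded the preview file


def is_archived(archive, type_alias, legacy_key):
    """Check whether a file of the given type is already recorded."""
    wanted = f"{type_alias}_{legacy_key}"
    return wanted in {normalize_entry(entry) for entry in archive}


# Hypothetical legacy entry written before the change.
old_archive = {"fanbox_123_456_Untitled.jpe"}

print(is_archived(old_archive, "f", "fanbox_123_456_Untitled.jpe"))  # preview
print(is_archived(old_archive, "a", "fanbox_123_456_Untitled.jpe"))  # attachment
```

One caveat of this sketch: a legacy entry that happens to start with "a_" or "f_" would be misread as already prefixed, so a real migration would need a less ambiguous marker.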

TestPolygon commented 3 years ago

Technically there is also the type embed (e), but I have not seen any such file yet.

Based on the API response, a post can contain one file, one embed, and multiple attachments.

"attachments": [],
"embed": {},
"file": {},

And technically a file, ~an embed~ and one of the attachments can have the same filename.
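As a rough sketch, tagging each downloadable item with its origin could look like the following. The "name" and "path" keys and the sample values are taken from the keyword listing later in this thread. `iter_files` is a hypothetical helper, not the extractor's actual code.

```python
def iter_files(post):
    """Yield each downloadable item of a post with an added 'type' field."""
    main = post.get("file")
    if main:  # an empty {} means the post has no main file
        yield {**main, "type": "file"}
    for attachment in post.get("attachments") or []:
        yield {**attachment, "type": "attachment"}
    # "embed" is a link, not a file, so it is not yielded here.


post = {
    "id": "50700943",
    "file": {"name": "dens 2.png", "path": "/files/797556/50700943/dens_2.png"},
    "attachments": [
        {"name": "dens_2.png", "path": "/attachments/797556/50700943/dens_2.png"},
    ],
    "embed": {},
}

for num, item in enumerate(iter_files(post), 1):
    print(num, item["type"], item["path"])
```

Both paths end in dens_2.png, so only the added type (or a running number) tells the two items apart.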


Please also add type-alias, the short form of type (f, a, ~e~). There is no point in adding that many characters to the filename when a single one is enough to identify the file type.

Or is there a way to do it manually? In fact it's just substring(0, 1).


UPD. There are also inline (i) files in Patreon posts. This is already implemented here. Just a note.


UPD 2. An embed is an embedded link (not a file) in a Patreon post:

"embed": {
    "description": "Watch this GIF by ... on Gfycat. Discover more GIFS online at Gfycat.",
    "subject": "Create, Discover and Share GIFs on Gfycat",
    "url": "https://gfycat.com/..."
},
mikf commented 3 years ago

> Or is there a way to do it manually? In fact it's just substring(0, 1).

{type[0]} will return the first letter of type
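The same indexing works in plain Python `str.format`, which gives a feel for what the replacement field does. This sketch uses Python's built-in formatter only, not gallery-dl's own, and the template and field values are made up for illustration.

```python
# Hypothetical filename template and metadata, for illustration only.
template = "{id}_{type[0]}_{filename}.{extension}"
fields = {
    "id": "50700943",
    "type": "attachment",
    "filename": "dens_2",
    "extension": "png",
}

# {type[0]} indexes into the string, so only its first character is used.
print(template.format(**fields))
```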

Skyofflad commented 3 years ago

For me it doesn't work: {type} shows up as None in the filename (as does {type[0]}).

mikf commented 3 years ago

@Skyofflad You need the pre-release/dev snapshot for this. Executables can be found here, instructions for pip are here.

Skyofflad commented 3 years ago

> @Skyofflad You need the pre-release/dev snapshot for this. Executables can be found here, instructions for pip are here.

I'm already using the latest executable. Here is the output of the -K option:

Keywords for filenames and --filter:
------------------------------------
added
  Sat, 01 May 2021 12:51:21 GMT
attachments[][name]
  dens_2.png
attachments[][path]
  /attachments/797556/50700943/dens_2.png
attachments[][type]
  attachment
category
  kemonoparty
content

date
  2021-05-01 01:49:17
edited
  Sat, 01 May 2021 01:49:17 GMT
extension
  png
file[name]
  dens 2.png
file[path]
  /files/797556/50700943/dens_2.png
file[type]
  file
filename
  dens 2
id
  50700943
num
  1
published
  Sat, 01 May 2021 01:49:17 GMT
service
  patreon
shared_file
  False
subcategory
  post
title
  Densatra
user
  797556
username
  Skygracer

There is no type field. {file[type]} always returns file. {type} and {attachments[type]} return None. Also, inline images don't even have these fields.

TestPolygon commented 3 years ago

Yeah, there is no such property in 1.17.5-dev.

mikf commented 3 years ago

Turns out I'm incredibly dumb and managed to add the type field to the wrong dict in https://github.com/mikf/gallery-dl/commit/2b5d80862e194e3ada63b458fb871ad6bf7fad35 and didn't notice because I didn't include a test ... anyway, fixed in https://github.com/mikf/gallery-dl/commit/c0fa5058da1e1c65f3c4676d7ac4aa6694733256

TestPolygon commented 3 years ago

It works fine for me with:

"archive-format": "{service}_{user}_{type[0]}_{id}_{filename}.{extension}",

(Do not forget to also use {type[0]} in "filename".)
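Putting both options together, a configuration fragment might look like the one generated below. The "extractor" / "kemonoparty" nesting is an assumption based on gallery-dl's usual config layout and the category name shown in the -K output above. The "filename" value is the format proposed in the first post with {type-alias} replaced by {type[0]}.

```python
import json

# Assumed placement: options for this site under extractor -> kemonoparty.
config = {
    "extractor": {
        "kemonoparty": {
            "filename": "{id}_{title}_{type[0]}_{filename}.{extension}",
            "archive-format": "{service}_{user}_{type[0]}_{id}_{filename}.{extension}",
        }
    }
}

config_text = json.dumps(config, indent=4)
print(config_text)
```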


As I said earlier, the current default archive_fmt is not adequate: https://github.com/mikf/gallery-dl/blob/52052a0e1a203e97e773bba4335a21b811e662e1/gallery_dl/extractor/kemonoparty.py#L24 An attachment and a file with the same name within one post are treated as the same file, which may not be true.


The same applies to filename: https://github.com/mikf/gallery-dl/blob/52052a0e1a203e97e773bba4335a21b811e662e1/gallery_dl/extractor/kemonoparty.py#L23

It needs either {type[0]}/{type} or {num}/{num:>02} added.
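A small check of both suggestions, again using plain Python `str.format` as a stand-in for gallery-dl's formatter. The templates and sample items are hypothetical, and the sketch assumes {num} counts files within a post.

```python
# Two items from one post that share id, filename and extension.
items = [
    {"id": "1", "type": "file", "num": 1, "filename": "Untitled", "extension": "jpe"},
    {"id": "1", "type": "attachment", "num": 2, "filename": "Untitled", "extension": "jpe"},
]

templates = {
    "plain": "{id}_{filename}.{extension}",
    "with_type": "{id}_{type[0]}_{filename}.{extension}",
    "with_num": "{id}_{num:>02}_{filename}.{extension}",  # >02 pads to two digits
}


def render(template):
    """Format every item with the given template."""
    return [template.format(**item) for item in items]


for label, template in templates.items():
    names = render(template)
    unique = len(set(names)) == len(names)
    print(f"{label:<10} unique={unique} {names}")
```

Either field is enough to keep the two names distinct. {type[0]} additionally records where the file came from, while {num} preserves the order within the post.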