mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[kemonoparty] Inconsistent `num` variable value #3620

Open zajimumaaa opened 1 year ago

zajimumaaa commented 1 year ago

The value of the num variable is not the same when the files config is different.

Case A

The config below produces this result:

"filename": "{date!g}_{id}_{num}_{type}.{extension}",
"files": ["attachments", "file", "inline"]

image

with this metadata JSON:

{
    "added": "Fri, 03 Feb 2023 14:39:19 GMT",
    "attachments": [
        {
            "hash": "e1f42389c76464b289df6aefdc11b1c5f56edd26349c937afdce9b86f3690abf",
            "name": "FGO2_unc.zip",
            "path": "/e1/f4/e1f42389c76464b289df6aefdc11b1c5f56edd26349c937afdce9b86f3690abf.zip",
            "type": "attachment"
        },
        {
            "hash": "cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d",
            "name": "00cv01.jpg",
            "path": "/cd/27/cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d.jpg",
            "type": "attachment"
        },
        {
            "hash": "c9b5ff466726078d7f42390ef14e29b5de6124603432f4577c213ae685885975",
            "name": "09.jpg",
            "path": "/c9/b5/c9b5ff466726078d7f42390ef14e29b5de6124603432f4577c213ae685885975.jpg",
            "type": "attachment"
        },
        {
            "hash": "642f7a723adc59e64d16a9462b725130bd7888906d82ca5c252b58c009cd9595",
            "name": "20.jpg",
            "path": "/64/2f/642f7a723adc59e64d16a9462b725130bd7888906d82ca5c252b58c009cd9595.jpg",
            "type": "attachment"
        },
        {
            "hash": "0e6fdbd8190ab5186b2b4671488d5c85936a68bbbfe369f842371b8809b4670a",
            "name": "22.jpg",
            "path": "/0e/6f/0e6fdbd8190ab5186b2b4671488d5c85936a68bbbfe369f842371b8809b4670a.jpg",
            "type": "attachment"
        },
        {
            "hash": "e64ffa46c7186473db41ac14f75259352efede34cfedd3a9e83dc01a90a3bf1a",
            "name": "26.jpg",
            "path": "/e6/4f/e64ffa46c7186473db41ac14f75259352efede34cfedd3a9e83dc01a90a3bf1a.jpg",
            "type": "attachment"
        }
    ],
    "category": "kemonoparty",
    "content": "<p>Uncensored ver.</p><p><br></p><p>*Please do not upload all pictures to other sites without permission.</p><p>*許可なく他のサイトにアップロードしないでください。</p><p><br></p>",
    "count": 6,
    "date": "2023-02-01 09:00:04",
    "dms": [],
    "edited": null,
    "embed": {},
    "file": {
        "hash": "cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d",
        "name": "00cv01.jpg",
        "path": "/cd/27/cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d.jpg",
        "type": "file"
    },
    "id": "78043899",
    "published": "Wed, 01 Feb 2023 09:00:04 GMT",
    "service": "patreon",
    "shared_file": false,
    "subcategory": "patreon",
    "title": "Fate/Gentle Order 2 uncensored ver.",
    "user": "12281898",
    "username": "MANA"
}

Case B

The config below produces this result:

"filename": "{date!g}_{id}_{num}_{type}.{extension}",
"files": ["file", "inline"]

image

For reference, this is the metadata JSON:

{
    "added": "Fri, 03 Feb 2023 14:39:19 GMT",
    "attachments": [
        {
            "name": "FGO2_unc.zip",
            "path": "/e1/f4/e1f42389c76464b289df6aefdc11b1c5f56edd26349c937afdce9b86f3690abf.zip"
        },
        {
            "name": "00cv01.jpg",
            "path": "/cd/27/cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d.jpg"
        },
        {
            "name": "09.jpg",
            "path": "/c9/b5/c9b5ff466726078d7f42390ef14e29b5de6124603432f4577c213ae685885975.jpg"
        },
        {
            "name": "20.jpg",
            "path": "/64/2f/642f7a723adc59e64d16a9462b725130bd7888906d82ca5c252b58c009cd9595.jpg"
        },
        {
            "name": "22.jpg",
            "path": "/0e/6f/0e6fdbd8190ab5186b2b4671488d5c85936a68bbbfe369f842371b8809b4670a.jpg"
        },
        {
            "name": "26.jpg",
            "path": "/e6/4f/e64ffa46c7186473db41ac14f75259352efede34cfedd3a9e83dc01a90a3bf1a.jpg"
        }
    ],
    "category": "kemonoparty",
    "content": "<p>Uncensored ver.</p><p><br></p><p>*Please do not upload all pictures to other sites without permission.</p><p>*許可なく他のサイトにアップロードしないでください。</p><p><br></p>",
    "count": 1,
    "date": "2023-02-01 09:00:04",
    "dms": [],
    "edited": null,
    "embed": {},
    "file": {
        "hash": "cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d",
        "name": "00cv01.jpg",
        "path": "/cd/27/cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d.jpg",
        "type": "file"
    },
    "id": "78043899",
    "published": "Wed, 01 Feb 2023 09:00:04 GMT",
    "service": "patreon",
    "shared_file": false,
    "subcategory": "patreon",
    "title": "Fate/Gentle Order 2 uncensored ver.",
    "user": "12281898",
    "username": "MANA"
}

TLDR Comparison

The file's num value is 7 in Case A but 1 in Case B. It would be nice to have an option that makes the result the same in both cases (that is, count a file toward num even if it is not included in the download queue).

Additional Notes

Also, for some reason some metadata is missing in Case B (the attachments have no hash or type fields). Is that a bug?

Hrxn commented 1 year ago

I don't see the error, to be honest.

You use the "files" option to get attachments first in Case A, so you have 6 attachments and then the "normal" file, which means {num} is obviously 7 here. In Case B you simply ignore all attachments, so {num} is 1.
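The numbering described above can be modeled roughly as follows. This is a simplified sketch, not gallery-dl's actual code: `number_files` is a hypothetical helper, the post dictionary is trimmed to file names only, and any duplicate filtering gallery-dl may apply is ignored.

```python
def number_files(post, files):
    """Assign 1-based num values to the files selected by `files`, in that order."""
    queue = []
    for group in files:
        value = post.get(group) or []
        # "file" holds a single object, the other groups hold lists
        queue.extend(value if isinstance(value, list) else [value])
    return [(num, item["name"]) for num, item in enumerate(queue, 1)]


post = {
    "attachments": [{"name": name} for name in (
        "FGO2_unc.zip", "00cv01.jpg", "09.jpg", "20.jpg", "22.jpg", "26.jpg")],
    "file": {"name": "00cv01.jpg"},
    "inline": [],
}

case_a = number_files(post, ["attachments", "file", "inline"])
case_b = number_files(post, ["file", "inline"])

print(case_a[-1])  # (7, '00cv01.jpg')
print(case_b[-1])  # (1, '00cv01.jpg')
```

Under this model the same file ends up with num 7 or num 1 depending only on which groups precede it in the queue.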

I mean, what did you expect?

zajimumaaa commented 1 year ago

I was expecting files to have exactly the same num regardless of whether they are included in the download queue. Sometimes I skip the attachments when the connection is unstable and download them in a later run. With the current behaviour that is not possible because num is not consistent. If I do a second run, there will be duplicated files and unintentionally skipped files because num is not reproducible.

If I use archive.sqlite instead, it will skip the first item in the download queue even though it is a completely different file, because num is not consistent there either.

With attachments (image): num 1 is the first attachment, which is a zip file.

Without attachments (image): num 1 is the first file, which is an image. If I run gallery-dl again with attachments enabled in the config file, the first attachment will be skipped, and the first file, which has already been downloaded, will be downloaded again with num 7.
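The two-run scenario can be simulated with a small sketch. It assumes an archive key that ends in num; the key shape, `build_queue`, and `run` are hypothetical stand-ins and not gallery-dl's implementation.

```python
def build_queue(post, files):
    """Flatten the selected file groups into one download queue."""
    queue = []
    for group in files:
        value = post.get(group) or []
        queue.extend(value if isinstance(value, list) else [value])
    return queue


def run(post, files, archive):
    """Download every queued file whose num-based key is not archived yet."""
    downloaded, skipped = [], []
    for num, item in enumerate(build_queue(post, files), 1):
        key = f"{post['id']}_{num}"  # assumed key shape: depends on num
        if key in archive:
            skipped.append((num, item["name"]))
        else:
            archive.add(key)
            downloaded.append((num, item["name"]))
    return downloaded, skipped


post = {
    "id": "78043899",
    "attachments": [{"name": name} for name in (
        "FGO2_unc.zip", "00cv01.jpg", "09.jpg", "20.jpg", "22.jpg", "26.jpg")],
    "file": {"name": "00cv01.jpg"},
    "inline": [],
}

archive = set()
first_run = run(post, ["file", "inline"], archive)
second_run = run(post, ["attachments", "file", "inline"], archive)

print(first_run[0])    # [(1, '00cv01.jpg')]
print(second_run[1])   # [(1, 'FGO2_unc.zip')]
print(second_run[0][-1])  # (7, '00cv01.jpg')
```

In the second run the zip file inherits the archived key of num 1 and is skipped, while the already downloaded image reappears as num 7 and is fetched again.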

zajimumaaa commented 1 year ago

This will probably also happen if there are inline files.

My request is that we can configure the num value to be based on the sequence in the metadata instead of the download queue, so that it persists and is reproducible regardless of the files configuration.

ClosedPort22 commented 1 year ago

My request is that we can configure the num value to be based on the sequence in the metadata instead of the download queue, so that it persists and is reproducible regardless of the files configuration.

This can be achieved without modifying the source code. Just change your archive-format to something that reflects the uniqueness of each file, for example:

{service}_{user}_{id}_{hash|filename|num}
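A minimal sketch of how such a key might resolve, assuming that fields separated by `|` are tried left to right until one yields a non-empty value. `first_non_empty` and `archive_key` are hypothetical helpers for illustration, not gallery-dl functions.

```python
def first_non_empty(metadata, *fields):
    """Return the first field value that is present and non-empty."""
    for field in fields:
        value = metadata.get(field)
        if value:
            return value
    return None


def archive_key(metadata):
    """Build a key shaped like {service}_{user}_{id}_{hash|filename|num}."""
    unique = first_non_empty(metadata, "hash", "filename", "num")
    return "{}_{}_{}_{}".format(
        metadata["service"], metadata["user"], metadata["id"], unique)


base = {"service": "patreon", "user": "12281898", "id": "78043899"}
file_hash = "cd27498a89cc560127b2d4e240402591bbfccc9d99c05a1766f20a98b178455d"

# Same file, queued as num 7 (Case A) or num 1 (Case B)
key_case_a = archive_key({**base, "hash": file_hash, "filename": "00cv01", "num": 7})
key_case_b = archive_key({**base, "hash": file_hash, "filename": "00cv01", "num": 1})
print(key_case_a == key_case_b)  # True

# Without a hash, the key falls back to the filename
no_hash = archive_key({**base, "filename": "00cv01", "num": 1})
print(no_hash)  # patreon_12281898_78043899_00cv01
```

Because the hash comes first, num only matters for entries that have neither a hash nor a filename.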

zajimumaaa commented 1 year ago

Just change your archive-format to something that reflects the uniqueness of each file, for example:

{service}_{user}_{id}_{hash|filename|num}

Thanks for the workaround.

This can be achieved without modifying the source code.

In my humble opinion, the default archive format should also be reproducible. Isn't the main purpose of the sqlite archive to keep track of what has already been downloaded?

ClosedPort22 commented 1 year ago

In my humble opinion, the default archive format should also be reproducible. Isn't the main purpose of the sqlite archive to keep track of what has already been downloaded?

Yeah, but changing how num works under the hood would break backward compatibility. The usual solution to this is to add another metadata field (pos or seq sounds like a good name), but its function would be largely identical to {hash|filename|num}.

zajimumaaa commented 1 year ago

Yeah, but changing how num works under the hood would break backward compatibility

Yeah, makes sense.

The usual solution to this is to add another metadata field (pos or seq sounds like a good name), but its function would be largely identical to {hash|filename|num}.

In my opinion, hash is too long (Windows has a limitation regarding this, if I remember correctly), filename might not be unique, and num is not reproducible across different configs. So pos or seq would serve a different purpose.

@mikf, apologies for the ping. Could you add a pos or seq variable for this purpose? Is it possible?

Context: pos or seq is similar to num, but its value won't change if the include config is changed. From my observation, num is the position in the download queue, so if the include config is changed, the value also changes because the download queue order changes. Instead of using the download queue order, pos or seq would use the order in the metadata, which contains everything that could possibly be downloaded regardless of the include config.
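The proposal could look roughly like the sketch below. Everything here is hypothetical: no `pos` field exists in the thread's metadata, `GROUP_ORDER` is an assumed fixed ordering of the file groups, and for simplicity the queue is walked in that fixed order rather than in the order given by the files config.

```python
GROUP_ORDER = ("attachments", "file", "inline")  # assumed canonical order


def number_files(post, files):
    """Return the queued files with both a stable pos and a queue-based num."""
    selected = set(files)
    result = []
    pos = num = 0
    for group in GROUP_ORDER:
        value = post.get(group) or []
        items = value if isinstance(value, list) else [value]
        for item in items:
            pos += 1  # counts every file present in the metadata
            if group in selected:
                num += 1  # counts only files that are actually queued
                result.append({"name": item["name"], "pos": pos, "num": num})
    return result


post = {
    "attachments": [{"name": name} for name in (
        "FGO2_unc.zip", "00cv01.jpg", "09.jpg", "20.jpg", "22.jpg", "26.jpg")],
    "file": {"name": "00cv01.jpg"},
    "inline": [],
}

with_attachments = number_files(post, ["attachments", "file", "inline"])
without_attachments = number_files(post, ["file", "inline"])

print(with_attachments[-1])     # {'name': '00cv01.jpg', 'pos': 7, 'num': 7}
print(without_attachments[-1])  # {'name': '00cv01.jpg', 'pos': 7, 'num': 1}
```

In this sketch pos stays at 7 for the post's main file in both configurations, while num still follows the download queue.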