tezrilet opened this issue 2 months ago
Remove the title from the file name and download only the post's unique files. Save the post's metadata from Kemono and use it to sort the files with symbolic or hard links, so no storage space is wasted. In the URL below, replace service, creator_id, and post_id with the post's service and the correct creator and post IDs. Trying to sort files from Kemono without wasting space will only lead to frustration.

https://kemono.su/api/v1/service/user/creator_id/post/post_id

```json
"archive-format": "{subcategory}_{user}_{id}_{hash}",
"archive": "~/gallery-dl/archives/kemono/{subcategory} kemono {user}.sqlite",
"directory": ["{subcategory} kemono {user}", "{date!s:.10} {id}"],
"filename": "{hash}.{extension}"
```
> (aside from obviously not using date/title)

Thanks. Yes, I'm already saving unique files using their hash, but your suggestion still uses `date`, so it will still create duplicates. I do appreciate the suggestion, though! It would just be nice to browse and search with a file manager using meaningful paths that include the title. I don't mind if it has to make an extra request to get the earliest revision's date and title, since I'm already grabbing all revisions anyway.
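The link-based approach suggested earlier can also give meaningful, browsable paths without duplicating data. Below is a minimal sketch, assuming each file is stored once under its hash (as with the `"filename": "{hash}.{extension}"` setting) and that a post's metadata has already been saved as a dict with the keys `id`, `title`, `published`, and `files` (a list of hash-based file names). The function `link_post` and the folder layout are hypothetical, not part of gallery-dl or the Kemono API.

```python
import os
import tempfile
from pathlib import Path


def link_post(store: Path, view_root: Path, post: dict, symbolic: bool = False) -> Path:
    """Link a post's hash-named files from `store` into a readable folder.

    Assumed (hypothetical) shape of `post`:
    {"id": ..., "title": ..., "published": str or None, "files": [hash-based names]}
    """
    # Fall back to "unknown" when "published" is missing or null.
    date = (post.get("published") or "unknown")[:10]
    title = post["title"]
    for sep in ("/", "\\"):
        title = title.replace(sep, "_")  # keep the title from creating subfolders
    folder = view_root / f"{date} {post['id']} {title}"
    folder.mkdir(parents=True, exist_ok=True)

    for name in post["files"]:
        source = store / name
        target = folder / name
        if os.path.lexists(target):
            continue  # already linked on an earlier run
        if symbolic:
            target.symlink_to(source.resolve())
        else:
            os.link(source, target)  # hard link: shares the same data on disk
    return folder


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp, "store")
        store.mkdir()
        (store / "abc123.png").write_bytes(b"example data")
        post = {"id": "42", "title": "Example post",
                "published": "2024-01-05T12:00:00", "files": ["abc123.png"]}
        folder = link_post(store, Path(tmp, "by-title"), post)
        print(folder.name)  # 2024-01-05 42 Example post
```

Hard links only work within one filesystem. Symbolic links (`symbolic=True`) avoid that limit but break if the store is moved, and on Windows they may require extra privileges.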
The suggestion above does not use `date` in the `archive-format`, though?
I was referring to the `directory` option (`"directory": ["{subcategory} kemono {user}", "{date!s:.10} {id}"]`), but the problem still occurs because the date changes between revisions. I tried it, and while it does prevent duplicate files, I still end up with multiple folders. Ideally, I want to use a fixed value for the date and title, such as the earliest ones available from the first revision. I can provide a list of URLs privately if needed.
@mikf Bumping this, since an admin announced that Kemono is shutting down on November 22nd. As this issue never got a label, is it considered won't do / out of scope?
You might as well consider this "won't fix" then, as there is a good chance the next release will be after 2024.11.22.
> If there currently isn't a way to solve this (aside from obviously not using date/title), could we get some extra options to use with the format strings, such as `{earliest_revision_date}`, `{latest_revision_date}`, `{earliest_revision_title}`, and `{latest_revision_title}`?
Each revision has 4 metadata fields:
- `revision_id`
- `revision_index`
- `revision_count`
- `revision_hash`

The earliest revision has a `revision_index` of 1; the latest has `revision_index == revision_count` and a `revision_id` of 0.
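Using only these fields, the fixed values requested in the thread (`{earliest_revision_date}` and the like) could be derived per post once all revisions are known. A minimal sketch, assuming each revision's metadata is a dict that also carries `title` and `published` (which may be null in later revisions). `revision_summary` is a hypothetical helper, not a gallery-dl option, and falling back to the nearest revision with a non-null date is a design choice of this sketch.

```python
from typing import Optional


def revision_summary(revisions: list) -> dict:
    """Derive fixed date/title values for one post from all of its revisions.

    Relies on: the earliest revision has revision_index == 1 and the
    latest has revision_index == revision_count. Raises StopIteration if
    either revision is missing from `revisions`.
    """
    earliest = next(r for r in revisions if r["revision_index"] == 1)
    latest = next(r for r in revisions
                  if r["revision_index"] == r["revision_count"])
    oldest_first = sorted(revisions, key=lambda r: r["revision_index"])

    def first_date(ordered: list) -> Optional[str]:
        # "published" may be null, so use the nearest revision that has one.
        for rev in ordered:
            if rev.get("published"):
                return rev["published"][:10]
        return None

    return {
        "earliest_revision_date": first_date(oldest_first),
        "latest_revision_date": first_date(oldest_first[::-1]),
        "earliest_revision_title": earliest["title"],
        "latest_revision_title": latest["title"],
    }


if __name__ == "__main__":
    # Made-up sample values for a post with two revisions.
    revisions = [
        {"revision_index": 2, "revision_count": 2, "revision_id": 0,
         "title": "Final title", "published": None},
        {"revision_index": 1, "revision_count": 2, "revision_id": 111,
         "title": "Original title", "published": "2023-05-01T09:30:00"},
    ]
    s = revision_summary(revisions)
    # A directory name that stays the same whichever revision is processed.
    print(f"{s['earliest_revision_date']} {s['earliest_revision_title']}")  # 2023-05-01 Original title
```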
Using conditional file/directory names, you could do something like this:

```json
"directory": {
    "revision_index == 1": ["{username}", "{service}", "[{id}]", "earliest revision: {title}"],
    "revision_index == revision_count": ["{username}", "{service}", "[{id}]", "latest revision: {title}"],
    "": ["{username}", "{service}", "[{id}]", "{revision_id}: {title}"]
}
```
Thanks for the suggestion, though that still creates duplicate files. I may end up using only the post ID as the folder name and then writing a script to rename the folders properly after downloading.
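The rename-after-download plan could be sketched like this, assuming each post-ID folder contains a JSON metadata file (hypothetically named `info.json`, for example written by a metadata post-processor you configure yourself) with `id`, `title`, and an optional, possibly null, `published` timestamp. Folders whose target name already exists are skipped rather than merged, so collisions can be reviewed by hand.

```python
import json
import tempfile
from pathlib import Path


def rename_post_folders(root: Path, meta_name: str = "info.json") -> list:
    """Rename <root>/<post_id> folders to "<date> <post_id> <title>".

    Assumes each post folder holds a JSON file `meta_name` (hypothetical name)
    with "id", "title", and an optional, possibly null, "published" timestamp.
    """
    renamed = []
    # sorted() materializes the listing first, since folders are renamed in the loop.
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        meta_file = folder / meta_name
        if not meta_file.is_file():
            continue  # not a post folder, or metadata is missing
        meta = json.loads(meta_file.read_text(encoding="utf-8"))
        date = (meta.get("published") or "unknown")[:10]
        title = meta["title"].replace("/", "_").replace("\\", "_")
        target = root / f"{date} {meta['id']} {title}"
        if target == folder or target.exists():
            continue  # already renamed, or a name collision to review by hand
        folder.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        post_dir = root / "12345"
        post_dir.mkdir()
        (post_dir / "info.json").write_text(
            json.dumps({"id": 12345, "title": "Hello", "published": "2024-03-02T10:00:00"}),
            encoding="utf-8")
        print([p.name for p in rename_post_folders(root)])  # ['2024-03-02 12345 Hello']
```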
> though that still creates duplicate files

It won't if you use an archive with `{hash}` as `archive-format`, as suggested in https://github.com/mikf/gallery-dl/issues/6096#issuecomment-2313905785
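A rough sketch of why a hash-based key prevents re-downloading: the archive below is a SQLite table of seen keys, and a file is only fetched the first time its formatted key appears. A key that also includes a revision-specific field (such as the title) changes between revisions, so the same file would be fetched again. `HashArchive` is a hypothetical class that only imitates the idea; it does not reproduce gallery-dl's actual archive code or schema.

```python
import sqlite3


class HashArchive:
    """A tiny download archive: a SQLite table of keys that were already seen."""

    def __init__(self, path: str = ":memory:", key_format: str = "{hash}") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
        self.key_format = key_format

    def should_download(self, file_meta: dict) -> bool:
        """Return True (and record the key) the first time a key is seen; False afterwards.

        Simplification: the key is recorded immediately, not after a
        successful download.
        """
        key = self.key_format.format(**file_meta)
        cur = self.db.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (key,))
        self.db.commit()
        return cur.rowcount == 1  # 0 when the key already existed


if __name__ == "__main__":
    # The same file (same hash) appearing in two revisions with different titles.
    rev1 = {"hash": "abc123", "title": "Original title"}
    rev2 = {"hash": "abc123", "title": "Final title"}

    by_hash = HashArchive()
    print(by_hash.should_download(rev1), by_hash.should_download(rev2))  # True False

    by_title = HashArchive(key_format="{title}_{hash}")
    print(by_title.should_download(rev1), by_title.should_download(rev2))  # True True
```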
I might try that instead, now that I think about it. The majority of content shouldn't have that many revisions, so fixing the few duplicate folders might be easier.
Regarding the comment I just left (https://github.com/mikf/gallery-dl/issues/6415#issuecomment-2453554698), would this still be a viable feature to add?
I'm currently downloading all post revisions and organizing them with the following directory structure:
However, both the post's `date` and `title` can change between revisions. This creates duplicate folders and could eat up space as the content is redownloaded. I can send example URLs privately for both situations if needed. I have also tried using `{published[:10]}` instead of `date`, but it can be null in later revisions, so the folder gets duplicated as "None". That still wouldn't address title changes, though.

If there currently isn't a way to solve this (aside from obviously not using date/title), could we get some extra options to use with the format strings, such as `{earliest_revision_date}`, `{latest_revision_date}`, `{earliest_revision_title}`, and `{latest_revision_title}`?