Open orion486 opened 1 week ago
Doesn't really address the default config issue since it's dependent on the extractor, but with kemono/coomer, the API returns file hashes (iirc SHA256 is used), which can be used as a more specific way to ensure duplicates aren't downloaded, like so: "archive-format": "{subcategory}_{user}_{id}_{num}_{hash}"
Yes, I originally thought it may affect more sites, but given that these two websites in question seem to be pretty unique in how they provide multiple revisions of a download target, perhaps addressing this issue is better done on a per-website/extractor basis. I am not sure if other websites use a similar revision system but if they do then a similar solution could be used for their extractor, depending on the info that can be extracted.
And I agree, the file hash for this extractor would be a much better solution to ensure no shared entries in sqlite3. I'll make a new PR.
"filename": "{hash}.{extension}",
"archive-format": "{subcategory}_{user}_{id}_{hash}"
With these you'll only download unique files. Use Kemono's API to sort the files afterwards. There is literally no point in trying to sort files while downloading from Kemono because of how they handle revisions.
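To illustrate why a hash-based archive key only lets unique files through, here is a rough Python sketch. It is not gallery-dl code: str.format stands in for the real format-string handling, a set stands in for the sqlite3 archive, the digest is computed locally instead of being read from the API, and the user ID, post ID and file contents are all made up.

```python
import hashlib

# Archive format from the comment above.
ARCHIVE_FORMAT = "{subcategory}_{user}_{id}_{hash}"


def archive_key(subcategory, user, post_id, content):
    """Build an archive key from post metadata and a SHA-256 digest.

    Assumption: the digest is computed locally here; the real extractor
    would take the hash value from the API response instead.
    """
    digest = hashlib.sha256(content).hexdigest()
    return ARCHIVE_FORMAT.format(
        subcategory=subcategory, user=user, id=post_id, hash=digest
    )


# Hypothetical post with two revisions: "image-a" is unchanged between
# them, while "image-b" was replaced by "image-c".
revision_1 = [b"image-a", b"image-b"]
revision_2 = [b"image-a", b"image-c"]

seen = set()        # stand-in for the sqlite3 archive
unique_files = []   # files that would actually be downloaded

for content in revision_1 + revision_2:
    key = archive_key("fansly", "12345", "67890", content)
    if key in seen:
        continue    # same post and same hash: treated as a duplicate
    seen.add(key)
    unique_files.append(content)

print(unique_files)
```

The file that appears unchanged in both revisions produces the same key and is skipped the second time, while the replaced file gets a new key and is kept.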
The Issue
I am not sure if this can be called a bug, but it's a setting that might not produce the intended results. An issue exists where, if extractor.*.skip is true, some files with multiple revisions, such as those from kemonoparty and coomerparty, will not be downloaded when extractor.*.archive-format is set to its default of "{service}_{user}_{id}_{num}", which can be checked using the -E option.
How To Reproduce
For the following URL,
we extract session info using:
If the conditions discussed above are set, the object entries with attributes "filename": "577611769514565632_preview" and "filename": "577608964548603905" will both get assigned "num": 1. As a result, only one of these files will be downloaded, while the second one in the download order will be skipped, since the sqlite3 archive entry for both files is identical because they share the same num value. Both files generate the following entry in the sqlite3 archive in spite of having different filenames: coomerpartyfansly_307507152082186240_577611859612409857_1
Workarounds
1) Change the default setting of extractor.*.archive-format to something more unique, like "{service}_{user}_{id}_{filename}_{extension}_{num}".
2) Set extractor.*.skip to false (which should have the same(?) effect as using the --no-skip option). This will download everything again, so it is not the best solution.
The first option will break legacy support for previous entries already in the sqlite3 archive. Still, if this behavior is indeed unintended, then the first option is probably the best solution.
Other URLs Also Affected