mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[Patreon][Feature Request] Re-download if size or last-modified date has changed #2099

Open shinji257 opened 2 years ago

shinji257 commented 2 years ago

I don't know how feasible this is, but I'm currently supporting creators on Patreon who update existing files instead of creating new posts or adding new files. Is there any way to have gallery-dl re-download files or images when they appear to have changed?

The only way I can see is to set "skip": false, but that would download every file again, which I don't want to do...

AlttiRi commented 2 years ago

I assume that you use --download-archive.

In this case you need to change "archive-format". By default it depends only on the post ID and the content number: https://github.com/mikf/gallery-dl/blob/b315a0ecef5e6f03238e62195ea327ce270308f1/gallery_dl/extractor/patreon.py#L25

For example, with "archive-format": "{id}_{num}_{size}" in the config file, the archive entry will also depend on the file size. So if the size of a file changes, it will be downloaded again.

I did not test it, but it should work.

However, changing "archive-format" is almost the same as wiping the download archive for that service, because the existing entries will no longer match.
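
A minimal sketch of why adding a field to the format changes what counts as "already downloaded". It uses Python's built-in str.format as a stand-in for gallery-dl's own format-string handling, assumes the default format is equivalent to "{id}_{num}", and all metadata values are made up for illustration.

```python
# Stand-in for archive-key generation. gallery-dl has its own formatter,
# so plain str.format is only an approximation of the real behavior.
DEFAULT_FORMAT = "{id}_{num}"        # assumed default: post ID + content number
SIZE_FORMAT = "{id}_{num}_{size}"    # additionally depends on the file size


def archive_key(fmt: str, metadata: dict) -> str:
    """Build the string that would be recorded in the download archive."""
    return fmt.format(**metadata)


# Hypothetical metadata for one attachment, before and after it was replaced.
before = {"id": 12345, "num": 1, "size": 1000}
after = {"id": 12345, "num": 1, "size": 2048}

# Archive contents after the first run with the size-aware format.
seen = {archive_key(SIZE_FORMAT, before)}

# With the default format both versions map to the same key,
# so the replaced file would be skipped.
same_default = archive_key(DEFAULT_FORMAT, before) == archive_key(DEFAULT_FORMAT, after)

# With the size included, the replaced file gets a new key
# and is therefore downloaded again.
redownload = archive_key(SIZE_FORMAT, after) not in seen
```

The same reasoning explains the "wiping" effect: every key written under the old format differs from the keys produced under the new one.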

shinji257 commented 2 years ago

Currently I have "archive": "./gallery-dl/.archives/{category}.sqlite3" set globally, so I guess the answer to whether I use --download-archive is yes. I don't mind the change or the existing archive data becoming redundant. The next batch will take a long time since it re-downloads almost everything, but after that it should (hopefully) be faster. I'll test it and see how it works.

shinji257 commented 2 years ago

This doesn't work: adding the size key to the format causes an error.

PS Z:\> .\gallery-dl.exe -i .\patreon.txt --verbose
[gallery-dl][debug] Version 1.19.3
[gallery-dl][debug] Python 3.7.9 - Windows-10-10.0.22000
[gallery-dl][debug] requests 2.25.1 - urllib3 1.25.11
[1/8] https://www.patreon.com/user/posts?u=30654293
[gallery-dl][debug] Starting DownloadJob for 'https://www.patreon.com/user/posts?u=30654293'
[patreon][debug] Using PatreonCreatorExtractor for 'https://www.patreon.com/user/posts?u=30654293'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.patreon.com:443
[urllib3.connectionpool][debug] https://www.patreon.com:443 "GET /user/posts?u=30654293 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://www.patreon.com:443 "GET /api/posts?include=user,images,attachments,user_defined_tags,campaign,poll.choices,poll.current_user_responses.user,poll.current_user_responses.choice,poll.current_user_responses.poll,access_rules.tier.null&fields%5Bpost%5D=change_visibility_at,comment_count,content,current_user_can_delete,current_user_can_view,current_user_has_liked,embed,image,is_paid,like_count,min_cents_pledged_to_view,post_file,published_at,patron_count,patreon_url,post_type,pledge_url,thumbnail_url,teaser_text,title,upgrade_url,url,was_posted_by_campaign_owner&fields%5Buser%5D=image_url,full_name,url&fields%5Bcampaign%5D=avatar_photo_url,earnings_visibility,is_nsfw,is_monthly,name,url&fields%5Baccess_rule%5D=access_rule_type,amount_cents&sort=-published_at&filter%5Bis_draft%5D=false&filter%5Bcontains_exclusive_posts%5D=true&filter%5Bcampaign_id%5D=5312979&json-api-use-default-includes=false&json-api-version=1.0 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://www.patreon.com:443 "GET /api/user/30654293 HTTP/1.1" 200 None
[patreon][debug] Using download archive './gallery-dl/.archives/patreon.sqlite3'
[patreon][error] An unexpected error occurred: KeyError - 'size'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[patreon][debug]
Traceback (most recent call last):
  File "gallery_dl\job.pyc", line 80, in run
  File "gallery_dl\job.pyc", line 124, in dispatch
  File "gallery_dl\job.pyc", line 226, in handle_url
  File "gallery_dl\util.pyc", line 652, in check
KeyError: 'size'

AlttiRi commented 2 years ago

Yeah, I have checked it now (use -K): there is no size key.

The media files do have a Content-Length header, though.
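
The KeyError in the traceback above can be reproduced in plain Python. This is only a sketch using str.format and string.Formatter, not gallery-dl's actual code, and the metadata dictionary is hypothetical.

```python
from string import Formatter


def missing_fields(fmt: str, metadata: dict) -> list:
    """Return the replacement fields in fmt that metadata does not provide."""
    names = [field for _, field, _, _ in Formatter().parse(fmt) if field]
    return [name for name in names if name not in metadata]


# Hypothetical metadata without a "size" entry, as reported by -K.
metadata = {"id": 12345, "num": 1}

try:
    "{id}_{num}_{size}".format(**metadata)
    error = None
except KeyError as exc:
    error = exc.args[0]  # name of the field that could not be resolved

# Checking the format up front avoids the exception.
absent = missing_fields("{id}_{num}_{size}", metadata)
```

Running a check like missing_fields against the keys printed by -K shows in advance whether a custom format can be resolved.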

AlttiRi commented 2 years ago

Technically it's possible to do (as a feature request for Patreon). As far as I remember, gallery-dl already sometimes uses the Last-Modified HTTP header as the date key.

So it's probably possible to add a size key as well.
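
As a rough sketch of what such a size key could be derived from, the function below reads Content-Length and Last-Modified from a mapping of response headers. The function name and return shape are hypothetical, and it makes no claim about how gallery-dl handles headers internally.

```python
from email.utils import parsedate_to_datetime


def remote_fingerprint(headers):
    """Extract (size, last_modified) from HTTP response headers.

    Either element is None when the corresponding header is absent.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    size = lowered.get("content-length")
    modified = lowered.get("last-modified")
    return (
        int(size) if size is not None else None,
        parsedate_to_datetime(modified) if modified else None,
    )


# Example with only a Content-Length header present.
example = remote_fingerprint({"Content-Length": "2048"})
```

The headers could come from a HEAD request, for instance requests.head(url, allow_redirects=True).headers, although whether a given server answers HEAD requests with these headers is not guaranteed.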

shinji257 commented 2 years ago

Well let's make this a request then.

AlttiRi commented 2 years ago

As far as I can see, Patreon's media files do not have a Last-Modified header.

However, gallery-dl lists some date keys. If one of them changes after a media file is replaced, it can be used in "archive-format" as I described in my first comment.

Yes, here it is: the images[][created_at] key. Possibly it changes after the media is replaced.

gallery-dl also has a key with the media size, named images[][size_bytes].

I forgot how to use them. As far as I remember (I have seen it in someone's issue), simply using "{id}_{num}_{images[][size_bytes]}" will not work.
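
For illustration only, here is how per-image keys could be built in plain Python if the metadata were available as a dictionary. It assumes that images is a list of dictionaries and that num corresponds to a 1-based position in that list. Neither assumption is confirmed in this thread, and the sample values are made up.

```python
def image_archive_keys(post: dict) -> list:
    """Build one 'id_num_size' key per entry in the post's images list."""
    keys = []
    for num, image in enumerate(post.get("images", []), start=1):
        keys.append("{}_{}_{}".format(post["id"], num, image["size_bytes"]))
    return keys


# Hypothetical post metadata shaped after the images[][...] keys shown by -K.
post = {
    "id": 12345,
    "images": [
        {"size_bytes": 1000, "created_at": "2021-01-01T00:00:00"},
        {"size_bytes": 2048, "created_at": "2021-01-02T00:00:00"},
    ],
}

keys = image_archive_keys(post)
```

This is not a gallery-dl format string, it only shows which values such a key would have to combine.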

shinji257 commented 2 years ago

Well I'll note that https://github.com/mikf/gallery-dl/issues/1992 seems to cover my request sufficiently so I'll wait and see if that gets implemented or not.

God-damnit-all commented 2 years ago

I think this issue should be revised to be about content-length based comparisons using the compare post-processor.

AlttiRi commented 2 years ago

Anyway, the file size has to be stored somewhere, in an SQL database. So one way to resolve this is to change the archive format so that it uses contentLength/lastModified/...

God-damnit-all commented 2 years ago

> Anyway, the file size has to be stored somewhere, in an SQL database. So one way to resolve this is to change the archive format so that it uses contentLength/lastModified/...

This is a good idea. #1992 suggests including a content-length metadata field, but having it as part of the download archives will keep the user from running into edge cases where a filename might change even if their template does not.
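
A minimal sketch of the idea, assuming a hypothetical archive table that stores the size next to each entry. The schema and function names are invented for illustration and are not gallery-dl's actual archive layout.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a hypothetical archive that also records file sizes."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY, size INTEGER)"
    )
    return con


def needs_download(con, entry, size):
    """True if the entry is unknown or its recorded size differs from size."""
    row = con.execute(
        "SELECT size FROM archive WHERE entry = ?", (entry,)
    ).fetchone()
    return row is None or row[0] != size


def record(con, entry, size):
    """Store or update the size recorded for an entry."""
    con.execute(
        "INSERT OR REPLACE INTO archive (entry, size) VALUES (?, ?)",
        (entry, size),
    )
    con.commit()


con = open_archive()
first_run = needs_download(con, "patreon12345_1", 1000)   # unknown entry
record(con, "patreon12345_1", 1000)
unchanged = needs_download(con, "patreon12345_1", 1000)   # same size: skip
replaced = needs_download(con, "patreon12345_1", 2048)    # size changed
```

Because the entry stays the same and only the stored size is compared, a changed filename template would not affect the check, and existing archive entries would not be invalidated the way they are when "archive-format" changes.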