mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.97k stars 975 forks

Ignore files? #2882

Open reyaz006 opened 2 years ago

reyaz006 commented 2 years ago

Example:

I was thinking of some simple ignore list in a text file that could be put into the output folder, containing relative names of the files to not be downloaded on next runs. So I could manually set up a separate ignore list for each gallery.

Is it possible? If not, any chance to implement this?

rautamiekka commented 2 years ago

The archive option does exactly that. After enabling it, you need to run through all of your already downloaded galleries once so that their entries get written to the archive.
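As a rough illustration of the idea behind a download archive (not gallery-dl's actual code), the sketch below records one key per downloaded file in an SQLite table and skips any file whose key is already present. The table name, function names, and key values are assumptions made up for this example.

```python
import sqlite3

def open_archive(path=":memory:"):
    """Open (or create) a hypothetical archive database holding one key per file."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return conn

def already_downloaded(conn, key):
    """Return True if this key was recorded by an earlier run."""
    row = conn.execute("SELECT 1 FROM archive WHERE entry = ?", (key,)).fetchone()
    return row is not None

def record(conn, key):
    """Remember a key after its file was downloaded successfully."""
    conn.execute("INSERT OR IGNORE INTO archive (entry) VALUES (?)", (key,))
    conn.commit()

def run(conn, keys):
    """Simulate one run: return the keys that would actually be downloaded."""
    downloaded = []
    for key in keys:
        if already_downloaded(conn, key):
            continue  # skipped: the archive already knows this file
        downloaded.append(key)
        record(conn, key)
    return downloaded

archive = open_archive()
first_run = run(archive, ["post1_a.jpg", "post1_b.jpg"])
second_run = run(archive, ["post1_a.jpg", "post1_b.jpg", "post2_a.jpg"])
print(first_run)   # both files are new on the first run
print(second_run)  # only the file added since then
```

Because the check is made against the recorded key rather than the file on disk, a file deleted locally would still be skipped on later runs, which is the "ignore list" behaviour asked about above.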

mikf commented 2 years ago

I would also recommend using -A, --abort / "skip": "abort:5" when updating your collection and the source returns its newest files first, which is most likely the case here. Saves a bit of time by not having to go through all files again, most of which you had already downloaded previously.
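The behaviour described here, stopping once a number of files in a row have been skipped, could be sketched as follows. It is a simplified model under the assumption that the source lists its newest files first; `files_to_download`, its parameters, and the sample names are hypothetical and not part of gallery-dl.

```python
def files_to_download(remote_files, known, abort_after=5):
    """Walk remote_files (newest first) and stop after `abort_after`
    consecutive files that are already known."""
    new_files = []
    consecutive_skips = 0
    for name in remote_files:
        if name in known:
            consecutive_skips += 1
            if consecutive_skips >= abort_after:
                break  # everything older is assumed to be downloaded already
        else:
            consecutive_skips = 0  # a new file resets the counter
            new_files.append(name)
    return new_files

# Two new files on top of 100 files that were downloaded earlier.
remote = ["new2", "new1"] + [f"old{i}" for i in range(100)]
known = {f"old{i}" for i in range(100)}
result = files_to_download(remote, known)
print(result)
```

With these inputs the loop looks at only seven entries instead of all 102, which is where the time saving comes from. The trade-off is that a new file buried below five or more known files would be missed.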

reyaz006 commented 2 years ago

I'm going to see how archive works then.

But this brings another question. What if some (existing and downloaded) file is updated on server and I want to (a) get the updated file and/or (b) still keep the older file for review/archiving?

rautamiekka commented 2 years ago

Depends on the website.

reyaz006 commented 2 years ago

I assume the website provides file modification dates and sizes on URL requests. Or is it only implemented for specific portals?

rautamiekka commented 2 years ago

What's the website?

reyaz006 commented 2 years ago

The website is kemono.party.

rautamiekka commented 2 years ago

I know nothing about it, so I'll leave this to everyone else.

a84r7a3rga76fg commented 2 years ago

> I'm going to see how archive works then.
>
> But this brings another question. What if some (existing and downloaded) file is updated on server and I want to (a) get the updated file and/or (b) still keep the older file for review/archiving?

I'm having this issue too. For now, I have {_now} in my filename format. It adds a timestamp of when the file was downloaded after kemono.party's file number, which makes it much easier to see which post {id} has been updated: just check whether the folder contains two files with the same number. Also, use a separate archive for kemono.party and change its archive format to one based on the file hash: "archive-format": "{service}_{user}_{id}_{hash}".

    "directory": [
        "{service} kemono.party {user[:100]}",
        "{id} [{date!s:.10}]"
    ],
    "filename": "{num:>03} {_now!s:.16} {filename[:25]}.{extension}"

biggestsonicfan commented 2 years ago

> But this brings another question. What if some (existing and downloaded) file is updated on server and I want to (a) get the updated file and/or (b) still keep the older file for review/archiving?

Jumping in here to note that the filename field for Kemonoparty changed at some point. I only recently became aware of this and referenced it in #2603.

My archive-format is "{service}_{user}_{id}_{num}" but I can see that {hash} might prove more useful.
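To see why the choice of archive key matters for updated files, the sketch below builds both keys with plain `str.format` for a file whose content changed while its position in the post stayed the same. The metadata values are invented for illustration, and real gallery-dl metadata may contain different fields.

```python
NUM_KEY = "{service}_{user}_{id}_{num}"
HASH_KEY = "{service}_{user}_{id}_{hash}"

# Hypothetical metadata for the same file slot before and after an update.
before = {"service": "patreon", "user": "12345", "id": "67890", "num": 1, "hash": "aaa111"}
after = dict(before, hash="bbb222")  # same post and position, new content

# Collect the distinct keys each format produces for the two versions.
num_keys = {NUM_KEY.format(**before), NUM_KEY.format(**after)}
hash_keys = {HASH_KEY.format(**before), HASH_KEY.format(**after)}

print(len(num_keys))   # one key: the update looks like an already archived file
print(len(hash_keys))  # two keys: the update looks like a new file
```

With the `{num}` key, the updated file would be skipped as already archived. With the `{hash}` key it would be downloaded again, so the filename format also needs something that differs between versions to avoid a name collision with the older copy.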

mikf commented 2 years ago

> What if some (existing and downloaded) file is updated on server and I want to (a) get the updated file and/or (b) still keep the older file for review/archiving?

gallery-dl is not really made with downloading a newer version of an already downloaded file in mind.

For kemono, you might be able to achieve something to that effect by using the timestamp from {edited} and/or the value from {hash} in filenames/archive-keys.

There is also a compare post processor to potentially enumerate different versions of the same file, but this method is really inefficient.


> {date!s:.10}

That's a rather creative way of formatting a datetime value. Turns out this is actually more than twice as fast as the "proper" way ({date:%Y-%m-%d}). Faster still would be {date!s:[:10]}
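The first two variants also work with Python's built-in `str.format`, so their equivalence can be checked outside gallery-dl. The `[:10]` slice form appears to be specific to gallery-dl's format strings and is not covered here. The sketch below uses an arbitrary example datetime and `timeit` to compare the two; timings vary by machine and Python version, so no particular result is claimed.

```python
from datetime import datetime
import timeit

date = datetime(2022, 8, 30, 12, 34, 56)  # arbitrary example value

truncated = "{date!s:.10}".format(date=date)    # str(date), cut to 10 characters
strftime = "{date:%Y-%m-%d}".format(date=date)  # strftime-style formatting

print(truncated, strftime)

# Time both variants; only the relative difference is of interest.
t_truncated = timeit.timeit(lambda: "{date!s:.10}".format(date=date), number=100_000)
t_strftime = timeit.timeit(lambda: "{date:%Y-%m-%d}".format(date=date), number=100_000)
print(f"!s:.10   {t_truncated:.3f}s")
print(f"%Y-%m-%d {t_strftime:.3f}s")
```

The truncation works because `str()` of a datetime starts with the ISO date, so the first ten characters are always `YYYY-MM-DD`.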