[v2.0] Overview - Planned Features

mikf commented 8 months ago

A rough, incomplete overview of features and changes I want to implement in v2.0: (will be updated as time goes on)

Let me know if I should be more specific on a topic.

increase required Python version to 3.8
- f-strings, walrus operators, etc
- other syntax and library improvements
minimal backwards compatibility
- I really just want to improve/change things without having to worry about breaking others' configs
- there will be an update guide, but expect things to require some manual intervention
config changes (#2203)
better, more consistent names for metadata & options (#1646)
handle files generated by post processors the same as downloaded files
extractor rework (going to open another issue for this soon)
--download-archive rework
- use archive for posts/etc and not just files (#317)
- use highly advanced SQL features like tables
native continuation/cursor and update support
- basically something like Instagram's cursor and skip: abort, but automated
--filter rework
- combine --range, --filter, etc into one unified option
- allow specifying their order (#3500)
- more places where filters are checked
custom enumerations
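
To illustrate why the order of stages in a unified --filter option matters (#3500), here is a minimal sketch. It is not the planned implementation: the `apply_stages` function, the `("range", ...)` / `("filter", ...)` stage tuples, and the sample metadata are all made up for this example.

```python
from itertools import islice

def apply_stages(items, stages):
    """Apply range/filter stages lazily, in the order they were given."""
    items = iter(items)
    for kind, arg in stages:
        if kind == "range":
            start, stop = arg              # 1-based and inclusive, like "1-3"
            items = islice(items, start - 1, stop)
        elif kind == "filter":
            items = filter(arg, items)     # arg is a predicate on the metadata
        else:
            raise ValueError(f"unknown stage: {kind!r}")
    return list(items)

def is_jpg(post):
    return post["ext"] == "jpg"

# Hypothetical metadata: odd-numbered files are jpg, even-numbered are png
posts = [{"num": n, "ext": "jpg" if n % 2 else "png"} for n in range(1, 7)]

# Take the first three files, then keep the jpgs among them
range_first = apply_stages(posts, [("range", (1, 3)), ("filter", is_jpg)])
# Keep only jpgs, then take the first three of those
filter_first = apply_stages(posts, [("filter", is_jpg), ("range", (1, 3))])

print([p["num"] for p in range_first])
print([p["num"] for p in filter_first])
```

The same two stages select different files depending on which one runs first, which is the behavior an explicit ordering would let users control.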

Possible things that might also happen:

HTTP/2 support - change HTTP library to httpx (#5000)
some form of threading / parallel execution (#31)
a better method of mapping URLs to extractors
- this currently involves a linear search through regular expressions, which is somehow good enough but anything but efficient
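
For readers unfamiliar with the problem, the sketch below contrasts a linear search over compiled patterns with one possible alternative that first narrows the candidates by hostname. The patterns, extractor names, and both lookup functions are hypothetical and only meant to show the idea; they are not gallery-dl's actual code or the approach chosen for v2.0.

```python
import re
from urllib.parse import urlsplit

# Hypothetical (host, pattern, extractor name) entries
PATTERNS = [
    ("example.org", re.compile(r"https?://(?:www\.)?example\.org/user/(\w+)"), "example-user"),
    ("example.org", re.compile(r"https?://(?:www\.)?example\.org/post/(\d+)"), "example-post"),
    ("sample.net", re.compile(r"https?://(?:www\.)?sample\.net/gallery/(\d+)"), "sample-gallery"),
]

def find_linear(url):
    """Try every pattern in turn until one matches."""
    for _host, pattern, name in PATTERNS:
        if pattern.match(url):
            return name
    return None

# Index built once: hostname -> list of (pattern, name) candidates
BY_HOST = {}
for host, pattern, name in PATTERNS:
    BY_HOST.setdefault(host, []).append((pattern, name))

def find_indexed(url):
    """Look up the hostname first, then test only that host's patterns."""
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    for pattern, name in BY_HOST.get(host, ()):
        if pattern.match(url):
            return name
    return None

print(find_linear("https://example.org/post/123"))
print(find_indexed("https://www.sample.net/gallery/7"))
```

A hostname index like this would only help for extractors tied to fixed domains; patterns that match arbitrary hosts would still need a fallback search.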

ghbook commented 8 months ago

Very much interested in seeing this feature supported in v2.0. My config file is getting huge due to the lack of inheritance between two different categories. Not sure if this is possible with the new changes. https://github.com/mikf/gallery-dl/discussions/4632

thatfuckingbird commented 8 months ago

Some thoughts from my POV, building an application relying heavily on gallery-dl:

config changes ([v2.0] Configuration File Changes #2203)

better, more consistent names for metadata & options ([Enhancement] Unify the many keyword naming schemes #1646)

This will hurt, but is understandable. Overall I don't worry too much about it because the metadata files already change sometimes (mostly due to fixes and expansions) and even the optimal configuration changes sometimes, like when new better options are introduced. So sometimes having to update configs is part of the deal already. What I hope is that the ability to specify some options on the command line and some in config files remains (and also using multiple config files like now).

--download-archive rework

use archive for posts/etc and not just files (DeviantartExtractor - check archive sooner #317)

use highly advanced SQL features like tables

native continuation/cursor and update support

I hope this will be done in a backward compatible manner (or at least be easy to migrate). I quite like the current simple format of the archive; personally I wouldn't complicate it unless it's actually needed for some feature. The post vs. file distinction also needs care: even on sites where these are not the same, posts can be updated later with more images, so recording just posts in the archive would skip some stuff.
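
To make the post vs. file concern concrete, here is a rough sketch of an archive that records both. The two-table schema, the ID format, and the `new_files` helper are assumptions made up for illustration, not the current or planned archive layout.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE posts (id TEXT PRIMARY KEY);
CREATE TABLE files (
    id      TEXT PRIMARY KEY,
    post_id TEXT NOT NULL REFERENCES posts(id)
);
""")

def new_files(post_id, file_ids):
    """Record the post and return only the files not archived before."""
    fresh = []
    with db:  # commit on success, roll back on error
        db.execute("INSERT OR IGNORE INTO posts (id) VALUES (?)", (post_id,))
        for file_id in file_ids:
            cur = db.execute(
                "INSERT OR IGNORE INTO files (id, post_id) VALUES (?, ?)",
                (file_id, post_id),
            )
            if cur.rowcount == 1:  # a row was inserted, so the file is new
                fresh.append(file_id)
    return fresh

# First visit: the post has two images
first = new_files("post1", ["post1_a", "post1_b"])
# Later visit: the author added a third image to the same post
second = new_files("post1", ["post1_a", "post1_b", "post1_c"])

print(first)
print(second)
```

If only the `posts` table were consulted, the second visit would skip the post entirely and miss the added image; checking per file catches it, at the cost of having to fetch the post's file list again.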

The dream would be to have some kind of support for continuing gallery downloads where they left off (even if, let's say, the download is aborted due to the system crashing). Unfortunately this could only be achieved with site-specific support in each extractor. It might be something to leave as an optional thing for each extractor, but I'm not even sure it's worth the work and complexity on this level.
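
A minimal sketch of what crash-safe continuation could look like, assuming the site exposes some pagination cursor. The `PAGES` table stands in for a paginated API, and the state file name and JSON layout are invented for this example; a real extractor would need its own site-specific cursor.

```python
import json
import os
import tempfile

# Simulated paginated site: cursor -> (items on that page, next cursor)
PAGES = {
    None: (["a", "b"], "c1"),
    "c1": (["c", "d"], "c2"),
    "c2": (["e"], None),
}

def load_cursor(path):
    try:
        with open(path) as fp:
            return json.load(fp)["cursor"]
    except FileNotFoundError:
        return None  # no saved state: start from the first page

def save_cursor(path, cursor):
    tmp = path + ".tmp"
    with open(tmp, "w") as fp:
        json.dump({"cursor": cursor}, fp)
    os.replace(tmp, path)  # atomic rename, so a crash never leaves a half-written file

def download(path, crash_after=None):
    """Process pages from the saved cursor on, checkpointing after each page."""
    cursor = load_cursor(path)
    done, count = [], 0
    while True:
        items, next_cursor = PAGES[cursor]
        done.extend(items)  # stand-in for downloading the page's files
        count += 1
        if next_cursor is None:
            if os.path.exists(path):
                os.remove(path)  # finished: drop the checkpoint
            return done
        save_cursor(path, next_cursor)
        cursor = next_cursor
        if crash_after == count:
            raise RuntimeError("simulated crash")

with tempfile.TemporaryDirectory() as tmpdir:
    state = os.path.join(tmpdir, "cursor.json")
    try:
        download(state, crash_after=1)
    except RuntimeError:
        pass
    resumed = download(state)  # picks up at the second page

print(resumed)
```

The checkpoint is written only after a page is fully processed, so a crash in the middle of a page repeats that page on the next run rather than skipping it.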

a better method of mapping URLs to extractors

this currently involves a linear search through regular expressions, which is somehow good enough but anything but efficient

I think linear search is just fine. What I would like though is to have some way to expand the regex in the configuration so when a new site like fxtwitter, vxtwitter etc. pops up it can be handled on the configuration level instead of updating the extractor code.
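
As a sketch of the idea, the pattern could be assembled from a built-in domain list plus domains read from the config. The `extra-domains` key, the `build_pattern` helper, and the simplified status URL regex below are hypothetical, not existing gallery-dl options or patterns.

```python
import re

# Hypothetical built-in domain list for a Twitter-like extractor
BASE_DOMAINS = ["twitter.com"]

def build_pattern(extra_domains=()):
    """Compile a status URL pattern covering built-in and configured domains."""
    domains = "|".join(re.escape(d) for d in [*BASE_DOMAINS, *extra_domains])
    return re.compile(rf"https?://(?:www\.)?(?:{domains})/(\w+)/status/(\d+)")

# Hypothetical config entry listing mirror domains
config = {"extra-domains": ["fxtwitter.com", "vxtwitter.com"]}

default_pattern = build_pattern()
extended_pattern = build_pattern(config["extra-domains"])

url = "https://vxtwitter.com/someuser/status/12345"
print(default_pattern.match(url))
print(extended_pattern.match(url).groups())
```

With something like this, supporting a new mirror domain would be a one-line config change instead of an extractor update.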

+1 wish: please keep the log format the same or at least as nicely parseable as it is now.

Edit: forgot to say but I do like the general direction things are going, looking forward to hearing more esp. about the extractor rework.

mikf / gallery-dl

[v2.0] Overview - Planned Features #5006