mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[feature request] removing completed URLs from input file #4732

Closed a84r7a3rga76fg closed 9 months ago

a84r7a3rga76fg commented 11 months ago

Two feature requests. When the last image in a URL is successfully downloaded, remove that URL from the input file. If the URL is unavailable, move it from the input file to another file.

biggestsonicfan commented 11 months ago

Honestly that sounds like you'd want to wrap gallery-dl in another application. I feel like input file manipulation falls out of scope.

JSouthGB commented 11 months ago

For the second request, what about the unsupported log?

a84r7a3rga76fg commented 11 months ago

@biggestsonicfan Not at all. My two requests are identical to this if you swap favorites with a local file https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorexhentaifav.

@JSouthGB Exhentai is supported.

biggestsonicfan commented 11 months ago

"Input file" and "removing a favorite" are two very different things. They both share "removing a url", sure, but fundamentally the handling of these things is different.

biggestsonicfan commented 10 months ago

First and foremost, gallery-dl is a gallery download manager. The fact that it has the ability to remove favorites from your account is quite above and beyond in the scope of what the application does. However, what you are asking affects the entire input handling routines and can in fact go horribly wrong if something bad happens. The change/feature you are asking for is not simple in its implementation; it's much simpler if left to the user to manage. I have coded you a working example:

import argparse, os
from gallery_dl import config, job, output

parser = argparse.ArgumentParser(description='gallery-dl list-manager')
parser.add_argument('-i', dest='input', type=str, help='Input file for gallery-dl to use')

def main():
    args = parser.parse_args()
    urls = []
    if args.input:
        if os.path.isfile(str(args.input)):
            with open(str(args.input), 'r') as url_list:
                urls = url_list.readlines()
            urls = list(map(str.strip, urls))
            while len(urls) > 0:
                current_url = urls.pop(0)
                config.load()
                try:
                    job.DownloadJob(current_url).run()
                except KeyboardInterrupt:
                    # Ctrl+C skips the current URL; it is still dropped from the list below.
                    print("KeyboardInterrupt has been caught. Moving on to next url...")
                with open(str(args.input), 'w') as rewrite_list:
                    rewrite_list.write("\n".join(str(item) for item in urls))
            print("Finished")
        else:
            print("Error: Input file not valid")
    else:
        parser.print_help()

if __name__ == "__main__":
    main()

You can use this as a standalone python script in conjunction with gallery-dl and it will remove urls from your input file as they are completed. But please understand gallery-dl is an application, not a management tool for how you manage to download your galleries. That is entirely up to the user.

Hrxn commented 10 months ago

@biggestsonicfan Not sure, I don't want to stick out my neck here when it comes to Python, but don't you arrive at rewrite_list.write() regardless of outcome from DownloadJob()?

In other words, even if the download job fails, it will still remove the URL from the input file? Maybe that write should be wrapped in a check for gallery-dl's exit code?

biggestsonicfan commented 10 months ago

@Hrxn You are correct. This is an example I coded in minutes. You are free to modify the code as you want. I am not going to write conditions for every possible scenario here.
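
For anyone who wants to adapt it along the lines Hrxn describes, here is a minimal sketch that removes a URL only when its job reports success, and optionally moves failed URLs to a second file (the second request above). The names `process_input_file`, `download`, `failed_path` and `_rewrite` are made up for illustration and are not part of gallery-dl; `download` is any callable that returns 0 on success.

```python
import os
import tempfile


def process_input_file(input_path, download, failed_path=None):
    """Run download(url) for each URL; drop a URL from the file only on success.

    download is a caller-supplied callable returning 0 on success
    (a hypothetical hook, not a gallery-dl API).
    """
    with open(input_path, encoding="utf-8") as fp:
        urls = [line.strip() for line in fp if line.strip()]

    remaining = list(urls)
    for url in urls:
        try:
            status = download(url)
        except KeyboardInterrupt:
            # Stop here; the interrupted URL stays in the input file.
            break
        if status == 0:
            remaining.remove(url)
        elif failed_path is not None:
            # Move the failed URL to a separate file instead of keeping it.
            remaining.remove(url)
            with open(failed_path, "a", encoding="utf-8") as fp:
                fp.write(url + "\n")
        else:
            # Failed and no failure file given: keep the URL for a retry.
            continue
        _rewrite(input_path, remaining)
    return remaining


def _rewrite(path, urls):
    """Write to a temp file, then swap it in, so a crash cannot truncate the list."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as fp:
        fp.write("".join(u + "\n" for u in urls))
    os.replace(tmp, path)
```

With gallery-dl itself, `download` could plausibly be `lambda url: job.DownloadJob(url).run()` after a single `config.load()`, assuming `run()` returns a zero status on success; check that against the version you have installed.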

a84r7a3rga76fg commented 10 months ago

> gallery-dl is a gallery download manager

> understand gallery-dl is an application, not a management tool for how you manage to download

What?

biggestsonicfan commented 10 months ago

What?

gallery-dl:

a84r7a3rga76fg commented 10 months ago

I wouldn't have requested the feature if it could. You're also wrong about it not being a management tool.

biggestsonicfan commented 10 months ago

If gallery-dl were a management tool in the way you believe it to be, it would be able to list the galleries you are tracking and tell you which ones are up to date or need updating, and input file wouldn't even be necessary.

From the README:

> gallery-dl is a command-line program to download image galleries and collections from several image hosting sites

Nowhere here does it say it manages those images, galleries, or collections. That is up to the user to curate.

What you're seeking is jdownloader2, which has support for exhentai.

a84r7a3rga76fg commented 10 months ago

Nice false equivalence, those got nothing to do with my request. It's hilarious how my basic request got you this livid when it's the same as this Exhentai favorite setting. You're also wrong about gallery-dl not allowing the user to track what's been downloaded.

mikf commented 10 months ago

> It's hilarious how my basic request got you this livid

Nice projection.

Anyway, I'll consider adding a feature like this. I've already heard others complaining countless times on other platforms that gallery-dl does not have such a feature while better tools like EZE or whatever do, so I already have some sort of plan on how to implement this.

I still think biggestsonicfan is very much correct in that if you want something, you should first consider doing it yourself (e.g. implementing a feature) before relying on others to do your bidding.

a84r7a3rga76fg commented 10 months ago

It can't be eze since it's a userscript for browsers, it doesn't use an input file.

If you're considering the second request and need something to work with, here are two different types of unavailable galleries, https://exhentai.org/g/2712838/a204b47b06/ (hit by copyright strike) and https://exhentai.org/g/2645268/a69c3b9bc1/ (need the appropriate account or network status to access this).

biggestsonicfan commented 10 months ago

> It's hilarious how my basic request got you this livid when it's the same as this Exhentai favorite setting.

I'm not livid at all. I'm just pointing out that you seemingly want gallery-dl to do something it shouldn't need to do, especially with an exhentai list: once an image limit is reached, the next URL will also hit the image limit, creating an entirely new list.

And again, using web services/api to remove a favorite from your account is not at all the same thing as removing data from your disk and writing/overwriting a file.

> You're also wrong about gallery-dl not allowing the user to track what's been downloaded.

I said nothing about the archives. Archives are used to compare live data against a known set of data to avoid duplication. They do not curate themselves to suit an individual's needs; how they are configured is entirely up to the user in the config file. Some people may want duplicate hashes, others may not. gallery-dl does not tell you how up to date a user's gallery is, it does not tell you how many images are in a user's gallery, and it certainly does not have a feature where you tell it to update galleries based on previously downloaded content.

Again, due diligence is needed on the user's part to keep track of what is and isn't downloaded, and that includes setting up your own archives. If your archives are set up correctly, running your input file unmodified will automatically skip what's already been downloaded, so removing URLs from the list saves only a few minutes. But yes, even using an archive with exhentai is known to use up your image limit.
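
To illustrate the archive idea in general terms, here is a small sketch of a "seen entries" store backed by SQLite: each downloaded item is recorded under a key, and items whose key is already present are skipped on later runs. The class `Archive`, the function `download_new`, the `fetch` callable and the key format are all assumptions made for this example; this is not gallery-dl's own archive code.

```python
import sqlite3


class Archive:
    """Minimal download archive: a set of entry keys persisted in SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def seen(self, entry):
        row = self.conn.execute(
            "SELECT 1 FROM archive WHERE entry = ?", (entry,)).fetchone()
        return row is not None

    def add(self, entry):
        self.conn.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,))
        self.conn.commit()


def download_new(archive, items, fetch):
    """Fetch only items whose key is not in the archive; return the new keys."""
    fetched = []
    for key, url in items:
        if archive.seen(key):
            continue  # already downloaded on an earlier run
        fetch(url)  # hypothetical download callable supplied by the caller
        archive.add(key)
        fetched.append(key)
    return fetched
```

Running `download_new` twice over the same items fetches everything the first time and nothing the second, which is why an unmodified input file can be re-run cheaply when an archive is in place.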

I am unsubscribing from this issue and wish for what's best for gallery-dl, whatever happens happens and I've said my piece.

mikf commented 10 months ago

Updating your input file can now be done with -I/--input-file-comment or -x/--input-file-delete (https://github.com/mikf/gallery-dl/commit/4700051562060ac1e7ceccaae1d3dfa0832713f8). (There might be bugs, so no guarantees that it doesn't accidentally eat your input file.)
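
As a rough illustration of the difference between the two behaviours the option names suggest, the sketch below either comments out a finished URL or deletes its line. It is only a guess at the visible effect and is not the code from the linked commit; `update_input_file` and its `mode` parameter are hypothetical names, and it assumes that lines starting with `#` are ignored as comments in an input file.

```python
def update_input_file(path, finished_url, mode="comment"):
    """Comment out or delete the first line matching finished_url.

    Returns True if a matching line was found.
    """
    if mode not in ("comment", "delete"):
        raise ValueError("mode must be 'comment' or 'delete'")

    with open(path, encoding="utf-8") as fp:
        lines = fp.read().splitlines()

    out = []
    done = False
    for line in lines:
        if not done and line.strip() == finished_url:
            done = True
            if mode == "comment":
                out.append("# " + line)  # keep the URL, but disabled
            # In "delete" mode the line is simply dropped.
            continue
        out.append(line)

    with open(path, "w", encoding="utf-8") as fp:
        fp.write("".join(entry + "\n" for entry in out))
    return done
```

Commenting keeps a record of what has been processed, while deleting leaves the file holding only pending URLs.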

As for your second request: I don't really want to deal with that, at least not now.