marklieberman / downloadstar

Download all items in a webpage that match a pattern
GNU General Public License v3.0

Better way to handle existing files when conflict strategy is skip? #24

Closed: marklieberman closed this issue 6 years ago

marklieberman commented 6 years ago

Currently the addon just remembers the last 1000 download URLs and filenames. I never liked this, but there is no API to check if a file exists.

I was thinking of another way to detect duplicates. 1) Allow the download to begin with FilenameConflictAction = "uniquify" and filename fileX.ext. 2) In downloads.onCreated, if the filename is fileX(Y).ext, then cancel the download.

Pro: Don't need to store any download history. Con: Uses a little bit more resources as the download begins but is cancelled.
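A rough sketch of the filename check described above, assuming the browser inserts a counter such as `(1)` (optionally preceded by a space) before the extension when it uniquifies a name. The helper names `basename`, `isUniquified`, and `requestedByUrl` are hypothetical, and whether `downloads.onCreated` already reports the final filename is an assumption that would need testing:

```javascript
// Return the last path component, accepting both / and \ separators.
function basename(path) {
  return path.split(/[\\/]/).pop();
}

// True if `actual` looks like `requested` with a "(N)" counter
// inserted just before the extension, e.g. file.ext -> file(1).ext.
function isUniquified(requested, actual) {
  const req = basename(requested);
  const act = basename(actual);
  const dot = req.lastIndexOf(".");
  const stem = dot > 0 ? req.slice(0, dot) : req;
  const ext = dot > 0 ? req.slice(dot) : "";
  if (!act.startsWith(stem) || !act.endsWith(ext)) return false;
  const middle = act.slice(stem.length, act.length - ext.length);
  return /^ ?\(\d+\)$/.test(middle);
}

// Wiring sketch: only runs inside an extension, where `browser` exists.
if (typeof browser !== "undefined") {
  // Hypothetical bookkeeping: URL -> filename the addon asked for.
  const requestedByUrl = new Map();

  browser.downloads.onCreated.addListener((item) => {
    const requested = requestedByUrl.get(item.url);
    if (requested && isUniquified(requested, item.filename)) {
      // The browser had to rename the file, so a copy already exists.
      browser.downloads.cancel(item.id);
    }
  });
}
```

The pure `isUniquified` function is kept separate from the listener so the pattern can be adjusted without touching the download logic.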

OkanEsen commented 6 years ago

That sounds reasonable. The only problem I personally would have (I don't know about other people's setups) is that my files automatically get sorted into directories, so this strategy wouldn't actually work for me. For most people it should work, though.

marklieberman commented 6 years ago

Can you elaborate on why it won't work for you? Are you using an external tool that monitors the downloads directory? Even so, it shouldn't pick up the file unless Firefox is done, and the partial files should get cleaned up automatically if the download is cancelled. I can build the feature in a branch and you could test it out too.

OkanEsen commented 6 years ago

Sorry for the late response, had a lot of stuff to do.

> Are you using an external tool that monitors the downloads directory?

Yes, exactly. I'm using an external tool that monitors the Downloads folder and sorts the downloaded files by type, and then further into directories based on the date. So the extension, or rather Firefox, wouldn't be able to tell whether the file already exists, since it would already have been moved elsewhere by the time the download started.

As I already said though, this is really something specific to me and wouldn't be a huge deal, since I'm running a tool that deduplicates files, so downloading them again wouldn't make much of a difference to me personally.

It made me think though, and maybe there are other people using the extension the way I do. I periodically dump images from specific pages at different times. Since I can't keep track of all the files I have already downloaded, I download them again and let the extension check whether the file I'm trying to download was already downloaded.

As I understand, this change in the way the extension handles the history would only work on two instances:

marklieberman commented 6 years ago

Maybe I can just use both techniques to catch as many duplicates as possible. Then it doesn't break any use case. I could also add an option to disable storing download history.

OkanEsen commented 6 years ago

That sounds pretty good but I would propose this if you don't mind:

  1. First check the history to see whether we already downloaded the file.
  2. Only then check for a duplicate by filename.

Doing both of the checks at the same time wouldn't bring much to the table except causing a bit more overhead.
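The ordering proposed above could be sketched roughly as follows. This is only an illustration: `decideDownload` and the `probeFilename` callback are hypothetical names, with `probeFilename` standing in for "start the download and report whether the browser had to rename the file".

```javascript
// Decide what to do with a URL, running the cheap history lookup first
// and the more expensive filename probe only when the history misses.
function decideDownload(url, history, probeFilename) {
  if (history.has(url)) {
    return "skip:history"; // already seen, nothing is started
  }
  if (probeFilename(url)) {
    return "skip:filename"; // the browser renamed it, so the file exists
  }
  return "download";
}

// Usage: the probe is never called for URLs found in the history.
const seen = new Set(["https://example.com/a.png"]);
let probes = 0;
const probe = (url) => { probes += 1; return url.endsWith("b.png"); };

const results = [
  decideDownload("https://example.com/a.png", seen, probe),
  decideDownload("https://example.com/b.png", seen, probe),
  decideDownload("https://example.com/c.png", seen, probe),
];
```

Running the checks in sequence means a history hit costs no network or disk activity at all.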

Also, since you're thinking about adding an option to disable the history altogether, maybe you could also let the user specify the history size? I could implement the latter; I just need to know how you would like to organize the location for the settings.
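One possible shape for a history with a configurable size, where a size of 0 disables it. This is a sketch, not the addon's actual code: the class name `DownloadHistory` and its methods are hypothetical, and the default of 1000 is taken from the limit mentioned at the top of this thread.

```javascript
// Remembers the most recent download URLs, up to `maxSize` entries.
// A Set keeps insertion order, so the first entry is always the oldest.
class DownloadHistory {
  constructor(maxSize = 1000) {
    this.maxSize = maxSize;
    this.urls = new Set();
  }

  add(url) {
    if (this.maxSize <= 0) return; // history disabled by the user
    this.urls.delete(url); // re-adding moves the URL to the newest slot
    this.urls.add(url);
    while (this.urls.size > this.maxSize) {
      const oldest = this.urls.values().next().value;
      this.urls.delete(oldest);
    }
  }

  has(url) {
    return this.urls.has(url);
  }
}

// Usage: with a size of 2, adding a third URL evicts the oldest one.
const smallHistory = new DownloadHistory(2);
smallHistory.add("u1");
smallHistory.add("u2");
smallHistory.add("u3");
```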

marklieberman commented 6 years ago

How does this look?

image

OkanEsen commented 6 years ago

Looks great!