raceimaztion / webcomic-downloader

A simple downloader for numerically-ascending webcomic archives, for Python and GTK.

Possible way to get all comics with minimal user direction #1

Open rajatkhanduja opened 10 years ago

rajatkhanduja commented 10 years ago

How about asking the user to input not the webcomic name, but the link to the 'archives' page of the comic? This would eliminate all the guesswork that we do (e.g. incrementing numbers in the URL).

Once you get the page, just build a list of all the unique images on it (by extracting the img tags). Even if every one of these images is downloaded, there will be very few false hits (non-comic images such as logos and favicons).
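A minimal sketch of the img-tag extraction described above, using only the Python standard library. The names `ImageCollector` and `unique_images` are made up for illustration and are not part of this repository; fetching the page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []  # keeps page order, no duplicates

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            url = urljoin(self.base_url, src)
            if url not in self.images:
                self.images.append(url)


def unique_images(html, base_url):
    """Return the unique, absolute image URLs found in an HTML string."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    return parser.images
```

Calling `unique_images(page_html, archive_url)` returns absolute URLs in page order, so logos and favicons show up in the list alongside the comics and would have to be tolerated or filtered afterwards.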

Yet another method would be to ask the user to enter a link to 'any' page of the comic and have the script search for the archives page itself. This is prone to errors, as no standard requires the archive to be called 'Archives'. It could be called something like 'List of comics'.
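A rough sketch of that search, assuming the archive link can be recognised by a keyword in its href or link text. The keyword list `ARCHIVE_HINTS` and the function name `find_archive_links` are hypothetical, and the list will miss any comic that names its archive differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed keywords; extend as needed for other naming schemes.
ARCHIVE_HINTS = ("archive", "list of comics", "all comics")


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_archive_links(html, base_url):
    """Return absolute URLs of links that look like an archive page."""
    parser = LinkCollector()
    parser.feed(html)
    hits = []
    for href, text in parser.links:
        haystack = (href + " " + text).lower()
        if any(hint in haystack for hint in ARCHIVE_HINTS):
            url = urljoin(base_url, href)
            if url not in hits:
                hits.append(url)
    return hits
```

An empty result would be the cue to fall back to asking the user for the archive link directly.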

raceimaztion commented 10 years ago

A second method, which tries to follow the 'next comic' links in each day's page, is in the 'collector.py' library.

Basically, it tries to do some fancy guessing (based on some hard internet standards and some general assumptions) about which URLs take us to the next page and which IMG tags hold the comic image. It isn't perfect, especially as almost every webcomic's pages are constructed by somebody different, and there's a good chance of grabbing extra images, but that's generally not a big problem.
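To give an idea of what such guessing can look like, here is a simplified sketch. It is not the actual code in 'collector.py'; it assumes two heuristics, a `rel="next"` attribute on a link or anchor (the standard part), and otherwise the first anchor whose text contains "next" (the assumption part). The names `NextLinkFinder` and `guess_next_url` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Remember the first rel="next" href and the first 'next'-labelled anchor."""

    def __init__(self):
        super().__init__()
        self.rel_next = None   # href taken from rel="next"
        self.text_next = None  # href of first anchor whose text mentions "next"
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        rels = (attrs.get("rel") or "").lower().split()
        if href and "next" in rels and self.rel_next is None:
            self.rel_next = href
        if tag == "a":
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            if self.text_next is None and "next" in "".join(self._text).lower():
                self.text_next = self._href
            self._href = None


def guess_next_url(html, page_url):
    """Return the absolute URL of the probable next page, or None."""
    finder = NextLinkFinder()
    finder.feed(html)
    href = finder.rel_next or finder.text_next
    return urljoin(page_url, href) if href else None
```

Comics that use an image button for "next" with no link text and no `rel` attribute would slip past both heuristics, which is the kind of case where the guessing breaks down.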

The only problem with that approach is that it's more complicated, so the GUI takes more work and isn't finished yet. I've been cheating by running the library straight from a Python console...

Anyway, a better option might be to use regular expressions to choose the next page and the comic images, though it would be a bit more GUI-intensive. It would require more interaction between the user and the program, especially when choosing the exact parameters for the downloading engine.
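A small sketch of the regular-expression idea, assuming the user supplies two patterns that each contain one capturing group holding the URL. The function name `apply_patterns` and the patterns in the usage note are hypothetical.

```python
import re
from urllib.parse import urljoin


def apply_patterns(html, page_url, next_pattern, image_pattern):
    """Return (next_url_or_None, [image_urls]) using user-supplied regexes.

    Each pattern must contain exactly one capturing group with the URL.
    """
    next_match = re.search(next_pattern, html, re.IGNORECASE)
    next_url = urljoin(page_url, next_match.group(1)) if next_match else None
    images = [
        urljoin(page_url, match.group(1))
        for match in re.finditer(image_pattern, html, re.IGNORECASE)
    ]
    return next_url, images
```

For a comic that keeps its strips under `/comics/`, the user might enter something like `<img src="(/comics/[^"]+)"` for the images and `<a href="([^"]+)">\s*Next` for the next page. Those two strings are the "exact parameters" the GUI would have to collect.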

The only issue I have with using a comic's Archive page is that there are several different formats for archive pages. Aside from the fact that some formats have preview sizes of the comic pages in them and others don't, there's also the issue where some comics put all the links to the individual comics on one page and others actually put them on multiple pages.

I'm going to try to finish the new GUI for the new system, so hopefully it will be usable.