webcomics / dosage

dosage is a comic strip downloader and archiver
https://dosage.rocks/
MIT License

Oglaf - french version - issue with setting up the prevSearch parameter #324

Open brunellejb opened 1 month ago

brunellejb commented 1 month ago

Hi, I'm trying to add a script so that my local dosage can fetch the French version of the Oglaf comic, but I'm having trouble setting it up (I'm not used to Python and haven't coded anything in almost 15 years...).

I'm using dosage.exe 3.0 (standard version, not legacy), with Python 3.10.4 (CPython) on Windows-10-10.0.19045-SP0.

I created the following oglaffrench.py script in my user plugin directory:

```python
from ..scraper import ParserScraper


class oglaffrench(ParserScraper):
    url = 'https://oglaf.lapin.org/'
    stripUrl = url + 'index.php?number=%s'
    firstStripUrl = stripUrl % '1'

    imageSearch = '//div[d:class("comicpane")]//img'

    imageSearch = '//div/div[1]/div[2]/div[2]/img'
    prevSearch = '//a[@rel="«précédent "]'
    help = 'Index format: n (unpadded)'
```

Not sure about the use of class for the imageSearch parameter, but the way I wrote it seems to get me the correct image.

However, I can't get the prevSearch parameter to work. How should I write it?
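One possible cause, offered only as a guess since the page markup is not shown in this thread: `@rel` tests an attribute named `rel`, so the expression above matches nothing if "«précédent " is actually the visible link text. The sketch below illustrates the difference on made-up markup (the `SAMPLE` string is hypothetical, not copied from oglaf.lapin.org), using the limited XPath subset of Python's standard library rather than dosage itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration; the real page may be structured differently.
SAMPLE = """
<div>
  <a href="index.php?number=41">«précédent </a>
  <a href="index.php?number=43">suivant»</a>
</div>
"""

root = ET.fromstring(SAMPLE)

# [@rel="..."] looks for an attribute called "rel".
# The sample links have no such attribute, so nothing matches.
by_attribute = root.findall('.//a[@rel="«précédent "]')

# [.="..."] compares the element's text content instead (Python 3.7+).
by_text = root.findall('.//a[.="«précédent "]')

print(len(by_attribute))
print(by_text[0].get("href"))
```

If the real link does carry its label as text, a full XPath 1.0 engine also accepts a looser form such as `//a[contains(text(), "précédent")]`, which avoids depending on the exact surrounding characters and spaces.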

Thanks for your help!

Also, not directly related: I tried a full download of the original Oglaf comic (supported by default), but the pictures are saved under their server filenames. Is there any way to add a prefix to the filename during the download so I can sort them by release date?

kierun commented 1 month ago

As far as I know, the English version of Oglaf works perfectly as long as the comic doesn't span multiple pages.

brunellejb commented 1 month ago

Yeah, downloading the English version works fine with dosage. The only problem is that I can't sort the pictures in release order, which doesn't help with plots that run over several images.

When I download everything from XKCD, for example, image filenames look like xxxx-name.png, where xxxx is the sequential comic number, so sorting is easy. Since that number is part of the original URL (https://xkcd.com/xxxx/), it can be retrieved with no problem.

For a comic like Oglaf, where no such number is available, I don't know what the best way to make it work would be.

When downloading with --all, images are fetched backward from the latest one, going back in time via the "previous" link. So there could be a first pass to count how many images exist, then a second pass that actually downloads everything and adds the decreasing number to each filename... But that doesn't feel great, and it wouldn't work for partial downloads.
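A minimal sketch of that two-pass idea, written as a standalone helper and not as anything dosage provides: the function name `numbered_filenames` and the example URLs are made up for illustration. It assumes the first pass has already collected every image URL, newest first.

```python
import posixpath
from urllib.parse import urlsplit


def numbered_filenames(urls_newest_first, width=4):
    """Return filenames prefixed with a release number (oldest image gets 1).

    Assumes the list is complete and ordered newest first, as produced by
    following the "previous" link from the latest strip.
    """
    total = len(urls_newest_first)
    names = []
    for position, url in enumerate(urls_newest_first):
        number = total - position  # counts down as we go back in time
        basename = posixpath.basename(urlsplit(url).path)
        names.append(f"{number:0{width}d}-{basename}")
    return names


# Hypothetical URLs, newest first.
example = [
    "https://example.com/media/c.jpg",
    "https://example.com/media/b.jpg",
    "https://example.com/media/a.jpg",
]
print(numbered_filenames(example))
```

The numbers are only right when the whole archive is walked in one go, which is exactly the limitation with partial downloads mentioned above.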

Another workaround could be, when downloading an image, to rewrite its CreationTime and LastWriteTime to the current time, then for each further image downloaded in the same run (going backward), set CreationTime / LastWriteTime 1 minute earlier than the previous one. That way the files could be sorted by date in the file explorer. But I don't know how easy or complicated that would be in Python, for both Windows and Linux.
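A rough sketch of the timestamp idea using only the standard library. `os.utime` sets the access and modification times on both Windows and Linux, but it does not touch the Windows creation time, so this only covers the "date modified" column. The function name `stagger_mtimes` and its parameters are made up for illustration and are not part of dosage.

```python
import os
import time


def stagger_mtimes(paths_newest_first, step_seconds=60, start=None):
    """Give each file a modification time one step earlier than the previous one.

    The first path (the newest image) gets `start` (default: now), the
    second gets `start - step_seconds`, and so on.
    """
    if start is None:
        start = time.time()
    for offset, path in enumerate(paths_newest_first):
        stamp = start - offset * step_seconds
        # (access time, modification time), both in seconds since the epoch
        os.utime(path, (stamp, stamp))
```

Called with the files in the order they were downloaded (newest first), sorting by modification date then gives release order, and unlike a numeric prefix it does not need the total count up front.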