The "previous" link can't be found with the regex in your post; that regex finds the URL of the comic image, not the link to the previous day. The "previous" link appears in the source with only the date (yyyy-mm-dd), as in this HTML (example from Barney Google and Snuffy Smith, April 21, 2019):
<slider-arrow inline-template :is-left-arrow="true" feature-slug="barney-google-and-snuffy-smith" date-slug="2019-04-20">
I hope this answers your question of how to find the link to the previous date's comic.
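For reference, here is a quick standalone check (plain `re`, outside dosage) that the `prevSearch` pattern from the OnTheFastrack example in this thread pulls the date out of the tag above. It assumes the whole tag sits on one line, since `.` does not match newlines by default.

```python
import re

# prevSearch pattern from the OnTheFastrack class in this thread
prev_search = re.compile(r' :is-left-arrow="true" .*date-slug="(\d\d\d\d-\d\d-\d\d)"')

# The "previous" arrow tag quoted above (Barney Google and Snuffy Smith, April 21, 2019)
tag = ('<slider-arrow inline-template :is-left-arrow="true" '
       'feature-slug="barney-google-and-snuffy-smith" date-slug="2019-04-20">')

match = prev_search.search(tag)
previous_date = match.group(1) if match else None
print(previous_date)  # 2019-04-20
```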
EDIT:
Is ComicsKingdom actually in the plugins folder at all?
Thanks for the "previous" link solution, I'd missed it and it will be helpful.
The big problem is that the image pointed to by imageSearch is served without a Content-Length header, so it gets written with zero length. It also has no Content-Type, but I can force the extension to .png.
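A minimal standalone sketch of one workaround, assuming the response body can simply be read until EOF (as with the object returned by `urllib.request.urlopen`). This is not dosage's download code: `save_stream` and `guess_extension` are hypothetical helpers that ignore Content-Length and pick the extension from the file's magic bytes instead of the missing Content-Type.

```python
import io

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def guess_extension(first_bytes: bytes) -> str:
    """Guess an image extension from its magic bytes (hypothetical helper)."""
    if first_bytes.startswith(PNG_SIGNATURE):
        return ".png"
    if first_bytes.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if first_bytes.startswith((b"GIF87a", b"GIF89a")):
        return ".gif"
    return ".png"  # fall back to .png, as suggested above


def save_stream(stream, out, chunk_size: int = 64 * 1024) -> int:
    """Copy `stream` to `out` until EOF, never trusting a Content-Length header."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty read means EOF
            break
        out.write(chunk)
        total += len(chunk)
    return total


# Simulated response body; a real one would come from an HTTP request.
body = io.BytesIO(PNG_SIGNATURE + b"rest-of-image")
out = io.BytesIO()
written = save_stream(body, out)
ext = guess_extension(out.getvalue()[:8])
print(written, ext)  # 21 .png
```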
You're correct, ComicsKingdom isn't in the plugins or scripts folders; that's what I'm trying to fix.
EDIT: Doesn't prevSearch assume that it returns a URL? From what you pointed me to, I can extract a strip name and date, but the base URL (https://comicskingdom.com/) isn't there to return.
I'm guessing that, given that shortcoming, a pull request is needed: one that amounts to little more than a regex find and replace.
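One possible way around the missing base URL, assuming the captured date is resolved relative to the page it was found on: `urllib.parse.urljoin` replaces the last path segment, so a bare date-slug becomes a full strip URL. `previous_url` is a hypothetical helper; I haven't checked whether dosage's scraper already does this join before calling `link_modifier`.

```python
from urllib.parse import urljoin


def previous_url(page_url: str, date_slug: str) -> str:
    """Resolve a bare date-slug (from prevSearch) against the current page URL."""
    return urljoin(page_url, date_slug)


# From the latest-strip page (note the trailing slash)
print(previous_url("https://comicskingdom.com/on-the-fastrack/", "2019-04-20"))
# https://comicskingdom.com/on-the-fastrack/2019-04-20

# From a dated page: the old date segment is replaced
print(previous_url("https://comicskingdom.com/on-the-fastrack/2019-04-20", "2019-04-19"))
# https://comicskingdom.com/on-the-fastrack/2019-04-19
```

The join only works if the latest-strip URL ends in `/`; without it, `urljoin('https://comicskingdom.com/on-the-fastrack', '2019-04-20')` yields `https://comicskingdom.com/2019-04-20`, which drops the strip slug.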
The following pattern seems to work for scraping current strips, but be aware that you can only go back about 7 days. I'm using OnTheFastrack as an example.
DRAT! How do you paste Python code here? The indents are being eaten!
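# Module-level imports for a dosagelib/plugins module
import datetime
from re import compile

from ..scraper import _BasicScraper


class OnTheFastrack(_BasicScraper):
    # King Features seems to have changed format on 4/09/2019
    url = 'https://comicskingdom.com/on-the-fastrack/'
    stripUrl = url + '%s'
    firstStripUrl = stripUrl % '2000-11-13'
    # [^"]+ stops the capture at the closing quote of image-url
    imageSearch = compile(r' image-url="(https://safr\.kingfeatures\.com/api/img\.php\?e=png&s=c&file=[^"]+)"')
    # Captures only the date (yyyy-mm-dd) of the previous strip
    prevSearch = compile(r' :is-left-arrow="true" .*date-slug="(\d\d\d\d-\d\d-\d\d)"')
    help = 'Index format: yyyy-mm-dd'

    def namer(self, image_url, page_url):
        # page_url looks like https://comicskingdom.com/on-the-fastrack/2019-04-20
        name = page_url.rsplit('/', 3)[2]
        date = page_url.rsplit('/', 3)[3]
        if date == "":
            # The latest-strip URL has no date, so use today's
            date = datetime.date.today().strftime("%Y-%m-%d")
        return "%s_%s.png" % (name.title(), date)

    def link_modifier(self, url, tourl):
        # prevSearch yields a bare date, so prefix the strip's base URL
        if not tourl.startswith(self.url):
            tourl = self.url + tourl
        return tourl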
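To see what `namer` produces without running a full scrape, the same logic can be pulled out into a plain function. `strip_filename` and its `today` parameter are hypothetical, added only so the date fallback is testable.

```python
import datetime
from typing import Optional


def strip_filename(page_url: str, today: Optional[datetime.date] = None) -> str:
    """Same naming rule as OnTheFastrack.namer, outside the scraper class."""
    # 'https://comicskingdom.com/on-the-fastrack/2019-04-20'
    #   -> ['https:/', 'comicskingdom.com', 'on-the-fastrack', '2019-04-20']
    parts = page_url.rsplit('/', 3)
    name, date = parts[2], parts[3]
    if date == "":
        # The latest-strip URL ends in '/', so it carries no date.
        date = (today or datetime.date.today()).strftime("%Y-%m-%d")
    return "%s_%s.png" % (name.title(), date)


print(strip_filename("https://comicskingdom.com/on-the-fastrack/2019-04-20"))
# On-The-Fastrack_2019-04-20.png
print(strip_filename("https://comicskingdom.com/on-the-fastrack/", datetime.date(2019, 4, 21)))
# On-The-Fastrack_2019-04-21.png
```

Note that `str.title()` capitalizes after every hyphen, so a slug like `sherman-s-lagoon` becomes `Sherman-S-Lagoon`.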
If you're desperate for ComicsKingdom.com strips, put comicskingdom.py in your dosagelib/plugins directory and refer to the strips as ComicsKingdom/\<strip>, as listed in it.
A few don't work (for example, Tiger vs. TigerSundays) because of an issue in scripts/scraper.py.
I'll be asking about that in a separate issue.
You can find comicskingdom.py at:
https://github.com/littauer/dosagetest/blob/master/dosagelib/plugins/comicskingdom.py
Closed as fixed in pull request #134.
King Features (aka ComicsKingdom.com) has changed its format, and I’m having trouble getting its strips set up again.
I don’t know Python or the code base well enough to fix this, but I will do the grunt work if someone can point the way. I’d prefer a generic solution like those for Creators or GoComics, but I’ll take what I can get.
There aren’t a lot of comics and they’re mostly antiques (Rex Morgan MD, Judge Parker, The Phantom, Alley Oop) but there are a few that people here follow:
Sally Forth, Sherman’s Lagoon, Hagar the Horrible, Safe Havens, and On the Fastrack are all primarily hosted there now. Kevin and Kell is probably headed that way as well.
The new format:
url is of the form https://comicskingdom.com/
prior strips (no more than a week or so back, as far as I can tell) are at: https://comicskingdom.com//yyyy-mm-dd
index formats used to vary (mostly Month-dd-yyyy) but are now yyyy-mm-dd
I can’t find a “previous” link
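A small sketch of the new index scheme, assuming the layout used by the OnTheFastrack class in this thread (strip slug followed by the date, e.g. `https://comicskingdom.com/on-the-fastrack/2019-04-20`) and the roughly one-week window mentioned above. Both helpers are hypothetical; `old_index_to_new` also assumes full English month names (as in `April-20-2019`), since `%B` depends on the locale.

```python
import datetime

BASE = "https://comicskingdom.com/"


def old_index_to_new(old: str) -> str:
    """Convert an old Month-dd-yyyy index (e.g. 'April-20-2019') to yyyy-mm-dd."""
    return datetime.datetime.strptime(old, "%B-%d-%Y").strftime("%Y-%m-%d")


def recent_strip_urls(slug: str, today: datetime.date, days: int = 7) -> list:
    """Candidate URLs for the last `days` strips, newest first (window is a guess)."""
    return [
        "%s%s/%s" % (BASE, slug, (today - datetime.timedelta(days=n)).strftime("%Y-%m-%d"))
        for n in range(days)
    ]


print(old_index_to_new("April-20-2019"))  # 2019-04-20
urls = recent_strip_urls("on-the-fastrack", datetime.date(2019, 4, 21))
print(urls[0])   # https://comicskingdom.com/on-the-fastrack/2019-04-21
print(urls[-1])  # https://comicskingdom.com/on-the-fastrack/2019-04-15
```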
imageSearch = compile(r' image-url="(https://safr\.kingfeatures\.com/api/img\.php\?e=png&s=c&file=.+)"')
gives 1 image.
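The pattern above does return one image, but `.+` is greedy: if anything else on the same line contains a quote, the capture runs on to the last `"` it can find. The OnTheFastrack class in this thread bounds it with `[^"]+` instead. A quick comparison on a hypothetical tag (the real element name, attribute order, and `file=` value on the page may differ):

```python
import re

greedy = re.compile(r' image-url="(https://safr\.kingfeatures\.com/api/img\.php\?e=png&s=c&file=.+)"')
bounded = re.compile(r' image-url="(https://safr\.kingfeatures\.com/api/img\.php\?e=png&s=c&file=[^"]+)"')

# Hypothetical tag with a second quoted attribute on the same line
tag = ('<comic-image image-url="https://safr.kingfeatures.com/api/img.php?e=png&s=c&file=abc123" '
       'alt="On the Fastrack">')

greedy_url = greedy.search(tag).group(1)
bounded_url = bounded.search(tag).group(1)
print(greedy_url)   # ...file=abc123" alt="On the Fastrack  (overshoots)
print(bounded_url)  # https://safr.kingfeatures.com/api/img.php?e=png&s=c&file=abc123
```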
The response gives no indication of the image’s size or type, even though Firefox correctly determines both (PNG). This causes trouble on download.
Sample slugs: safe-havens, on-the-fastrack, sherman-s-lagoon, sally-forth, hagar-the-horrible, rex-morgan-md
Thanks for any help,
Tom