This pull request adds a spider for the Sunday Times as per the Sunday Times scraper card on the Public People Trello board.
The Sunday Times is hosted on the TimesLIVE website, with https://sundaytimes.co.za redirecting to https://www.timeslive.co.za/sunday-times/.

All of the Sunday Times articles appear to have https://www.timeslive.co.za/sunday-times/ as the base URL. The articles can be found through the standard TimesLIVE sitemap. For example, the https://www.timeslive.co.za/sitemap/business/ sitemap entry has the following Sunday Times articles:

http://www.timeslive.co.za/sunday-times/business/2020-04-04-markets-look-to-wall-street-for-recovery/
http://www.timeslive.co.za/sunday-times/business/2018-10-24-saa-post-office-and-other-ailing-soes-to-receive-billions-in-cash-bailouts/
http://www.timeslive.co.za/sunday-times/business/2018-09-29-maverick-musk-faces-us-regulators-wrath--for---go-private-tweet/

The SundayTimesSpider class is therefore implemented as a subclass of the TimesliveSpider class that only parses paths containing /sunday-times/.

This is my first time working on this project, as well as with Scrapy, so please let me know if I have missed something.
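
For reviewers, the path filter described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this pull request: `is_sunday_times_url` and `SUNDAY_TIMES_PATH` are hypothetical names, and the sketch uses only the standard library rather than Scrapy or the TimesliveSpider class.

```python
from urllib.parse import urlparse

# Path segment that marks a Sunday Times article on the TimesLIVE site.
SUNDAY_TIMES_PATH = "/sunday-times/"


def is_sunday_times_url(url: str) -> bool:
    """Return True if the URL's path contains the Sunday Times section.

    Hypothetical helper for illustration only; the actual spider may
    apply this check differently.
    """
    return SUNDAY_TIMES_PATH in urlparse(url).path


# Example URLs taken from the description above.
urls = [
    "http://www.timeslive.co.za/sunday-times/business/2020-04-04-markets-look-to-wall-street-for-recovery/",
    "https://www.timeslive.co.za/sitemap/business/",
]

# Keep only the URLs a Sunday Times spider would parse.
sunday_times_urls = [u for u in urls if is_sunday_times_url(u)]
```

Checking the parsed path rather than the whole URL string avoids matching on a query string or fragment that happens to contain the same text.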