jackdbd / hokuto-no-ken-api

An API built by scraping the Hokuto Renkitōza wiki with Scrapy. Deployed on AWS as a Flask app with Zappa.
http://bit.ly/hokuto-api-dev
MIT License

Fix scraping of character pages #22

Open jackdbd opened 4 years ago

jackdbd commented 4 years ago

It seems that the Scrapy spider that scrapes all the pages is no longer working.

The spider now finishes in a few seconds, whereas it used to take about 5 minutes. I suspect this means it is no longer crawling the website.
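One way to tell the two possible failures apart (links no longer followed vs. selectors no longer matching) is to compare the stats Scrapy reports at the end of a run with those of a known-good run. The sketch below is a hypothetical helper, not code from this repository: `diagnose_crawl`, its thresholds and the sample numbers are made up. It only assumes the standard Scrapy stats keys `downloader/request_count`, `item_scraped_count`, `start_time` and `finish_time`.

```python
from datetime import datetime, timedelta


def diagnose_crawl(stats, min_requests, min_items):
    """Hypothetical helper: guess why a finished crawl looks truncated.

    `stats` is a dict shaped like the one Scrapy dumps at the end of a run.
    `min_requests` and `min_items` are hypothetical thresholds that would
    come from a known-good run of the same spider.
    """
    hints = []
    requests = stats.get("downloader/request_count", 0)
    items = stats.get("item_scraped_count", 0)  # absent when nothing was scraped
    start, finish = stats.get("start_time"), stats.get("finish_time")
    if start is not None and finish is not None:
        hints.append(f"duration: {(finish - start).total_seconds():.0f}s")
    if requests < min_requests:
        # Few requests: the spider is not following links (crawling problem).
        hints.append(f"only {requests} requests: link following is suspect")
    elif items < min_items:
        # Pages were fetched but little was extracted (selector problem).
        hints.append(f"only {items} items: selectors are suspect")
    return hints


# Made-up stats for a run that ends after a few seconds.
start = datetime(2019, 12, 1, 16, 0, 0)
stats = {
    "start_time": start,
    "finish_time": start + timedelta(seconds=8),
    "downloader/request_count": 2,
    "finish_reason": "finished",
}
print(diagnose_crawl(stats, min_requests=100, min_items=100))
# → ['duration: 8s', 'only 2 requests: link following is suspect']
```

The real numbers appear in the "Dumping Scrapy stats" block that Scrapy logs when the spider closes, so they can be copied into such a dict for comparison.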

The XPath selectors are probably still working, though. The website looks the same as the last time the characters spider was working, and this XPath selector, run in the Chromium DevTools console on Rei's page, selects the expected text fields:

$x('//table[@class="infobox"]//tr/th/text()')
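The selector can also be checked outside the browser. Inside a Scrapy project, `Selector(text=html).xpath(...).getall()` evaluates the original expression directly; the stdlib-only sketch below instead approximates it with `xml.etree.ElementTree`, whose limited XPath subset has no `text()` step, so the direct text nodes of each `<th>` are rebuilt by hand. It assumes a small, hand-written, well-formed fragment (hypothetical, not taken from the wiki); real wiki HTML would usually need a proper HTML parser such as the one Scrapy uses.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, hand-written fragment shaped like a wiki infobox.
# NOT copied from the Hokuto Renkitōza wiki; ElementTree needs well-formed XML.
SAMPLE = """
<div>
  <table class="infobox">
    <tr><th>Name</th><td>Rei</td></tr>
    <tr><th>Voice <small>(TV)</small> actor</th><td>...</td></tr>
  </table>
  <table class="navbox">
    <tr><th>Ignored</th></tr>
  </table>
</div>
"""


def infobox_header_texts(markup: str) -> List[str]:
    """Approximate //table[@class="infobox"]//tr/th/text() with ElementTree."""
    root = ET.fromstring(markup.strip())
    texts = []
    # [@class='infobox'] is an exact match, like @class="infobox" in XPath.
    for th in root.iterfind(".//table[@class='infobox']//tr/th"):
        # text() yields every direct text node of <th>: its leading text
        # plus the tail text that follows each child element.
        for text in [th.text] + [child.tail for child in th]:
            if text is not None:
                texts.append(text)
    return texts


print(infobox_header_texts(SAMPLE))
# → ['Name', 'Voice ', ' actor']
```

The `navbox` table is there to show that the exact class match excludes other tables, as the original XPath does.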
jackdbd commented 4 years ago

The XPath selector $x('//table[@class="infobox"]//tr/th/text()') is still working fine, even when the popup is not dismissed.

(Screenshot attached: Screenshot_2019-12-01_16-54-18)