google / corpuscrawler

Crawler for linguistic corpora

fixes bibleis crawler #45

Closed by cash 5 years ago

cash commented 5 years ago

This closes issue #42. The website's structure seems to have changed significantly since the old code was written, so the old crawler no longer works. This PR changes the crawler to pull everything from the JSON object embedded in the page.
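
For readers unfamiliar with the approach, here is a minimal sketch of pulling data from a JSON object embedded in a page. It is not the code in this PR: the script id `__NEXT_DATA__`, the keys `props` and `chapterText`, the sample HTML, and both function names are assumptions made up for illustration, and the real page layout may differ.

```python
import json
import re

# Hypothetical page: the script id and the JSON keys are assumptions,
# not the actual markup served by the site.
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"chapterText": [
  {"verse": 1, "text": "In the beginning"},
  {"verse": 2, "text": "And the earth"}
]}}
</script>
</body></html>
"""


def extract_embedded_json(html, script_id):
    """Return the parsed JSON inside <script id=script_id>, or None."""
    pattern = re.compile(
        r'<script[^>]*\bid="%s"[^>]*>(.*?)</script>' % re.escape(script_id),
        re.DOTALL)
    match = pattern.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def verses_from_page(html):
    """Collect verse texts from the (assumed) embedded JSON structure."""
    data = extract_embedded_json(html, '__NEXT_DATA__')
    if data is None:
        return []
    # Walk the assumed keys defensively so a missing key yields no verses.
    chapter = data.get('props', {}).get('chapterText', [])
    return [verse['text'] for verse in chapter]
```

Reading the embedded object this way avoids depending on the visible HTML layout, which is the part of a page most likely to change between redesigns.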

brawer commented 5 years ago

Looks good to me, but I don’t work at Google anymore so I can’t merge this change... @sffc ?

cash commented 5 years ago

Thanks. I have a few other fixes for other crawlers that I haven't quite finished. Maybe early next week.