google / corpuscrawler

Crawler for linguistic corpora

what sites are crawled? #43

Closed thebucketmouse closed 5 years ago

thebucketmouse commented 5 years ago

I looked through the README. Is there a list of the sites this script crawls? And is there documentation on how to add additional sites?

brawer commented 5 years ago

Have a look at the source code and follow the existing examples. They typically use the utility functions from here.
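The pattern described above consists of small site-specific crawlers that delegate fetching and cleanup to shared helpers, with a registry that doubles as the list of crawled sites. A minimal sketch of this pattern follows. It is a hypothetical, self-contained illustration, not corpuscrawler's actual code: `CRAWLERS`, `register`, `extract_paragraphs`, the `xx` language code and the example.org URLs are all invented here. Check the repository source for the real helpers and their signatures.

```python
import re
from typing import Callable, Dict, List

# A fetch function maps a URL to its HTML body. It is injected so the sketch
# runs without network access (hypothetical design, not corpuscrawler's API).
Fetch = Callable[[str], str]

# Hypothetical registry of per-language crawl functions; listing its keys
# answers "which sites/languages are crawled?" for this sketch.
CRAWLERS: Dict[str, Callable[[Fetch], List[str]]] = {}


def register(lang: str):
    """Decorator that records a crawl function under a language code."""
    def wrap(fn):
        CRAWLERS[lang] = fn
        return fn
    return wrap


def extract_paragraphs(html: str) -> List[str]:
    """Hypothetical shared utility: text of each <p> element, tags stripped,
    whitespace collapsed, empty paragraphs dropped."""
    paragraphs = re.findall(r"<p[^>]*>(.*?)</p>", html, flags=re.S | re.I)
    cleaned = [re.sub(r"<[^>]+>", "", p) for p in paragraphs]
    return [" ".join(p.split()) for p in cleaned if p.strip()]


@register("xx")  # placeholder language code
def crawl_example_site(fetch: Fetch) -> List[str]:
    """A site-specific crawler: it only knows its own URLs and reuses the
    shared utility for extraction."""
    urls = ["https://example.org/news/1", "https://example.org/news/2"]
    lines: List[str] = []
    for url in urls:
        lines.extend(extract_paragraphs(fetch(url)))
    return lines


if __name__ == "__main__":
    # Fake pages stand in for real HTTP responses.
    pages = {
        "https://example.org/news/1": "<p>Hello <b>world</b>.</p>",
        "https://example.org/news/2": "<div><p>Second   page.</p></div>",
    }
    print(sorted(CRAWLERS))
    print(CRAWLERS["xx"](pages.__getitem__))
```

Under this layout, adding a site means writing one more small registered function. All parsing logic stays in the shared helpers, so the site-specific code remains short.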

sffc commented 5 years ago

Closing as there is no further action required on this issue. Feel free to submit a PR updating the documentation.