google / corpuscrawler

Crawler for linguistic corpora

[cy] add basic Welsh crawler #28

Closed: cwd24 closed this 6 years ago

cwd24 commented 6 years ago

Tested with the following local change, to reduce redirects when run from the UK:

diff --git a/Lib/corpuscrawler/util.py b/Lib/corpuscrawler/util.py
index f22e6b9..e6b294c 100644
--- a/Lib/corpuscrawler/util.py
+++ b/Lib/corpuscrawler/util.py
@@ -316,7 +316,7 @@ class Crawler(object):

using the same site structure for all languages.

def crawl_bbc_news(crawler, out, urlprefix):

sitemap = {'http://www.bbc.com/burmese/world-41146701': None}
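For illustration only, here is a minimal sketch of the idea behind this kind of local change: a hard-coded one-entry dict stands in for a fetched sitemap, so a test run touches a single known article URL. The helper `iter_article_urls` and its prefix filter are hypothetical and not part of corpuscrawler. Reading the dict values as last-modified timestamps is also an assumption.

```python
from urllib.parse import urlparse


def iter_article_urls(sitemap, urlprefix):
    """Yield sitemap URLs whose path starts with urlprefix, in sorted order.

    Hypothetical helper for illustration; not corpuscrawler's API.
    """
    for url in sorted(sitemap):
        if urlparse(url).path.startswith(urlprefix):
            yield url


# Hard-coded stand-in for a fetched sitemap. The None value is assumed
# to mean "last-modified time unknown".
sitemap = {'http://www.bbc.com/burmese/world-41146701': None}

urls = list(iter_article_urls(sitemap, '/burmese/'))
```

With this stand-in, only the one Burmese article URL is selected, and a prefix for another language section selects nothing.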

brawer commented 6 years ago

Thank you!