google / corpuscrawler

Crawler for linguistic corpora

Skip urls with non-200 http status #51

Closed: blackblitz closed this 4 years ago

blackblitz commented 4 years ago

This pull request responds to https://github.com/google/corpuscrawler/issues/50#issuecomment-542928663. Instead of crashing when a fetch returns an HTTP status other than 200, the program now skips that URL and continues.
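The skip-instead-of-crash behaviour described above can be sketched roughly as follows. This is not the code from the pull request or from corpuscrawler itself: the function names `fetch_status_and_body` and `crawl_skipping_errors` are hypothetical, and the sketch uses only the Python 3 standard library. It assumes that a non-200 status should be reported and skipped, and it does not handle network-level failures such as `urllib.error.URLError`.

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url, timeout=10):
    """Hypothetical helper: return (status, body) instead of raising on HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; keep only the status code.
        return err.code, b""


def crawl_skipping_errors(urls, fetch=fetch_status_and_body):
    """Yield (url, body) for every URL answered with status 200; skip the rest."""
    for url in urls:
        status, body = fetch(url)
        if status != 200:
            # Report the problem and move on rather than aborting the whole crawl.
            print("Skipping %s (HTTP status %d)" % (url, status))
            continue
        yield url, body
```

The `fetch` parameter is injectable so the loop can be exercised without network access, for example by passing a function that looks up canned `(status, body)` pairs in a dictionary.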

googlebot commented 4 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with "@googlebot I signed it!" and we'll verify it.


ℹ️ Googlers: Go here for more info.

blackblitz commented 4 years ago

@googlebot I signed it!

googlebot commented 4 years ago

CLAs look good, thanks!
