tosdr / edit.tosdr.org

👍👎 A new web app to rate services
https://edit.tosdr.org
GNU Affero General Public License v3.0

Crawling Errors #962

Closed: JustinBack closed this issue 3 years ago

JustinBack commented 3 years ago

For the past couple of days, since we started switching to a new crawler, crawling has been ending in a server error.

tosdrbot commented 3 years ago

This issue has been mentioned on ToS;DR Reviewers Forum. There might be relevant details there:

https://forum.tosdr.org/t/crawling-errors-crawling-update/407/3

michielbdejong commented 3 years ago

This only happens on some docs. This is still the old crawler. I haven't switched yet because the current format (slightly stripped HTML) is too different from the format the new crawler produces (Markdown). I think a good next step would be to adapt the new crawler so that it outputs something closer to what we have now.

I did re-crawl all docs with the current (old) crawler, see https://forum.tosdr.org/t/recrawled-all-documents/415

tosdrbot commented 3 years ago

This issue has been mentioned on ToS;DR Reviewers Forum. There might be relevant details there:

https://forum.tosdr.org/t/crawling-errors-crawling-update/407/5

JustinBack commented 3 years ago

Crawling https://edit.tosdr.org/documents/551/ fails. It seems the crawler is being blocked by Sprint's servers?

https://edit.tosdr.org/services/1336/annotate

tosdrbot commented 3 years ago

This issue has been mentioned on ToS;DR Reviewers Forum. There might be relevant details there:

https://forum.tosdr.org/t/nebula-privacy-policy-and-tos-not-showing-up/454/2