tosdr / crawler.tosdr.org

ToS;DR Crawlers
Error crawling Letterboxd #3

Open Donnnno opened 3 years ago

Donnnno commented 3 years ago

I noticed that the crawler is no longer able to crawl the Letterboxd ToS and other policies.

The ToS and other policies:
https://letterboxd.com/legal/terms-of-use/
https://letterboxd.com/legal/privacy-policy/
https://letterboxd.com/legal/community-policy/

The edit page: https://edit.tosdr.org/services/2074/annotate

And the XPath: /html/body/div[1]/div/div/article

It seems that our crawler wasn't able to retrieve any text. Please check that the XPath and URL are accurate.

Reason: TimeoutError
Stacktrace: Waiting for element to be located By(xpath, /html/body/div[1]/div/div/article) Wait timed out after 10002ms

JustinBack commented 3 years ago

Which crawler were you using?

That error message indicates the XPath is wrong, as the crawler can't find the element within 10 seconds.
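
To illustrate what that timeout means: a wait like this polls for the element and gives up once the limit passes. This is only a minimal sketch in Python, not the crawler's actual code; `wait_until`, `locate`, and the parameters are hypothetical names, and the 10-second default is taken from the error above.

```python
import time


def wait_until(locate, timeout_s=10.0, poll_s=0.25):
    """Poll locate() until it returns something other than None, or give up.

    `locate` is a hypothetical callable standing in for "find the element
    at this XPath"; it should return None while nothing matches.
    """
    start = time.monotonic()
    while True:
        found = locate()
        if found is not None:
            return found
        elapsed = time.monotonic() - start
        if elapsed >= timeout_s:
            # Same shape as the message in the report: the element never
            # appeared before the limit, so the wait itself fails.
            raise TimeoutError(f"Wait timed out after {int(elapsed * 1000)}ms")
        time.sleep(poll_s)
```

If the XPath never matches, `locate` keeps returning None and the call ends in a `TimeoutError` slightly after the limit, which would explain a figure like 10002ms rather than exactly 10000ms.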

JustinBack commented 3 years ago

I just checked the documents, and it seems you'd have to go with full-page crawling. I'm not exactly sure why this is happening, but I copied the exact XPath as well and the page does not seem to alter it.
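
For anyone hitting the same error, the full-page fallback could look roughly like this. It's a sketch under assumptions, not how the ToS;DR crawler is implemented: `extract_text` and `PAGE` are made-up names, the sample markup is invented, and it uses Python's standard-library ElementTree, which only handles well-formed markup and a small XPath subset.

```python
import xml.etree.ElementTree as ET

# Invented sample page: the article sits under a <section>, so the XPath
# from the report (which expects div/div/article) will not match it.
PAGE = """<html><body>
<div><div><section><article>Terms of Use</article></section></div></div>
<footer>Footer</footer>
</body></html>"""


def extract_text(markup, xpath=None):
    """Return the text under `xpath`, or the whole page if nothing matches."""
    root = ET.fromstring(markup)
    node = None
    if xpath:
        # ElementTree only accepts relative paths, so drop the leading
        # "/<root tag>" step of an absolute XPath such as /html/body/...
        steps = xpath.strip("/").split("/")
        if steps and steps[0] == root.tag:
            relative = "/".join(steps[1:])
            node = root.find("./" + relative) if relative else root
    # Full-page fallback: no XPath given, or the XPath matched nothing.
    target = node if node is not None else root
    # Collapse runs of whitespace so the result is a single clean line.
    return " ".join("".join(target.itertext()).split())


print(extract_text(PAGE, "/html/body/div[1]/div/div/article"))
print(extract_text(PAGE, "/html/body/div[1]/div/section/article"))
```

The first call falls back to the whole page because the XPath matches nothing; the second returns only the article text. The trade-off is that a full-page crawl also picks up navigation and footer text.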

Donnnno commented 3 years ago

Which crawler were you using?

The default ToS;DR page crawler.

I just checked the documents and it seems you'd have to go with a full page crawling.

Thanks for the quick reply. I'll do that in the meantime :)