laurentprudhon / nlptextdoc

Suite of tools to extract and annotate language resources for NLP applications

Relative links resolution fails after redirect #40

Open laurentprudhon opened 5 years ago

laurentprudhon commented 5 years ago

It is impossible to crawl the following website:

http://www.fbf.fr/

Reason: every link on the site triggers a redirect, and the relative links in the page behind the redirect are resolved against the base URL of the original link instead of the URL of the page after the redirect.
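To illustrate the difference, here is a minimal Python sketch (not the crawler's own code) that resolves the same relative links against the requested URL and against the post-redirect URL. The URLs and the `resolve_links` helper are hypothetical, since the actual redirect targets on the site are not listed in this issue.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs):
    """Resolve each href against base_url (hypothetical helper)."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical URLs: the real redirect targets are not given in this issue.
requested_url = "http://www.example.org/section/page"
final_url = "http://www.example.org/fr/section/page/index.html"  # after redirect

# Relative links as they might appear in the page served at final_url.
hrefs = ["details.html", "../contact"]

# Buggy behaviour: the base is the URL that was originally requested.
wrong = resolve_links(requested_url, hrefs)

# Expected behaviour: the base is the URL of the page after the redirect.
right = resolve_links(final_url, hrefs)

for href, bad, good in zip(hrefs, wrong, right):
    print(f"{href}: {bad} (original base) vs {good} (redirected base)")
```

The two bases yield different absolute URLs for the same `href`, so a crawler that keeps the original URL as the base will probably request pages that do not exist. Most HTTP clients expose the final URL after redirects (for example, `geturl()` on the response returned by Python's `urllib.request.urlopen`), and that value is the one to use as the base.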