laurentprudhon / nlptextdoc

Suite of tools to extract and annotate language resources for NLP applications

HTTP return codes 301 "Moved Permanently" and 303 "See Other" not handled properly #4

Closed: laurentprudhon closed this issue 5 years ago

laurentprudhon commented 5 years ago

Here are 3 examples of requests that are logged as "BadRequest" during the crawl, although a browser correctly handles them as redirects:

https://www.cbanque.com/forums/posts/306443/ 301 / Moved Permanently Location: https://www.cbanque.com/forums/fil/obtention-dun-pret-cel.36080/#post-306443

https://www.cbanque.com/forums/fil/offre-de-pret-renvoyee-et-appel-de-fonds.36043/latest 303 / See Other Location: https://www.cbanque.com/forums/fil/offre-de-pret-renvoyee-et-appel-de-fonds.36043/page-2#post-305439

https://www.creditmutuel.fr/fr/particuliers/simulations-souscriptions/emprunter.html 301 / Moved Permanently Location: https://www.creditmutuel.fr/fr/particuliers/simulations-souscriptions/index.html#I2
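In all three examples the server returns a 3xx status plus a `Location` header, and a browser simply follows it. The snippet below is a minimal sketch, not the project's actual fix, of how a crawler could turn such a response into the next URL to fetch. The function name `redirect_target` and the set of redirect codes are assumptions for illustration. The `Location` value is resolved against the request URL (it may be relative), and the `#post-...` / `#I2` fragment is dropped because fragments are never sent to the server.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical set of status codes treated as "follow the Location header".
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, location):
    """Return the absolute, fragment-free URL to crawl next, or None."""
    if status not in REDIRECT_CODES or not location:
        return None
    # Location may be relative, so resolve it against the original request URL.
    absolute = urljoin(request_url, location)
    # Strip the #fragment: it only selects a position inside the page.
    target, _fragment = urldefrag(absolute)
    return target


print(redirect_target(
    "https://www.cbanque.com/forums/posts/306443/",
    301,
    "https://www.cbanque.com/forums/fil/obtention-dun-pret-cel.36080/#post-306443",
))
```

For a 303 "See Other", the follow-up request should be a GET. That is already the case for a crawler that only issues GET requests.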

Here is a real example of HTTP return code 400 "Bad Request": https://www.lesclesdelabanque.com/Web/Cdb/Particuliers/Content.nsf/LexiqueByTitleWeb/opposition%20ch%C3%A8que%20/%20ch%C3%A9quier%20par%20le%20client
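The distinction the issue asks for is between redirect codes (logged as "Moved") and a genuine 400 (logged as "BadRequest"). The following is a hypothetical sketch of that mapping. Only the "Moved" and "BadRequest" labels come from the issue. The function `classify_response` and the "Ok" / "Error" labels are assumptions.

```python
# Hypothetical set of redirect status codes (301 and 303 are the ones reported).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_response(status, location=None):
    """Map an HTTP status code to a crawl-log category (labels partly hypothetical)."""
    if 200 <= status < 300:
        return "Ok"
    if status in REDIRECT_CODES:
        # A redirect without a Location header cannot be followed.
        return "Moved" if location else "Error"
    if status == 400:
        return "BadRequest"
    return "Error"


print(classify_response(303, "https://example.com/page-2"))
print(classify_response(400))
```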

laurentprudhon commented 5 years ago

Here are two requests that are correctly logged as "Moved":

https://www.lesclesdelabanque.com/Web/Cles/Content.nsf/DocumentsByIDWeb/6WGJ2Y?OpenDocument 301

https://www.lesechos.fr/partenaire/france-qualite 301

laurentprudhon commented 5 years ago

Fixed