laurentprudhon / nlptextdoc

Suite of tools to extract and annotate language resources for NLP applications

URL not encoded properly while crawling French website #3

Closed laurentprudhon closed 5 years ago

laurentprudhon commented 5 years ago

The link https://www.lesclesdelabanque.com/Web/Cdb/Particuliers/Content.nsf/LexiqueByTitleWeb/ch%C3%A8que%20non%20barr%C3%A9%20 found on the page https://www.lesclesdelabanque.com/Web/Cdb/Particuliers/Content.nsf/DocumentsByIDWeb/6WED9F?OpenDocument contains percent-encoded (URL-encoded) characters. When Abot requests it, the server answers with an HTTP NotFound (404) error, but the same URL entered manually in a browser works fine. => This encoding problem needs to be fixed.
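To make the symptom easier to inspect, here is a minimal Python sketch (not the project's code, and not Abot's API) that decodes the last path segment of the link and shows what a second round of encoding would do to it. Double-encoding is only one possible cause, assumed here for illustration; the thread does not say what the actual fix was. The helper name `normalize_url` is hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

LINK = (
    "https://www.lesclesdelabanque.com/Web/Cdb/Particuliers/Content.nsf/"
    "LexiqueByTitleWeb/ch%C3%A8que%20non%20barr%C3%A9%20"
)


def normalize_url(url: str) -> str:
    """Hypothetical helper: decode the path once, then re-encode it once.

    A path that is already correctly percent-encoded comes back unchanged.
    """
    parts = urlsplit(url)
    decoded_path = unquote(parts.path)            # "%C3%A8" -> "è"
    encoded_path = quote(decoded_path, safe="/")  # "è" -> "%C3%A8"
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


path = urlsplit(LINK).path

# Human-readable form of the last segment (note the trailing space from "%20").
last_segment = unquote(path.rsplit("/", 1)[-1])

# Encoding an already-encoded path escapes the "%" signs themselves,
# which yields a different resource name on the server.
double_encoded_path = quote(path, safe="/")

print(repr(last_segment))
print(double_encoded_path)
print(normalize_url(LINK) == LINK)
```

If a crawler sends something like the double-encoded path, or trims the trailing `%20`, the server sees a different resource name than the browser does, which would be consistent with a NotFound response.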

laurentprudhon commented 5 years ago

Fixed