laurentprudhon / nlptextdoc

Suite of tools to extract and annotate language resources for NLP applications

Unspecified exception in WebCrawler_PageCrawlCompletedAsync #11

Closed: laurentprudhon closed this issue 5 years ago

laurentprudhon commented 5 years ago

Exceptions are frequently thrown here, but their details are lost:

at nlptextdoc.extract.html.WebsiteTextExtractor.WebCrawler_PageCrawlCompletedAsync(Object sender, PageCrawlCompletedArgs e)

Example websites that reproduce the bug:

http://www.lesclesdelabanque.com/ - after 51 pages
http://bourse.latribune.fr/ - after 957 pages
https://www.lesechos.fr/finance-marches/ - after 1617 pages
http://www.lemonde.fr/epargne/ - after 2392 pages / 2536 pages / 4570 pages ...

laurentprudhon commented 5 years ago

Failed to reproduce. If the problem shows up again in further tests, this issue will be replaced by more useful ones with a detailed stack trace, thanks to the new exceptions file.