ColtonBehannon opened this issue 3 weeks ago
This happens because WebPageHelper fails to process some URLs: either it cannot fetch the URL content, or it fails to parse the fetched content.
If you don't want to see these errors, you can add the following to your script:

```python
import logging
logging.basicConfig(level=logging.CRITICAL)
```
However, it's generally recommended to keep the error/warning logs. If you don't want them in the console output, you can write them to a file instead. See https://docs.python.org/3/library/logging.html for more info.
**Describe the bug**
When running the Co-STORM example, I get numerous 403 errors in the terminal. These errors are then followed by some trafilatura errors and errors complaining that 'The API deployment for this resource does not exist'.
Despite all this, the final report is seemingly output just fine. The only issue is the terminal is impossible to follow as a result of the errors.
This issue is similar to #133 where I also commented as I experienced similar results in the past with STORM. I have tried multiple networks, and this has not had an impact.
Are the 403 errors a result of these sites not allowing scraping and hence not included in the final report?
**To Reproduce**
Run the Co-STORM example script.
**Screenshots**
`Error while requesting URL: 403`
followed by
_trafilatura errors and 'An error occurred for text: root,' with a 404 code_
**Environment:**