Open 16arpi opened 1 year ago
These URLs ran into errors during web mining.
`error` values:
- max-redirects
- infinite-redirects
- read-timeout
- unknown-host
- invalid-redirect
- connection-aborted
- invalid-url
- connection-refused
- self-redirect
- connect-timeout
- ssl
- no-route-to-host
- connection-error
- invalid-gzip

`extract_error` values:
- invalid-status
- no-result
- errored
- invalid-mimetype
- file-not-found
- trafilatura-error
Some invalid redirects are now handled through stateful redirection.
https://www.ippmedia.com:/en/news/govt-takes-steps-ensure-availability-fertiliser-farmers redirects to a somewhat illegal URL with an empty port.
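One possible way to cope with the empty-port case is to normalize the URL before fetching it. The sketch below is only an illustration using Python's standard `urllib.parse`; the helper name `drop_empty_port` is hypothetical and is not taken from the crawler discussed in this issue.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_empty_port(url: str) -> str:
    """Return `url` with a dangling ':' (empty port) removed from the host part.

    Hypothetical helper for illustration only; not part of any library.
    """
    parts = urlsplit(url)
    netloc = parts.netloc

    # "www.example.org:" has a port separator but no port digits after it.
    # A bracketed IPv6 host without a port ends with "]", so it is untouched.
    if netloc.endswith(":"):
        netloc = netloc[:-1]

    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(drop_empty_port(
        "https://www.ippmedia.com:/en/news/"
        "govt-takes-steps-ensure-availability-fertiliser-farmers"
    ))
```

URLs that carry a real port, or no port separator at all, pass through unchanged, so the helper could be applied to every redirect target before it is followed.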