Ketoch closed this 6 months ago
I wonder if this is behind a crawl-protection CDN; if so, we'd need to capture it manually.
I don't know for sure, but it looks that way.
I've managed to manually get the data with:
curl 'https://www.terorarananlar.pol.tr/ISAYWebPart/TArananlar/GetTerorleArananlarList' \
-X 'POST' \
-H 'Content-Length: 0' \
-H 'Content-Type: application/json'
However, when trying to replicate the call with Python requests, I get an UNSAFE_LEGACY_RENEGOTIATION_DISABLED error:
2024-03-15 22:07:01 [info ] Running dataset [tr_wanted] data_path=datasets/tr_wanted data_time=2024-03-15T22:07:00 dataset=tr_wanted
2024-03-15 22:07:02 [error ] HTTPSConnectionPool(host='www.terorarananlar.pol.tr', port=443): Max retries exceeded with url: /ISAYWebPart/TArananlar/GetTerorleArananlarList (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:992)'))) [tr_wanted] dataset=tr_wanted url=https://www.terorarananlar.pol.tr/ISAYWebPart/TArananlar/GetTerorleArananlarList
From https://github.com/urllib3/urllib3/issues/2653 I learned that this is apparently what happens with OpenSSL 3.0, which disables unsafe legacy renegotiation by default, when connecting to legacy servers that do not support secure renegotiation (RFC 5746).
Is it advisable to save a local copy of the data in the repo, as in lt_illegal_websites, or is it better to use a workaround such as https://github.com/urllib3/urllib3/issues/2653#issuecomment-1733417634?
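For reference, a minimal sketch of the general idea behind that kind of workaround, using only the standard library: it sets OpenSSL's legacy-server-connect option on an otherwise default client context. The helper names `build_legacy_ssl_context` and `post_empty_json` are made up for illustration and are not part of the crawler or of any library; the raw value `0x4` is used as a fallback on the assumption that older Python versions do not expose the constant.

```python
import ssl
import urllib.request

# ssl.OP_LEGACY_SERVER_CONNECT is exposed from Python 3.12 onwards;
# on older versions fall back to OpenSSL's raw flag value (assumption: 0x4).
OP_LEGACY_SERVER_CONNECT = getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)


def build_legacy_ssl_context() -> ssl.SSLContext:
    """Default client context that also accepts servers lacking RFC 5746 support."""
    ctx = ssl.create_default_context()
    # Certificate and hostname verification stay enabled; only the
    # legacy renegotiation check is relaxed.
    ctx.options |= OP_LEGACY_SERVER_CONNECT
    return ctx


def post_empty_json(url: str, timeout: float = 30.0) -> bytes:
    """POST an empty body with a JSON content type (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    ctx = build_legacy_ssl_context()
    with urllib.request.urlopen(request, timeout=timeout, context=ctx) as resp:
        return resp.read()
```

With requests, the same context would typically be passed to urllib3 through a custom `HTTPAdapter` that overrides `init_poolmanager` and forwards `ssl_context`, mounted on the session for `https://`.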
Yeah, I think it's fine to enable the unsafe renegotiation strategy, given that we already have another sanctions list that is served over plain HTTP.
I'll notify them of the issue and ask that they look into upgrading.
Could you also add something like
if datetime.now() > datetime(2024, 9, 15):
    context.log.warn("Check if the SSL renegotiation strategy is still needed")
in the crawl() function?
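A standalone sketch of that reminder, written so it can be tested with a fixed date. The standard `logging` module stands in for `context.log` here, and the function and constant names are hypothetical rather than taken from the crawler.

```python
import logging
from datetime import datetime
from typing import Optional

log = logging.getLogger("tr_wanted")  # stand-in for context.log

# Review date taken from the suggestion in this thread.
SSL_WORKAROUND_REVIEW_DATE = datetime(2024, 9, 15)


def ssl_workaround_due_for_review(now: Optional[datetime] = None) -> bool:
    """Warn and return True once the review date has passed."""
    current = now if now is not None else datetime.now()
    if current > SSL_WORKAROUND_REVIEW_DATE:
        log.warning("Check if the SSL renegotiation strategy is still needed")
        return True
    return False
```

Calling this at the top of `crawl()` would leave the crawl itself untouched and only add a log line after the review date.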
Data URL
https://en.terorarananlar.pol.tr/tarananlar
Publisher
The Ministry Of Interior
Publisher country/territory code
No response
Type of data
Crime/Wanted/Suspected (Persons suspected or convicted of crimes and listed by official law enforcement)
Coverage region
region:Global
Can you tell us more?
No response
This is a suggestion or request