opensanctions / crawler-planning

Task tracking for the crawlers we're working on
https://github.com/orgs/opensanctions/projects/2

Czech Republic National Sanctions List #158

Closed bgmello closed 1 month ago

bgmello commented 3 months ago

Data URL

https://mzv.gov.cz/jnp/cz/zahranicni_vztahy/sankcni_politika/sankcni_seznam_cr/vnitrostatni_sankcni_seznam.html

Publisher

https://mzv.gov.cz/jnp/cz/index.html

Publisher country/territory code

cz

Type of data

Sanctions (Governments barring specific engagement e.g. financial transaction with these entities)

Coverage region

region:Europe

Can you tell us more?

According to Act No. 1/2023 Coll. on restrictive measures against certain serious acts applied in international relations, the Ministry of Foreign Affairs maintains a national sanctions list and publishes it on its website.

This is a suggestion or request

jbothma commented 3 months ago

Looks like it's worth getting the CSV or Excel link from the HTML, in case the file link changes but the landing page URL stays the same.

dchaplinsky commented 3 months ago

Here is the link to the CSV: https://mzv.gov.cz/file/5296648/Vnitrostatni_sankcni_seznam.csv

Let me know if you want me to write an importer for the CSV and the code to find the CSV link on the page. (I'm not sure this is a good idea, as it seems that each addendum to the sanctions list will have its own page and CSV file.)

@jbothma please advise

jbothma commented 3 months ago

So it looks like https://mzv.gov.cz/jnp/cz/o_ministerstvu/otevrena_data/vnitrostatni_sankcni_seznam/index.html, which is where the CSV link on https://mzv.gov.cz/jnp/cz/zahranicni_vztahy/sankcni_politika/sankcni_seznam_cr/vnitrostatni_sankcni_seznam.html points, was created in August and updated in December, while the file was updated in March. So I think the safest approach is to write a crawler that finds the file link at https://mzv.gov.cz/jnp/cz/o_ministerstvu/otevrena_data/vnitrostatni_sankcni_seznam/index.html, downloads the file, then extracts the entities from that CSV.

The earliest sanction was in April of last year, so there's a chance the page linking to the file was a different one back then. Hopefully they delete the old page if they create a new one, so that we'll find out.
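
A rough sketch of the link-finding step described above, using only the Python standard library. It assumes the open data page exposes the file as a plain `<a href="...">` whose path ends in `.csv`; the page's real markup has not been checked here, and the names `LinkCollector`, `find_csv_links` and `find_single_csv_link` are made up for illustration (this is not the crawler framework's API).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Open data page mentioned in the comment above.
INDEX_URL = (
    "https://mzv.gov.cz/jnp/cz/o_ministerstvu/otevrena_data/"
    "vnitrostatni_sankcni_seznam/index.html"
)


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_csv_links(html, base_url=INDEX_URL):
    """Return absolute URLs of all links whose path ends in .csv (assumed markup)."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page
        if urlparse(url).path.lower().endswith(".csv") and url not in links:
            links.append(url)
    return links


def find_single_csv_link(html, base_url=INDEX_URL):
    """Fail loudly if the page layout changes, instead of guessing a file."""
    links = find_csv_links(html, base_url)
    if len(links) != 1:
        raise ValueError(f"expected exactly one CSV link, found {len(links)}")
    return links[0]
```

Raising when the page has zero or several CSV links means a layout change, or a new addendum file, shows up as a crawler failure rather than silently stale data. Downloading the file and mapping its columns to entities is left out, since the CSV's column names aren't listed in this thread.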

dchaplinsky commented 3 months ago

Okay, taking this one.