opensanctions / crawler-planning

Task tracking for the crawlers we're working on
https://github.com/orgs/opensanctions/projects/2

Ukraine NAZK extra datasets #72

Closed: pudo closed this issue 6 months ago

pudo commented 7 months ago

It looks like the anti-corruption office of the Ukrainian government (NAZK) has released more entity lists, which are not strictly sanctions lists. I would like to crawl them as a separate dataset called ua_nazk_warnings, while the main lists stay in ua_nazk_sanctions. This new dataset should not be added to the sanctions collection, only to default.

These are the API endpoints:

The API is documented here: https://sanctions.nazk.gov.ua/api/

This dataset is multilingual, so let's make use of entity.add(prop, val, lang='ukr') :)
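A minimal sketch of what language-tagged property values could look like. It uses a hypothetical `SketchEntity` stand-in, not the real zavod/FollowTheMoney entity class, so only the `add(prop, value, lang=...)` call shape mirrors the suggestion above. The entity id, the names, and the use of a `topics` property are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SketchEntity:
    """Hypothetical stand-in for a crawler entity; NOT the zavod/FtM class."""

    schema: str
    id: str
    # Each property maps to a list of (value, language) pairs.
    properties: Dict[str, List[Tuple[str, Optional[str]]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add(self, prop: str, value: Optional[str], lang: Optional[str] = None) -> None:
        # Skip missing or blank values instead of storing empty strings.
        if value is None or not value.strip():
            return
        pair = (value.strip(), lang)
        # Keep each (value, language) pair only once.
        if pair not in self.properties[prop]:
            self.properties[prop].append(pair)

    def values(self, prop: str, lang: Optional[str] = None) -> List[str]:
        # Return all values, or only those tagged with the requested language.
        return [v for v, l in self.properties.get(prop, []) if lang is None or l == lang]


# Illustrative usage with made-up data; "ukr"/"eng" are ISO 639-2 codes.
entity = SketchEntity("Person", "ua-nazk-warnings-example")
entity.add("name", "Іван Петренко", lang="ukr")
entity.add("name", "Ivan Petrenko", lang="eng")
entity.add("name", "   ", lang="eng")  # blank, ignored
entity.add("topics", "poi")  # untagged value
print(entity.values("name", lang="ukr"))
```

Keeping the language next to each value, rather than in separate per-language properties, lets one `name` property hold both the Ukrainian original and any transliteration.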

pudo commented 7 months ago

This was addressed in opensanctions/opensanctions#566 right?

jbothma commented 7 months ago

Nope: https://github.com/opensanctions/opensanctions/pull/566 used new company and person endpoints designed specifically for entities related to sanctioned entities.

So it does not add entities from the endpoints listed here.

jbothma commented 7 months ago

further reading

@pudo so no Sanction entity and no topic, just a relevant one-liner in entity:program?

pudo commented 7 months ago

I think in the main crawler we're setting the topic poi, which seems OK (it doesn't mean anything in particular).

jbothma commented 7 months ago

@fjuniorr have you made a start on this one? I see you've moved it to in progress but don't see a PR.

I think this one is ready to go ahead and be added as a separate dataset, distinct from the one fixed in #566.

pudo commented 6 months ago

This data portal has been shut down. I'm fixing the crawler to run off the latest available dump of the data, but adding more endpoints will not be possible now.