opensanctions / crawler-planning

Task tracking for the crawlers we're working on
https://github.com/orgs/opensanctions/projects/2

Georgia asset declarations #230

Closed: pudo closed this issue 2 months ago

pudo commented 2 months ago

There's an API of some sort available here: https://declaration.acb.gov.ge/Home/ApiInstruction

Can we use this to generate current-year/last-year position holders and import them as PEPs?

pudo commented 2 months ago

cc @ketoch

jbothma commented 2 months ago

The server seems to get overloaded quite easily and I don't see any pagination options in the API docs.

Perhaps we can try looping over organization ID and year combinations:

https://declaration.acb.gov.ge/Api/Declarations?OrganizationIds=1&YearSelectedValues=2021

But if that gets stuck (the site appears to time out after about 5 minutes), we could just crawl the web pages instead.
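
A rough sketch of what that loop could look like, in case it helps. The endpoint and the two query parameters come from the example URL above; everything else is an assumption: that the response is JSON, that organization IDs can simply be enumerated, and the helper names (`declaration_url`, `fetch_json`, `crawl`) are made up for illustration. Failed combinations are recorded rather than retried, so we can see how unstable the API really is.

```python
import json
import urllib.request
from itertools import product
from urllib.parse import urlencode

# Endpoint taken from the example URL in this thread.
BASE_URL = "https://declaration.acb.gov.ge/Api/Declarations"


def declaration_url(org_id, year):
    """Build the query URL for one organization/year combination."""
    query = urlencode({"OrganizationIds": org_id, "YearSelectedValues": year})
    return f"{BASE_URL}?{query}"


def fetch_json(url, timeout=300):
    """Fetch and decode one response (assumes the API returns JSON)."""
    # 300 seconds mirrors the roughly 5-minute site timeout observed above.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def crawl(org_ids, years, fetch=fetch_json):
    """Try every combination; return (results, failed) without aborting."""
    results, failed = {}, []
    for org_id, year in product(org_ids, years):
        try:
            results[(org_id, year)] = fetch(declaration_url(org_id, year))
        except (OSError, ValueError):
            # OSError covers URLError and socket timeouts;
            # ValueError covers responses that are not valid JSON.
            failed.append((org_id, year))
    return results, failed
```

Something like `crawl(range(1, 50), [2021])` would then show which combinations respond at all; the ID range there is a guess, not something the API docs specify.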

I'll email to ask for pagination options and update here if available.

It's worth double-checking that the entities and positions in scope for reporting here match our PEP position criteria sufficiently well. In the ideal case we can assume all listed position holders are PEPs, but that isn't always the case.

pudo commented 2 months ago

Let's try and crawl the web pages, then, perhaps? We only need the latest year from the API, but it does seem remarkably unstable even for that.

jbothma commented 2 months ago

Maybe use GPT on the PDFs?

jbothma commented 2 months ago

Added in https://github.com/opensanctions/opensanctions/pull/1072 - will close this if it all looks good