opensanctions / opensanctions

An open database of international sanctions data, persons of interest and politically exposed persons
https://www.opensanctions.org
MIT License
497 stars, 115 forks

Consider committing CSV data to github #4

Closed by rufuspollock 6 years ago

rufuspollock commented 8 years ago

It's not too big after all ;-)

Makes it nice to track over time etc.

pudo commented 6 years ago

Now uploading to archive.org for permanent storage, which seems more appropriate: https://archive.org/details/opensanctions

rufuspollock commented 6 years ago

But do you get diffs and pull requests?

Also, what about uploading to https://datahub.io 😄 - just install the data tool from https://datahub.io/download and do data push ...

pudo commented 6 years ago

will upload to datahub ;) -- but I'm not quite sure of what the meaning of a PR or a diff against a sanctions list would be: it would have to come from the authority publishing the sanctions data, otherwise it's .... "just, like, your opinion, man"?

One thing we do need to do is start to create a link list between the lists (i.e. "the Saddam Hussein on this list is the same as the Saddam on that list") -- are you aware of any good tools we should look at for that?

rufuspollock commented 6 years ago

@pudo

> will upload to datahub ;) -- but I'm not quite sure of what the meaning of a PR or a diff against a sanctions list would be: it would have to come from the authority publishing the sanctions data, otherwise it's .... "just, like, your opinion, man"?

What I mean is you'd at least be able to track changes over time via the diffs and spot errors -- I've found this super useful with the core data stuff

Of course, the changes will still be the maintainer's opinion (at least until the point that the authoritative body starts doing PRs :wink:)
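To make the "track changes over time" idea concrete, here is a minimal sketch of diffing two CSV snapshots of a list. The column names (`id`, `name`, `program`) and the sample rows are made up for illustration and are not the actual opensanctions schema; committing the CSVs to git would give a line-level version of the same thing for free.

```python
import csv
import io


def diff_snapshots(old_csv, new_csv, key="id"):
    """Compare two CSV snapshots and report added, removed and changed row keys."""
    old = {row[key]: row for row in csv.DictReader(io.StringIO(old_csv))}
    new = {row[key]: row for row in csv.DictReader(io.StringIO(new_csv))}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A row counts as changed if the key exists in both snapshots but any field differs.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Hypothetical snapshots; real data would be read from dated CSV files.
OLD = "id,name,program\n1,Example Person A,PROG-1\n2,Example Person B,PROG-1\n"
NEW = "id,name,program\n1,Example Person A,PROG-2\n3,Example Person C,PROG-1\n"

added, removed, changed = diff_snapshots(OLD, NEW)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

This only works if each entry keeps a stable identifier between snapshots, which is an assumption about the source data.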

Let me know when it's on datahub.io 😄

> One thing we do need to do is start to create a link list between the lists (i.e. "the Saddam Hussein on this list is the same as the Saddam on that list") -- are you aware of any good tools we should look at for that?

Hmmm. Depends on whether you mean tools or patterns. And you may be more expert here - this is a classic "master data" plus matching-algorithm scenario as I understand it. If that is right, we could then look for tools in that space.
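As a rough illustration of the "match algorithm" half of that, here is a small sketch that links entries across two lists by normalized name similarity, using only the Python standard library. The list contents, the identifiers and the 0.85 threshold are assumptions for the example; a real link list would likely also need dates of birth, aliases and manual review, and this is not a recommendation of a specific tool.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Strip accents and punctuation, lowercase, and sort tokens so word order is ignored."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_lists(list_a, list_b, threshold=0.85):
    """Return (id_a, id_b, score) for every cross-list pair at or above the threshold."""
    links = []
    for id_a, name_a in list_a.items():
        for id_b, name_b in list_b.items():
            score = similarity(name_a, name_b)
            if score >= threshold:
                links.append((id_a, id_b, round(score, 2)))
    return links


# Hypothetical entries keyed by made-up per-list identifiers.
list_a = {"a-1": "Saddam Hussein", "a-2": "Example Person"}
list_b = {"b-7": "HUSSEIN, Saddam", "b-9": "Another Name"}

links = link_lists(list_a, list_b)
print(links)
```

Comparing every pair is quadratic, so for lists of real size you would want some form of blocking (for example, only comparing names that share a token) before scoring.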