moov-io / watchman

AML/CTF/KYC/OFAC Search of global watchlist and sanctions
https://moov-io.github.io/watchman/
Apache License 2.0

OpenSanctions / FollowTheMoney search support #353

Open adamdecaf opened 3 years ago

adamdecaf commented 3 years ago

@pudo was hosting Happy Hour this week and mentioned that Watchman could support searching FollowTheMoney's schemas using OpenSanctions' consolidated lists. That would greatly increase the breadth of data available for searching.

Initially I'm thinking of an endpoint like the following:

GET /opensanctions/<dataset-name>/search?objects=<schema-list>&prop1=<string>&prop2=<string>

Each OpenSanctions data source has a name property, for example:

"name": "interpol_red_notices",

Source: https://data.opensanctions.org/datasets/latest/index.json

We can use that to download, parse, and index the list (assuming it's in FollowTheMoney's schemas) for search. Schemas: https://followthemoney.readthedocs.io/en/latest/model.html
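
As a rough sketch of the first step, the dataset names could be pulled out of the index like this. The only field taken from the index here is "name"; the top-level "datasets" array and the datasetNames helper are assumptions for illustration, and the sample bytes stand in for the body fetched from the index URL.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// datasetIndex models only the part of the index this sketch needs.
// Assumption: dataset entries sit in a top-level "datasets" array.
type datasetIndex struct {
	Datasets []struct {
		Name string `json:"name"`
	} `json:"datasets"`
}

// datasetNames returns the non-empty "name" of every dataset in the index.
func datasetNames(indexJSON []byte) ([]string, error) {
	var idx datasetIndex
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return nil, fmt.Errorf("decoding dataset index: %w", err)
	}
	names := make([]string, 0, len(idx.Datasets))
	for _, d := range idx.Datasets {
		if d.Name != "" {
			names = append(names, d.Name)
		}
	}
	return names, nil
}

func main() {
	// Hand-written sample; a real caller would pass the downloaded index body.
	sample := []byte(`{"datasets": [{"name": "interpol_red_notices"}, {"name": "worldbank_debarred"}]}`)
	names, err := datasetNames(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Each returned name could then serve as the <dataset-name> path segment of the search endpoint.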

A few examples:

GET /opensanctions/everypolitician/search?objects=person&surname=Bazar
GET /opensanctions/worldbank_debarred/search?objects=company,legalentity,organization&name=Acme+Corp

We can also initiate downloads for each list and offer endpoints to list available fields/schemas from lists.

POST /opensanctions/interpol_yellow_notices/download
GET /opensanctions/address/schema

{
  "label": "Address",
  "plural": "Addresses",
  "properties": {
    "city": {
      "description": "City, town, village or other locality",
      "label": "City",
      "name": "city",
      "qname": "Address:city",
      "type": "string"
    },
    ...
  }
}
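
A client of the schema endpoint could decode a response of this shape roughly as follows. The struct and function names are hypothetical, the field set is limited to the keys shown in the example, and the sample body omits the elided properties.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// property mirrors the per-property keys shown in the example response.
type property struct {
	Description string `json:"description"`
	Label       string `json:"label"`
	Name        string `json:"name"`
	QName       string `json:"qname"`
	Type        string `json:"type"`
}

// schema mirrors the top-level keys shown in the example response.
type schema struct {
	Label      string              `json:"label"`
	Plural     string              `json:"plural"`
	Properties map[string]property `json:"properties"`
}

// decodeSchema parses a schema response body.
func decodeSchema(data []byte) (*schema, error) {
	var s schema
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("decoding schema: %w", err)
	}
	return &s, nil
}

// fieldNames lists the property names in sorted order, i.e. the query
// parameters a search against this schema could accept.
func (s *schema) fieldNames() []string {
	names := make([]string, 0, len(s.Properties))
	for name := range s.Properties {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	body := []byte(`{"label": "Address", "plural": "Addresses", "properties": {
		"city": {"description": "City, town, village or other locality",
			"label": "City", "name": "city", "qname": "Address:city", "type": "string"}}}`)
	s, err := decodeSchema(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Label, s.fieldNames())
}
```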

pudo commented 2 years ago

Hey @adamdecaf - sorry for the long radio silence on my end. I think I finally have OpenSanctions in a place where it could be used in watchman and produce great results. What we're offering now is this:

I'm curious whether this is useful yet, or whether you have suggestions for further pre-processing or export formats we should consider.

adamdecaf commented 2 years ago

That's awesome @pudo! I'm OOO for a bit soon, but will look over what's been added here.

Do you think it makes sense to expose opensanctions in the Watchman URLs? I'm wondering if masking it as a data source makes sense or not.

pudo commented 2 years ago

One thing you might consider is providing API support for OpenSanctions collections (i.e. generating an endpoint per collection). I mentioned sanctions above, but there are also peps, crime and default (all of the other three). These are different subsets of entities from different sources that a user might want to use for their checks.
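
Generating one endpoint per collection could look roughly like this sketch. The collection names come from this comment; the route layout, newCollectionMux, and the placeholder handler are assumptions, and a real handler would run the search instead of echoing the collection name.

```go
package main

import (
	"fmt"
	"net/http"
)

// collections lists the OpenSanctions collections named in this thread.
var collections = []string{"sanctions", "peps", "crime", "default"}

// searchPath builds the hypothetical per-collection search route.
func searchPath(collection string) string {
	return "/opensanctions/" + collection + "/search"
}

// newCollectionMux registers one search route per collection and returns
// the mux together with the registered paths.
func newCollectionMux() (*http.ServeMux, []string) {
	mux := http.NewServeMux()
	paths := make([]string, 0, len(collections))
	for _, c := range collections {
		name := c // per-iteration copy captured by the handler below
		path := searchPath(name)
		mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
			// Placeholder: a real handler would search the indexed collection.
			fmt.Fprintf(w, "search over collection %q\n", name)
		})
		paths = append(paths, path)
	}
	return mux, paths
}

func main() {
	_, paths := newCollectionMux()
	for _, p := range paths {
		fmt.Println("GET", p)
	}
}
```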

adamdecaf commented 2 years ago

Just to clarify, are you referring to endpoints like the one below? We could include them in the root GET /search, but that endpoint has been growing and can suffer from performance issues.

I like the idea of offering a pretty generic endpoint which offers Watchman's precompute, indexing, and search over the OS lists.

Do you see problems with this?

GET /opensanctions/<dataset-name>/search?objects=<schema-list>&prop1=<string>&prop2=<string>

objects=<schema-list> would be a comma-separated list of schemas from OpenSanctions.
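
Here is a minimal sketch of how the schema list could scope a search. The entity type is a simplified stand-in for a FollowTheMoney record (a schema name plus multi-valued string properties), the sample entities are made up, and plain case-insensitive substring matching stands in for whatever scoring Watchman would apply. It also ignores schema inheritance, so a Company only matches when company is listed explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// entity is a simplified FollowTheMoney-style record.
// Assumption: each entity carries a schema name and multi-valued properties.
type entity struct {
	ID         string
	Schema     string
	Properties map[string][]string
}

// inScope reports whether schema appears in the comma-separated objects list.
// An empty list means no restriction.
func inScope(schema, objects string) bool {
	if strings.TrimSpace(objects) == "" {
		return true
	}
	for _, s := range strings.Split(objects, ",") {
		if strings.EqualFold(strings.TrimSpace(s), schema) {
			return true
		}
	}
	return false
}

// search returns the in-scope entities that have a value of prop containing
// query, compared case-insensitively.
func search(entities []entity, objects, prop, query string) []entity {
	var hits []entity
	q := strings.ToLower(query)
	for _, e := range entities {
		if !inScope(e.Schema, objects) {
			continue
		}
		for _, v := range e.Properties[prop] {
			if strings.Contains(strings.ToLower(v), q) {
				hits = append(hits, e)
				break // one matching value is enough for this entity
			}
		}
	}
	return hits
}

func main() {
	// Made-up sample data for illustration only.
	entities := []entity{
		{ID: "a", Schema: "Company", Properties: map[string][]string{"name": {"Acme Corp"}}},
		{ID: "b", Schema: "Person", Properties: map[string][]string{"name": {"Acme Corp"}}},
	}
	for _, e := range search(entities, "company,legalentity,organization", "name", "acme") {
		fmt.Println(e.ID, e.Schema)
	}
}
```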

pudo commented 2 years ago

That's really cool! This way people could easily pick and choose from the endpoints that are available. And picking the schema means being able to define a query scope. Nice! I'm most curious how you find the data format to work with. I hope it's pretty easy, but maybe you have ideas for additional export formats we should publish!