Parse all individuals and Ethereum addresses, and disregard other entities. For each individual, gather when available: full name, passport number, and date of birth. I would use either SDN_ADVANCED.XML or SDN.CSV. There are probably existing tools for this.
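A minimal sketch of the SDN.CSV route, in plain Python. It assumes the file has no header row, twelve columns with the name in column 2, the entry type in column 3 and free-text remarks in column 12, and `-0-` as the empty marker. It also assumes that dates of birth, passport numbers and Ethereum addresses appear in the remarks as `DOB ...`, `Passport ...` and `Digital Currency Address - ETH ...`. All of this should be checked against the real file; `parse_sdn_csv` and the regexes are hypothetical names for illustration.

```python
import csv
import io
import re

# Assumed SDN.CSV layout: no header row, 12 columns, "-0-" marks an empty field.
NAME_COL, TYPE_COL, REMARKS_COL = 1, 2, 11

# Assumed remark formats; verify against the actual data before relying on them.
DOB_RE = re.compile(r"DOB (\d{2} \w{3} \d{4})")
PASSPORT_RE = re.compile(r"Passport (\w+)")
ETH_RE = re.compile(r"Digital Currency Address - ETH (0x[0-9a-fA-F]{40})")


def parse_sdn_csv(text):
    """Return (individuals, eth_addresses) extracted from SDN.CSV content."""
    individuals, eth_addresses = [], set()
    for row in csv.reader(io.StringIO(text)):
        if len(row) <= REMARKS_COL:
            continue  # skip blank or malformed lines
        remarks = "" if row[REMARKS_COL].strip() == "-0-" else row[REMARKS_COL]
        # Design choice: keep addresses whatever the entry type, since an
        # address is unambiguous on its own.
        eth_addresses.update(a.lower() for a in ETH_RE.findall(remarks))
        if row[TYPE_COL].strip().lower() != "individual":
            continue  # disregard entities, vessels, aircraft
        individuals.append({
            "name": row[NAME_COL].strip(),
            "dobs": DOB_RE.findall(remarks),
            "passports": PASSPORT_RE.findall(remarks),
        })
    return individuals, eth_addresses
```

SDN_ADVANCED.XML carries the same information in structured fields, so it would avoid the regexes at the cost of a heavier parser.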
Find a good way to format them into sparse Merkle trees.
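To make the idea concrete, here is a toy sparse Merkle tree in Python. It is only a sketch under assumptions: SHA-256 stands in for whatever hash the circuits would actually use, keys are placed by hashing them to a 256-bit leaf index, and the names (`SparseMerkleTree`, `verify`) are hypothetical. The useful property is that an empty leaf yields a non-membership proof, which is what a "not on the list" check needs.

```python
import hashlib

DEPTH = 256  # one level per bit of the hashed key


def h(left, right):
    return hashlib.sha256(left + right).digest()


def leaf_hash(key):
    return hashlib.sha256(b"leaf" + key).digest()


# EMPTY[i] is the root of an empty subtree of height i.
EMPTY = [b"\x00" * 32]
for _ in range(DEPTH):
    EMPTY.append(h(EMPTY[-1], EMPTY[-1]))


def key_index(key):
    """Map a key (bytes) to a leaf index in [0, 2**DEPTH)."""
    return int.from_bytes(hashlib.sha256(key).digest(), "big")


class SparseMerkleTree:
    def __init__(self):
        self.nodes = {}  # (height, index) -> hash; empty nodes are not stored

    def _get(self, height, index):
        return self.nodes.get((height, index), EMPTY[height])

    def root(self):
        return self._get(DEPTH, 0)

    def insert(self, key):
        index = key_index(key)
        self.nodes[(0, index)] = leaf_hash(key)
        # Recompute every ancestor on the path from the leaf to the root.
        for height in range(DEPTH):
            index //= 2
            self.nodes[(height + 1, index)] = h(
                self._get(height, 2 * index), self._get(height, 2 * index + 1)
            )

    def proof(self, key):
        """Sibling hashes from the leaf up to the root."""
        index = key_index(key)
        siblings = []
        for height in range(DEPTH):
            siblings.append(self._get(height, index ^ 1))
            index //= 2
        return siblings


def verify(root, key, siblings, present):
    """Check a membership (present=True) or non-membership proof."""
    index = key_index(key)
    node = leaf_hash(key) if present else EMPTY[0]
    for sibling in siblings:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index //= 2
    return node == root
```

One tree per matching key type (passport numbers, addresses, name plus date of birth) would keep each proof independent.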
In totem, we extracted the full name from the MRZ using zk-regex. A lighter approach might be possible, since the last name and the first name are always separated by <<, and the first and middle names by <.
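The separator rule can be illustrated outside a circuit with a few lines of Python. This sketch assumes a TD3 (passport) MRZ, where the name field occupies characters 6 to 44 of the first line; `parse_mrz_name` is a hypothetical helper, not the totem implementation, and an in-circuit version would look different.

```python
def parse_mrz_name(line1):
    """Split the name field of a TD3 MRZ first line into (surname, given_names)."""
    name_field = line1[5:44]  # skip document type (2 chars) and issuing state (3 chars)
    # The first "<<" separates the surname from the given names.
    surname_part, _, given_part = name_field.partition("<<")
    # Parts of a compound surname are separated by a single "<".
    surname = surname_part.replace("<", " ").strip()
    # Given names are separated by "<"; trailing "<" characters are padding.
    given_names = [name for name in given_part.split("<") if name]
    return surname, given_names
```

For the specimen line `P<UTOERIKSSON<<ANNA<MARIA` padded with `<` to 44 characters, this returns `("ERIKSSON", ["ANNA", "MARIA"])`.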
Here is how I think of matching:
The most reliable level is passport numbers and Ethereum addresses. Both are unambiguous. The Ethereum address can be matched against user_identifier in the disclosure circuit, as it corresponds to the user's address in the onchain flow.
The next level is names and dates of birth. Because of homonyms, matching on names alone is not reliable (e.g. Mohamad Khalid is on the list and is a common name). Fortunately, the list includes more than 7500 birth dates and, most often, middle names, so all of these can be matched together.
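For the second level, one option is to derive a single key from the normalized name and the date of birth, so that both sides (list entry and passport) hash to the same value. The sketch below assumes list names look like `SURNAME, Given Middle`, list dates look like `01 Jan 1970`, and the passport side provides the date as YYMMDD as in the MRZ. The function names and the SHA-256 choice are illustrative assumptions.

```python
import hashlib
import unicodedata

MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def normalize(text):
    """Uppercase, strip accents, and keep only letters separated by single spaces."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    cleaned = "".join(c if c.isalpha() else " " for c in ascii_text.upper())
    return " ".join(cleaned.split())


def name_dob_key(surname, given_names, dob_yymmdd):
    """Key for the passport side: surname, given names, and DOB as YYMMDD."""
    payload = f"{normalize(surname)}|{normalize(given_names)}|{dob_yymmdd}"
    return hashlib.sha256(payload.encode()).hexdigest()


def sdn_name_dob_key(sdn_name, sdn_dob):
    """Key for a list entry such as ("DOE, John Albert", "01 Jan 1970")."""
    surname, _, given = sdn_name.partition(",")
    day, month, year = sdn_dob.split()
    dob = f"{year[2:]}{MONTHS[month]:02d}{int(day):02d}"
    return name_dob_key(surname, given, dob)
```

Including the middle names in the key makes a match stricter. Entries that lack a middle name or a full date of birth would need a separate, weaker key.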
We should be able to check if someone is in the OFAC sanctioned entity list.
The steps above are how I would do it.