[ ] Ferret out any cross-registrations that do not match exactly on prefix but do match on other fields.
[ ] Assess the edit distance (or another easy-to-compute similarity measure) across all of the title fields.
[ ] Based on the above, provide two pieces of data for each prefix in the list: 1) a score indicating how likely that prefix is to be related to a prefix in another row (e.g., corresponding to the same dataset or to portions of that dataset, such as KEGG-disease vs. KEGG-protein), and 2) which corresponding prefix to investigate.
[ ] Rearrange the list so that related prefixes are clustered together for easier curation (?)
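A minimal sketch for the cross-registration item, assuming each registry row is a dict with a `prefix` key plus other fields. The field names `title` and `homepage`, the normalization (lowercasing, trimming whitespace and trailing slashes), and the example rows are all hypothetical, not the registry's actual schema. It reports any normalized field value shared by rows with different prefixes:

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and trim whitespace and trailing slashes so trivial variants compare equal."""
    return value.strip().lower().rstrip("/")


def shared_field_values(records, fields=("title", "homepage")):
    """Return {(field, normalized value): prefixes} for values shared by 2+ distinct prefixes.

    `records` is a list of dicts with a "prefix" key; the field names are hypothetical.
    """
    groups = defaultdict(set)
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value:
                groups[(field, normalize(value))].add(rec["prefix"])
    return {key: prefixes for key, prefixes in groups.items() if len(prefixes) > 1}


# Hypothetical toy rows: different prefixes and titles, same homepage.
records = [
    {"prefix": "foo", "title": "Foo Database", "homepage": "https://example.org/foo/"},
    {"prefix": "foodb", "title": "FooDB", "homepage": "https://example.org/foo"},
    {"prefix": "bar", "title": "Bar Ontology", "homepage": "https://example.org/bar"},
]
for (field, value), prefixes in shared_field_values(records).items():
    print(field, value, sorted(prefixes))
```

Exact matching after normalization only catches the easy cases; titles that differ slightly ("Foo Database" vs. "FooDB" here) need a fuzzy measure instead.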
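The title-similarity, scoring, and clustering items could be prototyped along these lines. This is a sketch under assumptions: rows are dicts with `prefix` and `title` keys, the score is a normalized Levenshtein similarity (1 minus the edit distance divided by the longer title's length), and clustering is single-linkage over pairs whose score meets a hand-picked threshold. The function names and example rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def title_similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 1.0 for identical titles (case-insensitive), lower as edits grow."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_matches(records):
    """Map each prefix to (score, most similar other prefix) by title."""
    result = {}
    for rec in records:
        scored = [
            (title_similarity(rec["title"], other["title"]), other["prefix"])
            for other in records
            if other["prefix"] != rec["prefix"]
        ]
        result[rec["prefix"]] = max(scored, default=(0.0, None))
    return result


def cluster(records, threshold):
    """Single-linkage clusters of prefixes whose titles score >= threshold."""
    parent = {rec["prefix"]: rec["prefix"] for rec in records}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(records, 2):
        if title_similarity(a["title"], b["title"]) >= threshold:
            parent[find(a["prefix"])] = find(b["prefix"])

    groups = defaultdict(list)
    for rec in records:
        groups[find(rec["prefix"])].append(rec["prefix"])
    # Largest clusters first, so the most tangled groups get curated first.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


# Hypothetical rows for illustration only.
records = [
    {"prefix": "kegg-disease", "title": "KEGG Disease"},
    {"prefix": "kegg-protein", "title": "KEGG Protein"},
    {"prefix": "go", "title": "Gene Ontology"},
]
print(best_matches(records))
print(cluster(records, threshold=0.4))
```

With these toy titles, "KEGG Disease" and "KEGG Protein" score only 5/12 (about 0.42), because the differing words are as long as the shared one, so a threshold of 0.6 would leave them unclustered. The threshold (and possibly the choice of similarity measure) would need tuning against the real rows.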