ror-community / ror-roadmap

Central information about what is happening at ROR and how to contribute feedback
10 stars 2 forks source link

[FEATURE] Add Unique Entity ID (UEI) tags as external IDs #203

Open Marshlight opened 1 year ago

Marshlight commented 1 year ago

Describe the problem you would like to solve Domestic and international organizations that receive funding from the US federal government through SAM.gov must have a UEI (https://sam.gov/content/duns-uei). It would be helpful to link UEI to ROR, as some USG funders are moving toward using UEI to disambiguate awardee institutions, and interoperability with ROR would (hopefully) increase ease of ROR adoption for other data calls.

Describe the solution you'd like Add UEI to ROR external IDs.

Who would benefit from this feature? Government funders trying to track the output of organizations with UEI; those with UEI looking for their ROR (this is admittedly niche)

Additional information I am using the Edugain ID ticket (https://github.com/ror-community/ror-roadmap/issues/146) as reference for this one. I have done some UEI to GRID ID work already, and I'm sure there are some followup questions I'll need to answer.

amandafrench commented 1 year ago

@Marshlight Thanks so much for submitting this! Do you think it's possible to publish at least a UEI to ROR mapping in spreadsheet form? If so, we'd be happy to include that in our user documentation ASAP.

amandafrench commented 1 year ago

Or, I should add, the UEI to GRID mapping -- that would be easy to add ROR to, since the ROR dataset is already natively mapped to GRID.

Marshlight commented 1 year ago

@amandafrench we are working on cleaning up a UEI to GRID map for you, but are running into some data QA problems - multiple UEIs per GRID. This is probably one of those things that will need continued curation and I can't guarantee it's going to be complete or reliable...but we can send what we have, soon!

amandafrench commented 1 year ago

@Marshlight Ah, interesting. But yes, I think many others would be interested in even a rough version of this mapping. Thanks for working on it!

Marshlight commented 1 year ago

crosslinked-institution-identifiers.csv Ok, here is a rough draft with institution names, GRID ID, UEI (from SAM.gov), CAGE (a DOD specific ID), and DUNS (old and not complete but it's there anyway). You'll notice several duplicate lines, most notably for West Virginia University which has 10 rows for some reason. I have not tried to simplify this yet in any way yet, but I know it needs to be done. I probably won't get to it in the short term, but happy to further discuss QA.

poworoznek commented 1 year ago

GRID/ROR to UEI will probably not end up being exactly 1:1 due to differences in the taxonomy but may get close(r). GRID to CAGE will always be 1:many for many institutions.

amandafrench commented 1 year ago

@Marshlight Sorry, I'm getting a "Not Found" message when I click on the link to the csv - you can email it to support at ror dot org if you like

Marshlight commented 1 year ago

crosslinked-institution-identifiers.csv Does this work instead? Weird!

amandafrench commented 1 year ago

@Marshlight Yes, that worked! Thanks!

amandafrench commented 4 months ago

@Marshlight @poworoznek We've recently issued a call for comment on adding new external IDs to ROR, and UEI is a top candidate to pilot with. Can you take a look and comment before August 16th? Google Doc for comments / suggestions is linked from the announcement: https://ror.org/blog/2024-07-18-id-ideas/

Marshlight commented 3 months ago

@amandafrench incredibly excited about this! I don't have specific feedback on the announcement text itself, but I'm gathering that the UEI-ROR lookup needs to be open source and maintained by us or someone else to be included, right? Or can ROR do some of the matching QA/QC? The folks behind the DTIC tickets (https://github.com/ror-community/ror-updates/issues/6112, https://github.com/ror-community/ror-updates/issues/6478, https://github.com/ror-community/ror-updates/issues/6649) are likely to launch their own UEI-ROR mapping via an updated org authority file later in the summer/fall, and they probably will have put more time in by the time they're done. I don't think I can tag them here, but let me know when we get closer to UEI-ROR matching being a reality, and I can do it via email.

adambuttrick commented 3 months ago

@Marshlight In order to maintain this mapping in ROR, we would need to be able to derive the UEI values from their source, with the data being available in an openly licensed form. I did a cursory review of the SAM.gov data bank files and did not see these as being immediately available, so any details you can provide about how we could access and under what terms would be appreciated. I work directly with the DTIC staff on the issues you tagged, so I can follow up with them as well.

Marshlight commented 3 months ago

@adambuttrick hmm I have run into this issue as well. I am pretty sure https://sam.gov/data-services/Entity%20Registration?privacy=Public ought to do it, but I remember having some difficulty. Although I am looking at Public V2/SAM_PUBLIC_MONTHLY_V2_20240602.dat right now, where the first column is UEI and the fourth is CAGE; a very small amount of manual spot checking matches what I posted in this thread before. So maybe they fixed it from the last time I tried to do this!

adambuttrick commented 3 months ago

@Marshlight Thanks for flagging this! I will review and follow up if I have any questions.

poworoznek commented 3 months ago

@adambuttrick We used 5 of the .dat files concatenated together to cover the time period 2020_NOV to present (at the time, there was a discontinuity due to SAM.gov file system changes), as they need to be concatenated and deduplicated to be comprehensive. Each .dat file only covers the specified time frame. The data dictionary is buried in the SAM.gov support pages. I included the field names below for the .dat files. https://www.usaspending.gov/recipient also works for UEI <-> DUNS.

['UNIQUE ENTITY IDENTIFIER (SAM)', 'UNIQUE ENTITY IDENTIFIER (DUNS)', 'ENTITY EFT INDICATOR', 'CAGE CODE', 'DODAAC', 'SAM EXTRACT CODE', 'PURPOSE OF REGISTRATION', 'INITIAL REGISTRATION DATE', 'REGISTRATION EXPIRATION DATE', 'LAST UPDATE DATE', 'ACTIVATION DATE', 'LEGAL BUSINESS NAME', 'DBA NAME', 'ENTITY DIVISION', 'ENTITY DIVISION NUMBER', 'PHYSICAL ADDRESS LINE 1', 'PHYSICAL ADDRESS LINE 2', 'PHYSICAL ADDRESS CITY', 'PHYSICAL ADDRESS PROVINCE OR STATE', 'PHYSICAL ADDRESS ZIP/POSTAL CODE', 'PHYSICAL ADDRESS ZIP CODE +4', 'PHYSICAL ADDRESS COUNTRY CODE', 'PHYSICAL ADDRESS CONGRESSIONAL DISTRICT', 'D&B OPEN DATA FLAG', 'ENTITY START DATE', 'FISCAL YEAR END CLOSE DATE', 'ENTITY URL', 'ENTITY STRUCTURE', 'STATE OF INCORPORATION', 'COUNTRY OF INCORPORATION', 'BUSINESS TYPE COUNTER', 'BUSINESS TYPE STRING', 'PRIMARY NAICS', 'NAICS CODE COUNTER', 'NAICS CODE STRING', 'PSC CODE COUNTER', 'PSC CODE STRING', 'CREDIT CARD USAGE', 'CORRESPONDENCE FLAG', 'MAILING ADDRESS LINE 1', 'MAILING ADDRESS LINE 2', 'MAILING ADDRESS CITY', 'MAILING ADDRESS ZIP/POSTAL CODE', 'MAILING ADDRESS ZIP CODE +4', 'MAILING ADDRESS COUNTRY', 'MAILING ADDRESS STATE OR PROVINCE', 'GOVT BUS POC FIRST NAME', 'GOVT BUS POC MIDDLE INITIAL', 'GOVT BUS POC LAST NAME', 'GOVT BUS POC TITLE', 'GOVT BUS POC ST ADD 1', 'GOVT BUS POC ST ADD 2', 'GOVT BUS POC CITY', 'GOVT BUS POC ZIP/POSTAL CODE', 'GOVT BUS POC ZIP CODE +4', 'GOVT BUS POC COUNTRY CODE', 'GOVT BUS POC STATE OR PROVINCE', 'ALT GOVT BUS POC FIRST NAME', 'ALT GOVT BUS POC MIDDLE INITIAL', 'ALT GOVT BUS POC LAST NAME', 'ALT GOVT BUS POC TITLE', 'ALT GOVT BUS POC ST ADD 1', 'ALT GOVT BUS POC ST ADD 2', 'ALT GOVT BUS POC CITY', 'ALT GOVT BUS POC ZIP/POSTAL CODE', 'ALT GOVT BUS POC ZIP CODE +4', 'ALT GOVT BUS POC COUNTRY CODE', 'ALT GOVT BUS POC STATE OR PROVINCE', 'PAST PERF POC POC FIRST NAME', 'PAST PERF POC POC MIDDLE INITIAL', 'PAST PERF POC POC LAST NAME', 'PAST PERF POC POC TITLE', 'PAST PERF POC ST ADD 1', 'PAST PERF POC ST ADD 2', 'PAST PERF POC CITY', 'PAST PERF POC ZIP/POSTAL CODE', 'PAST PERF POC ZIP CODE +4', 'PAST PERF POC COUNTRY CODE', 'PAST PERF POC STATE OR PROVINCE', 'ALT PAST PERF POC FIRST NAME', 'ALT PAST PERF POC MIDDLE INITIAL', 'ALT PAST PERF POC LAST NAME', 'ALT PAST PERF POC TITLE', 'ALT PAST PERF POC ST ADD 1', 'ALT PAST PERF POC ST ADD 2', 'ALT PAST PERF POC CITY', 'ALT PAST PERF POC ZIP/POSTAL CODE', 'ALT PAST PERF POC ZIP CODE +4', 'ALT PAST PERF POC COUNTRY CODE', 'ALT PAST PERF POC STATE OR PROVINCE', 'ELEC BUS POC FIRST NAME', 'ELEC BUS POC MIDDLE INITIAL', 'ELEC BUS POC LAST NAME', 'ELEC BUS POC TITLE', 'ELEC BUS POC ST ADD 1', 'ELEC BUS POC ST ADD 2', 'ELEC BUS POC CITY', 'ELEC BUS POC ZIP/POSTAL CODE', 'ELEC BUS POC ZIP CODE +4', 'ELEC BUS POC COUNTRY CODE', 'ELEC BUS POC STATE OR PROVINCE', 'ALT ELEC POC BUS POC FIRST NAME', 'ALT ELEC POC BUS POC MIDDLE INITIAL', 'ALT ELEC POC BUS POC LAST NAME', 'ALT ELEC POC BUS POC TITLE', 'ALT ELEC POC BUS ST ADD 1', 'ALT ELEC POC BUS ST ADD 2', 'ALT ELEC POC BUS CITY', 'ALT ELEC POC BUS ZIP/POSTAL CODE', 'ALT ELEC POC BUS ZIP CODE +4', 'ALT ELEC POC BUS COUNTRY CODE', 'ALT ELEC POC BUS STATE OR PROVINCE', 'NAICS EXCEPTION COUNTER', 'NAICS EXCEPTION STRING', 'DEBT SUBJECT TO OFFSET FLAG', 'EXCLUSION STATUS FLAG', 'SBA BUSINESS TYPES COUNTER', 'SBA BUSINESS TYPES STRING', 'NO PUBLIC DISPLAY FLAG', 'DISASTER RESPONSE COUNTER', 'DISASTER RESPONSE STRING', 'FLEX FIELD 1', 'FLEX FIELD 2', 'FLEX FIELD 3', 'FLEX FIELD 4', 'FLEX FIELD 5', 'FLEX FIELD 6', 'FLEX FIELD 7', 'FLEX FIELD 8', 'FLEX FIELD 9', 'FLEX FIELD 10', 'FLEX FIELD 11', 'FLEX FIELD 12', 'FLEX FIELD 13', 'FLEX FIELD 14', 'FLEX FIELD 15', 'FLEX FIELD 16', 'FLEX FIELD 17', 'FLEX FIELD 18', 'FLEX FIELD 19', 'FLEX FIELD 20', 'END OF RECORD INDICATOR']