ghost opened this issue 3 years ago
Hi @JTJD, thanks for your question. We have no current plans to add a mapping to EU PICs. Please feel free to share more information about your use case for having these identifiers available in ROR, so that we can evaluate this for future development work.
I would also be interested if ROR were to provide an organisation's PIC value.
I can describe my own particular use-case.
Through "Horizon 2020", the EU spent just shy of €80 billion over the past 7 years (2014 to 2020) to fund research and innovation. The next-generation funding instrument will be "Horizon Europe", with a budget of over €95 billion.
CORDIS is the system that underpins the process of applying for funding and, if successful, for reporting on the project's activities. The EU has made all the data underpinning CORDIS available as open data, including details on all previously funded and ongoing projects. The license is very broad, allowing almost any reuse of this data.
There is a problem, though.
The EU identifies organisations with a Participant Identification Code (PIC). This is a unique numerical ID that the EU assigns to each organisation. This number, along with some metadata about the organisation (e.g., name, abbreviation, address), is available through the CORDIS open data. However (crucially), this metadata contains no link to any other identifier for the organisation. This makes it impossible to link the CORDIS corpus automatically with other corpora, unless the other data sources also use PIC values to identify organisations.
My own particular use-case involves linking CORDIS information with databases of scientific instruments, research groups (that benefit from EU funding) and people working within those research groups.
I would like to use ROR IDs as common, unique identifiers for organisations, and use the ROR metadata to further enhance information about the organisations.
I could do this manually, by creating the mapping from PIC to ROR ID for those organisations that matter for my use-case. However, I would imagine ROR supporting PIC would benefit others (including @JTJD, seemingly).
NB. There may be other corpora (from the EU or elsewhere) that use PIC values to identify organisations and might also benefit from ROR's support of PIC. My use-case is just one example of how this might be beneficial.
Just for some additional info: ORCID recently announced that it would use ROR as an institutional identifier. I think this may significantly strengthen the case for a PIC-ROR-PIC mapping.
With best regards
John
Dr John Donovan
Head of Research and LEAR, Technological University Dublin (PIC/CAR: 903964729)
(In reply to Paul Millar's comment, sent Wednesday 1 December 2021, on ror-community/ror-api issue #189, "European Union Participant Identifier Code (PIC)".)
Here are some further observations.
Currently, CORDIS has just shy of 40,000 organisations, compared to over 100,000 in ROR.
With some digging, I found an automated way of linking CORDIS PICs to ROR IDs. The CORDIS corpus includes the EU VAT number as metadata for ~85% of the organisations it describes. Wikidata has (for some organisations) both the ROR ID and the EU VAT number. Therefore, using Wikidata, it's possible to map some CORDIS organisations' PICs to their corresponding ROR IDs.
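A minimal sketch of that join in Python, assuming the CORDIS records have already been reduced to (PIC, VAT number) pairs and the Wikidata results to (VAT number, ROR ID) pairs. The normalisation rule (upper-casing and dropping spaces, dots and dashes) is an assumption about how the two sources format VAT numbers, and `normalise_vat` and `map_pic_to_ror` are hypothetical names, not part of any existing tool.

```python
import re


def normalise_vat(vat):
    """Upper-case a VAT number and drop spaces, dots and dashes.

    Assumption: after this normalisation, CORDIS and Wikidata write the same
    VAT number identically (e.g. both include the country prefix).
    """
    return re.sub(r"[\s.\-]", "", vat).upper()


def map_pic_to_ror(cordis_orgs, wikidata_pairs):
    """Join CORDIS (PIC, VAT) pairs to Wikidata (VAT, ROR ID) pairs.

    Returns {pic: ror_id} for organisations whose VAT number corresponds to
    exactly one ROR ID in Wikidata.
    """
    # Index the Wikidata results by normalised VAT number.
    vat_to_rors = {}
    for vat, ror_id in wikidata_pairs:
        vat_to_rors.setdefault(normalise_vat(vat), set()).add(ror_id)

    mapping = {}
    for pic, vat in cordis_orgs:
        if not vat:
            continue  # CORDIS has no VAT number for roughly 15% of organisations
        rors = vat_to_rors.get(normalise_vat(vat), set())
        if len(rors) == 1:  # skip unknown or ambiguous VAT numbers
            mapping[pic] = next(iter(rors))
    return mapping


# Made-up example data, for illustration only.
cordis = [("111", "ie 123-4"), ("222", None), ("333", "FR999")]
wikidata = [("IE1234", "https://ror.org/00example")]
print(map_pic_to_ror(cordis, wikidata))  # {'111': 'https://ror.org/00example'}
```

A VAT number that maps to several ROR IDs in Wikidata is skipped rather than guessed, trading some coverage for fewer false links.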
As a proof of principle, I selected 42 organisations' EU VAT numbers from CORDIS and built a SPARQL query that tries to extract those organisations' ROR IDs from Wikidata. That query yielded 16 ROR IDs: a little over one third. While that's far from perfect, it's better than starting from scratch (assuming this small test is representative).
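Such a query might look something like the following sketch, which builds a SPARQL query with a `VALUES` clause listing the VAT numbers. It assumes Wikidata's property P3608 holds the EU VAT number and P6782 holds the ROR ID (worth double-checking on wikidata.org), and that Wikidata stores VAT numbers in the same form as CORDIS; the helper names are hypothetical.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed Wikidata property IDs; verify them on wikidata.org before relying on them.
EU_VAT_PROPERTY = "P3608"  # EU VAT number
ROR_ID_PROPERTY = "P6782"  # ROR ID


def sparql_string(value):
    """Quote a Python string as a SPARQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_vat_to_ror_query(vat_numbers):
    """Return a SPARQL query listing (VAT number, ROR ID) pairs found in Wikidata."""
    values = " ".join(sparql_string(v) for v in vat_numbers)
    return (
        "SELECT ?vat ?ror WHERE {\n"
        f"  VALUES ?vat {{ {values} }}\n"
        f"  ?org wdt:{EU_VAT_PROPERTY} ?vat ;\n"
        f"       wdt:{ROR_ID_PROPERTY} ?ror .\n"
        "}"
    )


def build_request_url(query):
    """Build a GET URL asking the Wikidata Query Service for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_vat_to_ror_query(["IE1234", "FR999"])  # made-up VAT numbers
print(query)
```

The `wdt:` prefix is predefined on the Wikidata Query Service, so the query needs no `PREFIX` lines. The `vat` and `ror` bindings in the JSON response can then be joined back to the CORDIS PICs.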
For comparison, matching names (case-insensitive, but otherwise exact) and requiring exactly one match yielded little more than 1,500 links (roughly 4%). A more flexible matching strategy might yield more links, but it increases the risk of false matches.
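For illustration, a sketch of that name-matching rule, assuming plain (PIC, name) and (ROR ID, name) pairs as input. `str.casefold()` provides the case-insensitive comparison, and a CORDIS name is linked only when it matches exactly one ROR record; the function name is hypothetical.

```python
from collections import defaultdict


def match_by_unique_name(cordis_orgs, ror_orgs):
    """Link PICs to ROR IDs by case-insensitive, otherwise exact, name matching.

    cordis_orgs: iterable of (pic, name) pairs.
    ror_orgs: iterable of (ror_id, name) pairs.
    A PIC is linked only if its name matches exactly one ROR record.
    """
    rors_by_name = defaultdict(set)
    for ror_id, name in ror_orgs:
        rors_by_name[name.casefold()].add(ror_id)

    links = {}
    for pic, name in cordis_orgs:
        candidates = rors_by_name.get(name.casefold(), set())
        if len(candidates) == 1:
            links[pic] = next(iter(candidates))
    return links


# Made-up example data, for illustration only.
ror = [("r1", "University of Example"), ("r2", "Example Institute"),
       ("r3", "EXAMPLE INSTITUTE")]
cordis = [("p1", "UNIVERSITY OF EXAMPLE"), ("p2", "Example Institute"), ("p3", "Nowhere")]
print(match_by_unique_name(cordis, ror))  # {'p1': 'r1'}
```

Here "Example Institute" is dropped because two ROR records share that name once case is ignored, which is exactly the ambiguity the "exactly one match" rule guards against.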
In addition, both CORDIS and ROR include geographical coordinates for organisations. Any auto-generated PIC-to-ROR link could be validated using these coordinates: for example, by calculating the great-circle distance between the two locations and rejecting the link if that distance exceeds (say) 1 km.
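A sketch of that check, using the haversine formula for the great-circle distance on a spherical Earth (mean radius 6,371 km). The 1 km threshold is the example value from the comment; treating a missing coordinate as "not validated" is an assumption, and both functions are hypothetical helpers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the argument of asin slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def link_is_plausible(cordis_coords, ror_coords, max_km=1.0):
    """Return True if the (lat, lon) pairs from CORDIS and ROR lie within max_km."""
    if cordis_coords is None or ror_coords is None:
        return False  # without both coordinates the link cannot be validated
    return great_circle_km(*cordis_coords, *ror_coords) <= max_km


# One degree of longitude at the equator.
print(round(great_circle_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
```

CORDIS and ROR may place a large, multi-campus organisation at different sites, so a fixed 1 km cut-off could reject some correct links; the threshold would probably need tuning against known-good pairs.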
Hi @mariagould
You mentioned "evaluate this for future development work".
May I ask about the process through which this request would be evaluated?
In particular, I was wondering on what timescale something would be likely to happen.
Cheers, Paul.
Hi @paulmillar, thanks for your question. There are a number of considerations involved in changing the current data model. In terms of the mappings to other IDs, there are technical considerations as well as policy ones (e.g., what criteria might be used to select the other ID types that ROR should map to, how the mappings should be prioritized, etc.). This is an area where additional consultation with users and community members will be useful, via existing channels such as our bimonthly community calls and asynchronous discussion forums. In terms of timescales, the priority for ROR development work right now is implementing the core infrastructure that is needed to support registry additions and updates. This needs to be up and running before we look at any changes to the data model. I would not expect any changes in the near term.
Thanks @mariagould for the explanation. That certainly makes sense. I look forward to the result of your consultation process.
In the meantime, I've created a proof-of-principle project (PIC-to-ROR) to generate a mapping from an organisation's PIC to the corresponding ROR identifier.
Currently, it uses the CORDIS data dump to discover a list of organisations, and Wikidata to convert those with an EU VAT number to the corresponding ROR identifier. This approach picks the "low-hanging fruit"; I imagine adding other approaches in the future.
This is a humble beginning: of the 40,096 organisations in CORDIS, only 2,347 are mapped to their ROR identifier, a mere 6% or so; however, it's a starting point. I hope to improve this over time.
I've uploaded the command's output, so it's available to anyone interested in mapping EU PICs to ROR identifiers without having to run the code themselves. I will try to keep this file reasonably up to date, as time permits.
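For anyone wanting to produce a similar output file, here is a rough sketch of the final step, assuming a semicolon-delimited CORDIS organisation CSV in which an organisation may appear once per project. The column names `organisationID` and `vatNumber`, the delimiter, and the `write_pic_to_ror_csv` helper are hypothetical defaults, not taken from the PIC-to-ROR project, so check them against the header of the dump you actually download. The `vat_to_ror` dictionary is assumed to come from a Wikidata lookup keyed on VAT numbers written without spaces, in upper case.

```python
import csv
import io


def write_pic_to_ror_csv(cordis_csv, vat_to_ror, out,
                         pic_col="organisationID", vat_col="vatNumber",
                         delimiter=";"):
    """Write "pic,ror" rows for CORDIS organisations whose VAT number has a ROR ID.

    cordis_csv: text file object for the CORDIS organisation CSV.
    vat_to_ror: dict mapping VAT numbers (no spaces, upper case) to ROR IDs.
    out: writable text file object for the result.
    Returns (organisations seen, organisations mapped).
    """
    reader = csv.DictReader(cordis_csv, delimiter=delimiter)
    writer = csv.writer(out)
    writer.writerow(["pic", "ror"])
    seen = set()
    mapped = 0
    for row in reader:
        pic = row[pic_col].strip()
        if pic in seen:
            continue  # count each organisation once, however many projects list it
        seen.add(pic)
        vat = (row.get(vat_col) or "").replace(" ", "").upper()
        ror_id = vat_to_ror.get(vat) if vat else None
        if ror_id:
            writer.writerow([pic, ror_id])
            mapped += 1
    return len(seen), mapped


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = io.StringIO(
        "organisationID;vatNumber;name\n"
        "111;IE 1234;Org A\n"
        "111;IE 1234;Org A\n"
        "222;;Org B\n"
        "333;FR99;Org C\n"
    )
    result = io.StringIO()
    total, mapped = write_pic_to_ror_csv(
        sample, {"IE1234": "https://ror.org/00example"}, result)
    print(result.getvalue(), end="")
    print(f"{mapped}/{total} mapped ({mapped / total:.1%})")
```

With the figures quoted in the comment, 2,347 mapped out of 40,096 organisations would print as 5.9%.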
@paulmillar Wanted to make sure you saw that we now have a proposal open for comment on adding new external IDs to ROR, and PIC is a top contender for an early addition. Take a look: https://ror.org/blog/2024-07-18-id-ideas/ (comments open through August 16, 2024).
Also requested by the Czech Science Foundation (https://ror.org/01pv73b02).
Hi @amandafrench,
Thanks for the "heads up". The proposal looks quite reasonable to me.
I've added a few comments to the document (even though this is strictly past the deadline). None of them are (in any sense) blocking, just some "friendly amendments".
Terrific! Thanks so much, @paulmillar!
Have you any plans to include the EU's Participant Identification Codes (PICs)? They are widely used around the world for applications and partnerships in European research and educational exchange programmes.