softwaresaved / habeas-corpus

A corpus of research software used in COVID-19 research.
MIT License

Provide a mapping between original mentions and unified mentions #6

Open sdruskat opened 3 years ago

sdruskat commented 3 years ago

We inherently have to create some sort of mapping between what the mentions originally looked like in CORD-19 (e.g., ['Statistical Package for Social Sciences (SPSS)', 'SPSS', 'SPSS Statistics']) and what they look like in normalized form in our new dataset (e.g., SPSS).

It would probably be very useful for other projects that may reuse our dataset to also have access to the mapping. Therefore, it would be nice to provide this mapping in some consumable form.
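
As a rough illustration of what a "consumable form" might look like, here is a minimal sketch that stores the mapping as a dictionary keyed by the unified name and serializes it to JSON. Only the SPSS entries come from the example above; the structure, the variable names, and the helper functions (`invert_mapping`, `dump_mapping`) are assumptions for illustration, not something the project has decided on.

```python
import json

# Assumed structure: unified (normalized) name -> original mentions as found in CORD-19.
# The SPSS entries are taken from the example in this issue; everything else is hypothetical.
UNIFIED_TO_ORIGINAL = {
    "SPSS": [
        "Statistical Package for Social Sciences (SPSS)",
        "SPSS",
        "SPSS Statistics",
    ],
}


def invert_mapping(unified_to_original):
    """Build a lookup from each original mention to its unified name."""
    original_to_unified = {}
    for unified, originals in unified_to_original.items():
        for original in originals:
            original_to_unified[original] = unified
    return original_to_unified


def dump_mapping(unified_to_original):
    """Serialize the mapping to a JSON string that other projects could load."""
    return json.dumps(unified_to_original, indent=2, sort_keys=True)


ORIGINAL_TO_UNIFIED = invert_mapping(UNIFIED_TO_ORIGINAL)
```

Keying by the unified name keeps the file compact, and the inverted lookup lets a reuser go from a raw CORD-19 mention back to the normalized name. A flat CSV with one (original, unified) pair per row would work just as well if that is easier to consume.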