We have access to another large chunk of Web of Science data, again delivered as an archive on a physical disk. We should confirm that our existing importer works with this new data, then import it into the production database.
There are a couple of other improvements that we should make before we do this. Specifically:
We may want to try to do better matching on the titles of publications from Web of Science as they're being imported, so that grant metadata is matched to existing publications more reliably: https://github.com/psu-stewardship/researcher-metadata/blob/2bb86fa6a8ec283155ce80a9f99840904fa6885b/app/models/publication.rb#L82
OR
We might want to rethink how WoS metadata is imported altogether. It might be better to abandon the attempt to automatically match titles upon import and instead just import possible duplicates for humans to resolve definitively. In that case, we'd want to consider changing our duplicate merge process so that it also automatically merges the grant metadata associated with the publications.
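For the first option, here is a minimal sketch of what more forgiving title matching could look like. It only normalizes case, punctuation, and whitespace before comparing. `TitleMatching`, `normalize`, and `same_title?` are hypothetical names for illustration; this is not the existing matching code in `publication.rb`, and it has not been tried against real WoS data.

```ruby
# Hypothetical helper for illustration only; not the project's current matcher.
module TitleMatching
  # Reduce a title to lowercase words separated by single spaces,
  # so that case, punctuation, and spacing differences are ignored.
  def self.normalize(title)
    title.to_s
         .downcase
         .gsub(/[^[:alnum:]\s]/, ' ') # punctuation (including hyphens) becomes a space
         .gsub(/\s+/, ' ')            # collapse runs of whitespace
         .strip
  end

  # Two titles match when their normalized forms are identical.
  # Blank titles never match, to avoid linking unrelated records.
  def self.same_title?(first, second)
    normalized = normalize(first)
    !normalized.empty? && normalized == normalize(second)
  end
end
```

With this, `TitleMatching.same_title?("Machine-Learning for Grants", "machine learning for grants")` is true. Anything fuzzier than this (for example, tolerating small spelling differences) would probably produce candidates that still need human review, which is closer to the second option.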