research-software-directory / research-software-directory

The Research Software Directory is a content management system that is tailored to software.
https://research-software.nl
Apache License 2.0

Manual isSupplementTo GitHub links can lead to very long version strings in the citation element #797

Open jspaaks opened 3 years ago

jspaaks commented 3 years ago

The Research Software Directory's harvester assumes that Zenodo pages were created with the GitHub-Zenodo integration. If that's the case, the harvester extracts the version tag from the isSupplementTo metadata. It assumes that the value is formatted something like: https://github.com/citation-file-format/cff-converter-python/tree/1.3.3

However, some pages have a manually entered isSupplementTo value, for example https://github.com/citation-file-format/citation-file-format/releases/tag/1.2.0. This leads to very long version strings in the citation element (image), because the leading part of the URL is not stripped when the code gets to the point where it derives a version tag here:

https://github.com/research-software-directory/research-software-directory/blob/a8212344b8a73a74dcc906fda008c5b2630179a4/harvesting/releases.py#L134

We should allow for a greater variety of GitHub links, and additionally have a safeguard against very long version strings.
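A minimal sketch of what that could look like, not the harvester's current code. The function name `extract_version_tag`, the regular expression, and the `MAX_VERSION_LENGTH` limit of 50 are all assumptions made for illustration. It accepts both the `/tree/<tag>` form and the `/releases/tag/<tag>` form, and returns `None` rather than an overly long string.

```python
import re
from typing import Optional

# Hypothetical limit; pick whatever is sensible for the citation element.
MAX_VERSION_LENGTH = 50

# Matches .../tree/<tag> (GitHub-Zenodo integration) and
# .../releases/tag/<tag> (manually entered links).
_GITHUB_TAG_URL = re.compile(
    r"^https?://github\.com/[^/]+/[^/]+/(?:tree|releases/tag)/(?P<tag>[^?#]+?)/?$"
)


def extract_version_tag(identifier: str) -> Optional[str]:
    """Return the version tag from a GitHub link, or None if it cannot be derived."""
    match = _GITHUB_TAG_URL.match(identifier.strip())
    if match is None:
        # Not a link format we recognize: do not fall back to the full URL.
        return None
    tag = match.group("tag")
    if len(tag) > MAX_VERSION_LENGTH:
        # Safeguard against very long version strings.
        return None
    return tag
```

With this approach an unrecognized link yields no version at all, which the caller can then handle explicitly (for example by skipping the release or logging a warning).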

jspaaks commented 3 years ago

Also, what should happen if a record has multiple isSupplementTo links to GitHub? Currently, only the last isSupplementTo link to GitHub is used.
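One option, sketched below as an assumption rather than a decision, is to collect all candidate links first so the harvester can detect the ambiguous case instead of silently taking the last one. The function name `github_supplement_links` is hypothetical, and the input is assumed to be a list of dicts with `relation` and `identifier` keys, as in Zenodo's related identifiers metadata.

```python
from typing import Dict, List


def github_supplement_links(related_identifiers: List[Dict[str, str]]) -> List[str]:
    """Return every isSupplementTo identifier that points to GitHub, in order."""
    return [
        item["identifier"]
        for item in related_identifiers
        if item.get("relation") == "isSupplementTo"
        and item.get("identifier", "").startswith("https://github.com/")
    ]


# Hypothetical example record with two GitHub links and one unrelated entry.
example = [
    {"relation": "isSupplementTo", "identifier": "https://github.com/owner/repo/tree/1.0.0"},
    {"relation": "cites", "identifier": "https://example.org/paper"},
    {"relation": "isSupplementTo", "identifier": "https://github.com/owner/other/tree/2.0.0"},
]
links = github_supplement_links(example)
if len(links) > 1:
    # Ambiguous: the caller decides whether to warn, skip, or pick one.
    print(f"Found {len(links)} GitHub links: {links}")
```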