The Research Software Directory's harvester assumes that Zenodo pages were created with the GitHub-Zenodo integration. If that is the case, the harvester extracts the version tag from the isSupplementedBy metadata. It assumes that the value is formatted something like: https://github.com/citation-file-format/cff-converter-python/tree/1.3.3
Also, what should happen if a page has multiple isSupplementTo links to GitHub? Currently, only the last one is used.
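To make the question about multiple links concrete, here is a minimal sketch that contrasts the behaviour described above (take the last GitHub link) with one possible alternative (accept a link only when it is unambiguous). Both function names are hypothetical and are not taken from the harvester; the input is assumed to be a plain list of URL strings already read from the record's metadata.

```python
from typing import List, Optional
from urllib.parse import urlparse


def _github_links(links: List[str]) -> List[str]:
    """Keep only links whose host is github.com, preserving order."""
    return [link for link in links if urlparse(link).netloc.lower() == "github.com"]


def pick_last_github_link(links: List[str]) -> Optional[str]:
    """Mimic the behaviour described above: the last GitHub link wins."""
    candidates = _github_links(links)
    return candidates[-1] if candidates else None


def pick_unambiguous_github_link(links: List[str]) -> Optional[str]:
    """Hypothetical alternative: return a link only if exactly one distinct
    GitHub link is present, so ambiguous records can be flagged instead."""
    distinct = list(dict.fromkeys(_github_links(links)))  # de-duplicate, keep order
    return distinct[0] if len(distinct) == 1 else None
```

Returning `None` for ambiguous records is only one option; the harvester could equally log a warning and fall back to the last link.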
The assumption about the URL format does not always hold: some pages have a manually entered isSupplementedBy value, for example https://github.com/citation-file-format/citation-file-format/releases/tag/1.2.0. This leads to very long version strings, because the URL prefix is not stripped when the code gets to the point where it derives the version tag here:
https://github.com/research-software-directory/research-software-directory/blob/a8212344b8a73a74dcc906fda008c5b2630179a4/harvesting/releases.py#L134
We should allow for a greater variety of GitHub links, and additionally have a safeguard against very long version strings.
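As a starting point for discussion, here is a minimal sketch of what a more tolerant tag extraction with a length safeguard could look like. It is not the current code in `releases.py`. The function name `extract_version_tag`, the constant `MAX_VERSION_LENGTH`, and its value of 50 are all assumptions made for illustration. The sketch only covers the two URL layouts mentioned in this issue (`/tree/<tag>` and `/releases/tag/<tag>`).

```python
import re
from typing import Optional

# Hypothetical upper bound on a plausible version string; the value is a guess.
MAX_VERSION_LENGTH = 50

# Accept both link layouts seen so far:
#   https://github.com/<owner>/<repo>/tree/<tag>
#   https://github.com/<owner>/<repo>/releases/tag/<tag>
_TAG_PATTERN = re.compile(
    r"^https?://github\.com/"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+)/"
    r"(?:tree|releases/tag)/"
    r"(?P<tag>[^?#]+?)/?(?:[?#].*)?$"
)


def extract_version_tag(url: str) -> Optional[str]:
    """Return the tag part of a GitHub link, or None if the link does not
    match a known layout or the result is implausibly long."""
    match = _TAG_PATTERN.match(url.strip())
    if match is None:
        return None
    tag = match.group("tag")
    # Safeguard: never pass a very long string on as a version.
    if len(tag) > MAX_VERSION_LENGTH:
        return None
    return tag
```

With this sketch, both example links from this issue yield a short tag (`1.3.3` and `1.2.0`), and any link that does not match a known layout is rejected rather than being used whole as the version.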