Closed gouttegd closed 1 year ago
This is actually not quite right in general, as it does not catch cases like PRO, or OBA, correctly, where (\d+) is not enough - it should be ([A-Z0-9]+) instead.
The pattern is correct for what the OBO specification calls Canonical Prefixed ID, where only digits are allowed in the “local” part.
The pattern will indeed not recognise Non-Canonical Prefixed ID, where almost everything is allowed apart from whitespaces (not just [A-Z0-9]+
). I assume this is what is intended, so will leave that pattern as it is.
@gouttegd is right about the official rules. In practise, these are the exceptions to the OBO numeric ID policy that the PURL system is aware of: PR's adopted UniProt IDs, NCIT's C\d+ concept IDs, RO_HOM\d+, and GNO. I don't know the OBA case the @matentzn mentioned. The PR and NCIT are significant enough that you might consider making exceptions for them.
https://github.com/OBOFoundry/purl.obolibrary.org/blob/master/config/obo.yml#L110
The
OboUtilities.isOboId
method inorg.protege.editor.owl.model.util
is currently using thePattern.find
method to check whether its argument is an OBO ID.This is inappropriate as the
Pattern.find
method looks for a subsequence within the string, instead of attempting to match the full string. As a result, a bogus OBO ID such as "MONDO:0018299-obsoleted" will be incorrectly identified as a valid OBO ID because thePattern.find
method will simply stop matching at the first character that does not fit the pattern (the '-'), but will still report that it has find a match.Here, we need to check that the entire string matches the OBO ID pattern, which is why this commit replaces the use of
Pattern.find
byPattern.match
, which does precisely that.This is also what the
getOboLibraryIriFromOboId
method, in the same class, is doing. For consistency,isOboId
should do the same, so that whenisOboId
returns true, the calling code can then invokegetOboLibraryIriFromOboId
on the same string and be confident that the call would not throw a "Invalid OBO Id" RuntimeException.