Open svandenhoek opened 3 years ago
When fixing, ensure tests are added for one of the failing HPO terms (such as HP:0002664).
DisGeNET will evaluate how to remove incorrect entries (that is, gene symbols that are not HGNC symbols but do have an http://identifiers.org/hgnc.symbol/ IRI in DisGeNET). A hotfix will be implemented so that the current version of VIBE works with the updated database, though this does mean that, for now, the gene symbols shown are not always HGNC-approved symbols. When a new DisGeNET database release is available, a longer-term fix should be implemented based on the new RDF design (depending on how non-official gene symbols are stored within DisGeNET at that point).
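To make the "incorrect entries" concrete, here is a minimal sketch of how such IRIs could be flagged. It is not how DisGeNET or VIBE does it: the function names are hypothetical, and `approved` is a tiny illustrative stand-in for the full list of HGNC-approved symbols, which would have to be loaded from an HGNC export.

```python
HGNC_SYMBOL_PREFIX = "http://identifiers.org/hgnc.symbol/"


def symbol_from_iri(iri: str) -> str:
    """Return the part of the IRI after the hgnc.symbol prefix."""
    if not iri.startswith(HGNC_SYMBOL_PREFIX):
        raise ValueError(f"not an hgnc.symbol IRI: {iri}")
    return iri[len(HGNC_SYMBOL_PREFIX):]


def find_unapproved(iris, approved_symbols):
    """List the IRIs whose symbol is not in the approved set (hypothetical helper)."""
    return [iri for iri in iris if symbol_from_iri(iri) not in approved_symbols]


# Illustrative stand-in only; a real check needs the complete HGNC list.
approved = {"BRCA1", "TP53"}

iris = [
    HGNC_SYMBOL_PREFIX + "BRCA1",
    HGNC_SYMBOL_PREFIX + "LOC105375655",  # example from DisGeNET v6.0.0
    HGNC_SYMBOL_PREFIX + "MCS+9.7",       # example from the v5.1 database
]

unapproved = find_unapproved(iris, approved)
for iri in unapproved:
    print(iri)
```

A check like this only reports the entries; whether to drop them or store them under a different IRI is the design question left open above.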
Invalid HGNC symbols appear to be present in previous releases as well, for example http://identifiers.org/hgnc.symbol/LOC105375655 in DisGeNET v6.0.0 geneSymbol.ttl. It is therefore very likely that every VIBE release is affected by this bug, but no error was thrown because the earlier invalid HGNC symbols did seem to satisfy the validation regex.
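A small sketch of why one invalid symbol can slip through while another triggers an error. The pattern below is an assumption made for illustration (letters, digits, hyphen, underscore); it is not VIBE's actual validation regex, and the function name is hypothetical.

```python
import re

# Hypothetical stand-in for the validation regex; VIBE's real pattern may differ.
ASSUMED_SYMBOL_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def passes_assumed_regex(iri: str) -> bool:
    """Check the last path segment of the IRI against the assumed pattern."""
    symbol = iri.rsplit("/", 1)[-1]
    return ASSUMED_SYMBOL_PATTERN.fullmatch(symbol) is not None


old_example = "http://identifiers.org/hgnc.symbol/LOC105375655"
new_example = "http://identifiers.org/hgnc.symbol/MCS+9.7"

print(passes_assumed_regex(old_example))  # only alphanumerics, so it matches
print(passes_assumed_regex(new_example))  # '+' and '.' fall outside the assumed pattern
```

Under this assumption, a purely syntactic check cannot tell LOC105375655 apart from an approved symbol, which would explain why earlier releases never raised an error.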
Hotfix implemented in #85 for the 5.1 branch, and #86 should merge this back into master, though the core of this issue requires an updated database.
Describe the bug
The new VIBE v5.1 database contains gene symbols which are assumed to be HGNC gene symbols, but this is not always the case. This issue also seems to be present in the source dataset, where these symbols are described as if they are HGNC symbols (e.g. http://identifiers.org/hgnc.symbol/MCS+9.7, where the rdfs:comment states it is an HGNC Gene Symbol).
To Reproduce
Expected behavior
Results are shown, and the data either contains no invalid HGNC symbols or does not explicitly mark them as HGNC symbols.