Closed ValWood closed 1 week ago
I can't see anything that's changed. We get the MGI IDs from the GOA GAF and they look like "MGI:MGI:1919005".
It's a bit weird. We have "MGI:MGI:1919005" in the GAF file but the error is:
1919005 does not match any id_syntax patterns for MGI
It's like the "MGI:" prefix is being removed twice.
The id_syntax in the db-xrefs file is "MGI:[0-9]{5,}" and that has been the same for a few years.
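For what it's worth, here is a minimal Python sketch of how that pattern behaves. It assumes the check compares the local part of the identifier against the whole pattern (`re.fullmatch`); I haven't confirmed how the real check applies it, and `matches_mgi_syntax` is just a hypothetical helper name.

```python
import re

# id_syntax for MGI, as quoted above from the db-xrefs file.
MGI_ID_SYNTAX = re.compile(r"MGI:[0-9]{5,}")


def matches_mgi_syntax(local_id: str) -> bool:
    """Return True if local_id matches the MGI id_syntax in full.

    Using fullmatch is an assumption about how the check works.
    """
    return MGI_ID_SYNTAX.fullmatch(local_id) is not None


print(matches_mgi_syntax("MGI:1919005"))  # True
print(matches_mgi_syntax("1919005"))      # False
```

So the pattern only accepts a local ID that still carries one "MGI:" prefix. The bare number reported in the error cannot match it.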
I think this might be a problem with the GO check. The issue for the check is still in progress:
@pgaudet @kltm is this a GO checks problem?
maybe @kltm knows why this is happening
Looking at https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000027.md . Okay, "soft" warning, so no data filtering.
The moment of failure is likely here: https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L835
Special casing for MGI leading into it is: https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L802-L806
So, it looks like MGI:MGI:1919005 would be clipped to MGI and 1919005, the latter of which would fail when checking against the regexp. The options here would be:
MGI:MGI:MGI:1919005 (I know what the knock-on effect would be: hilarity)
Either way, @pgaudet, this is probably best approached as a GO QC bug for the moment (although a "light" one as no fix or filtering is done) and added to the QC worklist.
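To make the suspected clipping concrete, here is a rough Python sketch. It is not the ontobio code: `clip_then_split` and `split_once` are hypothetical functions that only mimic the behaviour described above (strip the doubled prefix, then split on the colon), and using `re.fullmatch` for the syntax check is an assumption too.

```python
import re

# id_syntax for MGI, as quoted earlier in this thread.
MGI_ID_SYNTAX = re.compile(r"MGI:[0-9]{5,}")


def clip_then_split(identifier: str) -> tuple[str, str]:
    """Hypothetical: special-case the doubled MGI prefix, then split."""
    if identifier.startswith("MGI:MGI:"):
        identifier = identifier[len("MGI:"):]  # "MGI:MGI:1919005" -> "MGI:1919005"
    prefix, local_id = identifier.split(":", 1)  # -> "MGI", "1919005"
    return prefix, local_id


def split_once(identifier: str) -> tuple[str, str]:
    """Hypothetical alternative: split on the first colon only."""
    prefix, local_id = identifier.split(":", 1)  # -> "MGI", "MGI:1919005"
    return prefix, local_id


for splitter in (clip_then_split, split_once):
    prefix, local_id = splitter("MGI:MGI:1919005")
    ok = MGI_ID_SYNTAX.fullmatch(local_id) is not None
    print(splitter.__name__, prefix, local_id, ok)
```

Under these assumptions the first variant leaves 1919005 as the local ID, which fails the regexp and matches the warning text. The second keeps MGI:1919005, which passes.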
@pgaudet @kltm I'm closing this on the PomBase tracker. It should be in the GO tracker if it's still an issue?
Our MGI ISO xrefs are failing checks.
WARNING - Invalid identifier:GORULE:0000027: 1298204 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC530.12c pdf1 enables GO:0008474 PMID:15075260 ISO MGI:MGI:1298204 F palmitoyl protein thioesterase/ dolichol pyrophosphate phosphatase fusion protein Pdf1 protein taxon:4896 20040414 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1316717 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC20F10.03 SPBC20F10.03 is_active_in GO:0005634 GO_REF:0000024 ISS MGI:MGI:1316717 C armadillo-type fold protein, human IFRD1 ortholog, implicated in transcription or signaling protein taxon:4896 20170830 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1346084 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC6C3.09 rpp40 part_of GO:0005655 GO_REF:0000024 ISS MGI:MGI:1346084 C RNase P and RNase MRP subunit Rpp40 protein taxon:4896 20061017 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 involved_in GO:0042843 GO_REF:0000024 ISS MGI:MGI:1919005 P D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 enables GO:0047837 GO_REF:0000024 ISS MGI:MGI:1919005 F D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase
But it's a bit weird, because the display and the URL are MGI:1298204 but on the pop-up it says MGI:1298204. Could you have a dig and see if the syntax has been resolved to remove the first MGI: or something?