Closed kevinschaper closed 10 months ago
This is strange.
In alliance_gene_nodes.tsv and hgnc_gene_nodes.tsv I don't see anything with a missing taxon.
After the merge, I see just 3 genes with no taxon, which appear to have come from the same files:
"select distinct category, in_taxon, id, provided_by from output/monarch-kg_nodes.tsv where category = 'biolink:Gene' and in_taxon not like 'NCBI%' order by 1, 2"
| category | in_taxon | id | provided_by |
|---|---|---|---|
| biolink:Gene | None | SGD:S000004416 | output/transform_output/alliance_gene_nodes.tsv |
| biolink:Gene | None | HGNC:21060 | output/transform_output/hgnc_gene_nodes.tsv |
| biolink:Gene | None | HGNC:40992 | output/transform_output/hgnc_gene_nodes.tsv |
Removing my assignment, because this doesn't feel urgent...even though it's very spooky.
Update: here is the current list, still spooky
`sqlite3 -markdown monarch-kg.db "select distinct category, in_taxon, id, provided_by from nodes where category = 'biolink:Gene' and in_taxon not like 'NCBI%' order by 1, 2"`

| category | in_taxon | id | provided_by |
|---|---|---|---|
| biolink:Gene | | HGNC:21060 | hgnc_gene_nodes |
| biolink:Gene | | HGNC:40992 | hgnc_gene_nodes |
| biolink:Gene | | WB:WBGene00006857 | alliance_gene_nodes |
| biolink:Gene | | WB:WBGene00006858 | alliance_gene_nodes |
| biolink:Gene | | WB:WBGene00006859 | alliance_gene_nodes |
| biolink:Gene | | WB:WBGene00006860 | alliance_gene_nodes |
| biolink:Gene | | SGD:S000004416 | alliance_gene_nodes |
| biolink:Gene | | RGD:1306669 | alliance_gene_nodes |
| biolink:Gene | | RGD:1308036 | alliance_gene_nodes |
| biolink:Gene | | RGD:70947 | alliance_gene_nodes |
| biolink:Gene | | RGD:1585231 | alliance_gene_nodes |
| biolink:Gene | | RGD:1307632 | alliance_gene_nodes |
| biolink:Gene | | RGD:1309993 | alliance_gene_nodes |
| biolink:Gene | | RGD:1359631 | alliance_gene_nodes |
| biolink:Gene | | MGI:1334416 | alliance_gene_nodes |
| biolink:Gene | | MGI:1334417 | alliance_gene_nodes |
| biolink:Gene | | MGI:1342270 | alliance_gene_nodes |
| biolink:Gene | | MGI:1860417 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0004598 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0015371 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0015931 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0015932 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0019928 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0019929 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0020828 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0020831 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0020850 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0023179 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0025343 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0025608 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0026616 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0027588 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0027661 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0029688 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0044423 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0044424 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0044425 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0044426 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0062518 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0070051 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0070056 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0070057 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0283652 | alliance_gene_nodes |
| biolink:Gene | | FB:FBgn0285970 | alliance_gene_nodes |
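The missing taxon has shown up in this thread both as the literal string `None` and as an empty cell. In SQL, `in_taxon not like 'NCBI%'` evaluates to NULL (not true) when `in_taxon` is NULL, so a true NULL would be skipped by the query as written. Below is a minimal sketch of a check that catches all three encodings. It assumes a `nodes` table with the same four columns as above. `missing_taxon_genes` is a hypothetical helper, `EXAMPLE:0001` is a made-up ID, and which gene gets which "missing" encoding is arbitrary, chosen only to make the example runnable.

```python
import sqlite3


def missing_taxon_genes(conn):
    """Return gene rows whose in_taxon is NULL, empty, or not an NCBI* CURIE."""
    query = """
        SELECT DISTINCT category, in_taxon, id, provided_by
        FROM nodes
        WHERE category = 'biolink:Gene'
          AND (in_taxon IS NULL OR in_taxon NOT LIKE 'NCBI%')
        ORDER BY provided_by, id
    """
    return conn.execute(query).fetchall()


# Illustrative in-memory data only; not taken from the real monarch-kg.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (category TEXT, in_taxon TEXT, id TEXT, provided_by TEXT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        ("biolink:Gene", None, "HGNC:21060", "hgnc_gene_nodes"),          # SQL NULL
        ("biolink:Gene", "", "SGD:S000004416", "alliance_gene_nodes"),    # empty string
        ("biolink:Gene", "None", "HGNC:40992", "hgnc_gene_nodes"),        # literal 'None'
        ("biolink:Gene", "NCBITaxon:9606", "EXAMPLE:0001", "hgnc_gene_nodes"),  # has a taxon
    ],
)

rows = missing_taxon_genes(conn)
for row in rows:
    print(row)
```

Only the row with an `NCBITaxon:` value is filtered out; the other three are reported regardless of how the gap is encoded.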
@kevinschaper 👻 .... spooky, though fixed? or spooky and still unsolved?
It looks like it's solved!
the same query now returns phenio cruft instead
| category | in_taxon | id | provided_by |
|---|---|---|---|
| biolink:Gene | | SIO:010035 | phenio_nodes |
| biolink:Gene | | DATACOMMONS:Gene | phenio_nodes |
I'll close this, and open a new issue for limiting our phenio nodes by category
I was looking at the ID prefix & taxon ID for all of our genes, and realized that we're missing the `in_taxon` field for human and yeast genes.
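For reference, the ID prefix vs. taxon comparison can be sketched as a single grouped query. This is only an illustration under assumptions: it expects a `nodes` table with `category`, `in_taxon`, `id`, and `provided_by` columns, the `MGI:EXAMPLE*` IDs are made up, and the `(missing)` label is a placeholder chosen here.

```python
import sqlite3

# Tally genes by ID prefix and taxon; NULL or empty in_taxon is shown as '(missing)'.
TALLY_QUERY = """
    SELECT substr(id, 1, instr(id, ':') - 1) AS prefix,
           COALESCE(NULLIF(in_taxon, ''), '(missing)') AS taxon,
           COUNT(*) AS n
    FROM nodes
    WHERE category = 'biolink:Gene'
    GROUP BY prefix, taxon
    ORDER BY prefix, taxon
"""

# Illustrative in-memory data only; the MGI rows use made-up IDs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (category TEXT, in_taxon TEXT, id TEXT, provided_by TEXT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?)",
    [
        ("biolink:Gene", None, "HGNC:21060", "hgnc_gene_nodes"),
        ("biolink:Gene", None, "HGNC:40992", "hgnc_gene_nodes"),
        ("biolink:Gene", "", "SGD:S000004416", "alliance_gene_nodes"),
        ("biolink:Gene", "NCBITaxon:10090", "MGI:EXAMPLE1", "alliance_gene_nodes"),
        ("biolink:Gene", "NCBITaxon:10090", "MGI:EXAMPLE2", "alliance_gene_nodes"),
    ],
)

tally = conn.execute(TALLY_QUERY).fetchall()
for prefix, taxon, n in tally:
    print(f"{prefix}\t{taxon}\t{n}")
```

Any prefix that shows up with `(missing)` next to it is a source worth checking.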