monarch-initiative / monarch-legacy

Monarch web application and API
BSD 3-Clause "New" or "Revised" License

Species not specified in resulting data of compare api #1024

Open yuanzhou opened 8 years ago

yuanzhou commented 8 years ago

I'm testing the Phenogrid genotype expansion feature. The following URL uses the compare API to get matches between those fish genotypes and the provided phenotype list.

http://beta.monarchinitiative.org/compare/HP:0000726+HP:0000746+HP:0001300+HP:0002367+HP:0000012+HP:0000716+HP:0000739+HP:0001332+HP:0001347+HP:0002063+HP:0002067+HP:0002172+HP:0002322+HP:0007159+HP:0009466+HP:0001972+HP:0003502+HP:0005190+HP:0001763+HP:0003318+HP:0001371+HP:0001380+HP:0001394+HP:0001156+HP:0001159+HP:0001004+HP:0010886+HP:0006530+HP:0004792+HP:0002816+HP:0000954+HP:0000974+HP:0000765+HP:0000602+HP:0002240+HP:0030084+HP:0000049+HP:0000028+HP:0000175+HP:0000204+HP:0000202+HP:0000316+HP:0000343+HP:0000349+HP:0002023+HP:0000327+HP:0002055+HP:0000394+HP:0000463+HP:0000431+HP:0000494+HP:0000484+HP:0000486+HP:0000508+HP:0009748+HP:0007018+HP:0001773+HP:0001769+HP:0003311+HP:0001544+HP:0001508+HP:0003196+HP:0001249+HP:0001187+HP:0001169+HP:0012774+HP:0002650/_:genid1122228,MONARCH:63a5331abafd1d19ad8fe45807187a83,MONARCH:e4d9715f30d99dec3012082942497396

In the returned JSON, taxon.label should say 'Danio rerio' instead of 'Not Specified'.

@harryhoch since we rely on the species name to create the target list and grid cell data, this 'Not Specified' species name will generate no matches in the grid.

yuanzhou commented 8 years ago

Here is a screenshot:

[Screenshot: capture]

harryhoch commented 8 years ago

Yes. @kshefchek, @nlwashington, this looks like a data issue. Do you have any idea what's up?

kshefchek commented 8 years ago

There are some strange syncing issues here with the MONARCH-prefixed genotype IDs. Right now the IDs that appear in our production Solr index are not in our SciGraph DB, which is where we pull taxon information from. This should be resolved in the next data release. Also see https://github.com/monarch-initiative/monarch-app/issues/1004#issuecomment-148096917

@yuanzhou if you see these same issues with any other ID prefix please let me know.

yuanzhou commented 8 years ago

Actually, the taxon.id is also missing. I can work around this since we already know the ID and species name before sending out the compare URL. This still needs to be addressed at the API level.

kshefchek commented 8 years ago

@yuanzhou is it possible to filter out MONARCH-prefixed IDs until the IDs are consistent across our production Solr index and SciGraph DB?

yuanzhou commented 8 years ago

@kshefchek I can do that. Basically we only get the first 5 genotypes of any expanded gene. If there are any MONARCH-prefixed genotypes, I can just ignore them.

nlwashington commented 8 years ago

presumably this data is coming from a golr call for gene-genotype? this ought to have subject_taxon and object_taxon in the golr, and then you could get the tax_id and tax_label from that? or is that what is missing?

nlwashington commented 8 years ago

we should also think ahead to the case where we have genotypes in one species that include genes from other species. for example, if a fish gene is inserted into a fly, then presumably if you query the golr with the fish gene id, you will get back a fly genotype. my feeling is that you still want to show all of the genotypes, regardless of their "host" species. therefore the genotype should just be bound in some way to its parent element, no?

yuanzhou commented 8 years ago

@nlwashington I got all the associated genotypes of a provided gene using the API that Kent created for this purpose, and I think it pulls from the GOlr index. For example:

http://beta.monarchinitiative.org/gene/MGI:96173/genotype_list.json

@harryhoch any thoughts on the second comment that Nicole added?

kshefchek commented 8 years ago

I can add taxon information to that genotype_list output if that would be useful.

yuanzhou commented 8 years ago

@kshefchek don't worry about the taxon for now, I already have a simple workaround. I simply reuse the species name associated with the parent gene, since the expanded genotypes should belong to the same species, unless there are exceptions like the ones Nicole described.
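A minimal sketch of that kind of fallback, written in Python for illustration only (Phenogrid itself is not Python). The function name `fill_missing_taxon` and the assumed shape of a match (a dict with an optional `taxon` dict holding `id` and `label`) are hypothetical; only the `taxon.id` and `taxon.label` field names and the 'Not Specified' value come from this thread.

```python
NOT_SPECIFIED = "Not Specified"  # label value reported in this issue


def fill_missing_taxon(match, parent_taxon):
    """Return a copy of `match` whose taxon falls back to the parent gene's taxon.

    `match` and `parent_taxon` are hypothetical shapes:
      match        -> {"id": ..., "taxon": {"id": ..., "label": ...}}
      parent_taxon -> {"id": ..., "label": ...}
    """
    taxon = dict(match.get("taxon") or {})

    # taxon.id can be missing entirely, so fall back to the parent's id.
    if not taxon.get("id"):
        taxon["id"] = parent_taxon["id"]

    # taxon.label can be missing or set to the 'Not Specified' placeholder.
    if taxon.get("label") in (None, "", NOT_SPECIFIED):
        taxon["label"] = parent_taxon["label"]

    return {**match, "taxon": taxon}


if __name__ == "__main__":
    parent = {"id": "NCBITaxon:7955", "label": "Danio rerio"}
    broken = {"id": "EXAMPLE:genotype1", "taxon": {"label": NOT_SPECIFIED}}
    print(fill_missing_taxon(broken, parent))
```

Note that this fallback assumes every expanded genotype shares its parent gene's species, so it would mislabel the cross-species genotypes Nicole mentioned.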

yuanzhou commented 8 years ago

Not sure what's going on with this API, the same URL

http://beta.monarchinitiative.org/compare/HP:0000726+HP:0000746+HP:0001300+HP:0002367+HP:0000012+HP:0000716+HP:0000739+HP:0001332+HP:0001347+HP:0002063+HP:0002067+HP:0002172+HP:0002322+HP:0007159+HP:0009466+HP:0001972+HP:0003502+HP:0005190+HP:0001763+HP:0003318+HP:0001371+HP:0001380+HP:0001394+HP:0001156+HP:0001159+HP:0001004+HP:0010886+HP:0006530+HP:0004792+HP:0002816+HP:0000954+HP:0000974+HP:0000765+HP:0000602+HP:0002240+HP:0030084+HP:0000049+HP:0000028+HP:0000175+HP:0000204+HP:0000202+HP:0000316+HP:0000343+HP:0000349+HP:0002023+HP:0000327+HP:0002055+HP:0000394+HP:0000463+HP:0000431+HP:0000494+HP:0000484+HP:0000486+HP:0000508+HP:0009748+HP:0007018+HP:0001773+HP:0001769+HP:0003311+HP:0001544+HP:0001508+HP:0003196+HP:0001249+HP:0001187+HP:0001169+HP:0012774+HP:0002650/_:genid1122228,MONARCH:63a5331abafd1d19ad8fe45807187a83,MONARCH:e4d9715f30d99dec3012082942497396

now returns a different JSON.

kshefchek commented 8 years ago

We're testing out a new GOlr load. I think IDs starting with an underscore or prefixed with MONARCH: do not persist across different SciGraph loads. @nlwashington, could you confirm?

yuanzhou commented 8 years ago

I also tried this one:

http://beta.monarchinitiative.org/compare/HP:0000726+HP:0000746/NCBIGene:388552,NCBIGene:12166

and it only returns

{"a":{"label":"HP:0000726+HP:0000746+HP:0001300","id_list":["HP:0000726","HP:0000746","HP:0001300"]}}

kshefchek commented 8 years ago

It looks like we get the same output on production: http://monarchinitiative.org/compare/HP:0000726+HP:0000746/NCBIGene:388552,NCBIGene:12166

The compare example on our docs page is working: http://beta.monarchinitiative.org/compare/OMIM:270400/NCBIGene:5156,OMIM:249000,OMIM:194050.json

yuanzhou commented 8 years ago

Also noticed that it works without that .json extension.

http://beta.monarchinitiative.org/compare/OMIM:270400/NCBIGene:5156,OMIM:249000,OMIM:194050

yuanzhou commented 8 years ago

But I'll still need to query with a list of phenotypes and a list of genes. The working example you pasted here is a different format.
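For reference, the phenotype-list/gene-list form of the URL can be sketched as below, in Python for illustration. The only things taken from this thread are the URL pattern (phenotype IDs joined with `+`, target IDs joined with `,`) and the beta host; the function name `build_compare_url` is hypothetical, and no URL encoding is applied because the URLs pasted here use raw colons and plus signs.

```python
COMPARE_BASE = "http://beta.monarchinitiative.org/compare"


def build_compare_url(phenotype_ids, target_ids, base=COMPARE_BASE):
    """Build a compare URL of the form <base>/<p1>+<p2>+.../<t1>,<t2>,...

    Mirrors the URLs pasted in this issue; it is not taken from API docs.
    """
    if not phenotype_ids or not target_ids:
        raise ValueError("need at least one phenotype ID and one target ID")

    phenotypes = "+".join(phenotype_ids)  # query profile, '+' separated
    targets = ",".join(target_ids)        # genes/genotypes, ',' separated
    return base + "/" + phenotypes + "/" + targets


if __name__ == "__main__":
    print(build_compare_url(
        ["HP:0000726", "HP:0000746"],
        ["NCBIGene:388552", "NCBIGene:12166"],
    ))
```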

yuanzhou commented 8 years ago

I tried this one and it returned the desired simsearch JSON

http://beta.monarchinitiative.org/compare/HP:0000726+HP:0000746+HP:0001300+HP:0002367+HP:0000012+HP:0000716+HP:0000739+HP:0001332+HP:0001347+HP:0002063+HP:0002067+HP:0002172+HP:0002322+HP:0007159+HP:0009466+HP:0001972+HP:0003502+HP:0005190+HP:0001763+HP:0003318+HP:0001371+HP:0001380+HP:0001394+HP:0001156+HP:0001159+HP:0001004+HP:0010886+HP:0006530+HP:0004792+HP:0002816+HP:0000954+HP:0000974+HP:0000765+HP:0000602+HP:0002240+HP:0030084+HP:0000049+HP:0000028+HP:0000175+HP:0000204+HP:0000202+HP:0000316+HP:0000343+HP:0000349+HP:0002023+HP:0000327+HP:0002055+HP:0000394+HP:0000463+HP:0000431+HP:0000494+HP:0000484+HP:0000486+HP:0000508+HP:0009748+HP:0007018+HP:0001773+HP:0001769+HP:0003311+HP:0001544+HP:0001508+HP:0003196+HP:0001249+HP:0001187+HP:0001169+HP:0012774+HP:0002650/NCBIGene:388552,NCBIGene:12166

yuanzhou commented 8 years ago

@kshefchek just checking whether this "Species not specified" problem has been addressed.

Do I still need to filter out MONARCH-prefixed genotypes from the resulting genotype_list.json? Right now the genotype expansion feature is disabled in Phenogrid due to these API-related issues. I'll check with Harry to see if this feature should be enabled for the Thanksgiving release.

yuanzhou commented 8 years ago

@kshefchek tested the URL again

http://beta.monarchinitiative.org/compare/HP:0000726+HP:0000746+HP:0001300+HP:0002367+HP:0000012+HP:0000716+HP:0000739+HP:0001332+HP:0001347+HP:0002063+HP:0002067+HP:0002172+HP:0002322+HP:0007159+HP:0009466+HP:0001972+HP:0003502+HP:0005190+HP:0001763+HP:0003318+HP:0001371+HP:0001380+HP:0001394+HP:0001156+HP:0001159+HP:0001004+HP:0010886+HP:0006530+HP:0004792+HP:0002816+HP:0000954+HP:0000974+HP:0000765+HP:0000602+HP:0002240+HP:0030084+HP:0000049+HP:0000028+HP:0000175+HP:0000204+HP:0000202+HP:0000316+HP:0000343+HP:0000349+HP:0002023+HP:0000327+HP:0002055+HP:0000394+HP:0000463+HP:0000431+HP:0000494+HP:0000484+HP:0000486+HP:0000508+HP:0009748+HP:0007018+HP:0001773+HP:0001769+HP:0003311+HP:0001544+HP:0001508+HP:0003196+HP:0001249+HP:0001187+HP:0001169+HP:0012774+HP:0002650/_:genid1122228,MONARCH:63a5331abafd1d19ad8fe45807187a83,MONARCH:e4d9715f30d99dec3012082942497396

The resulting JSON is very different from the original output that I posted in the beginning of this issue. There's no b

kshefchek commented 8 years ago

The IDs you are comparing against (_:genid, MONARCH:1234) are unstable IDs and may change across SciGraph versions. I suspect that is the issue.

yuanzhou commented 8 years ago

I've filtered out the MONARCH:-prefixed IDs, and I'll go ahead and also filter out the _:genid ones.
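Roughly, the filter amounts to the following sketch (Python for illustration; the real change lives in Phenogrid's own code). It assumes the genotype IDs have already been pulled out of genotype_list.json into a plain list of strings, and that filtering happens before taking the first 5; both are assumptions, and `stable_genotype_ids` is a hypothetical name.

```python
# Prefixes reported in this thread as unstable across SciGraph loads.
UNSTABLE_PREFIXES = ("MONARCH:", "_:")


def stable_genotype_ids(genotype_ids, limit=5):
    """Drop unstable IDs, then keep at most `limit` of the remaining ones.

    `genotype_ids` is assumed to be a list of ID strings in the order
    returned by the genotype list call.
    """
    kept = [gid for gid in genotype_ids if not gid.startswith(UNSTABLE_PREFIXES)]
    return kept[:limit]


if __name__ == "__main__":
    ids = [
        "_:genid1122228",
        "MONARCH:63a5331abafd1d19ad8fe45807187a83",
        "EXAMPLE:stable-genotype",  # made-up ID for illustration
    ]
    print(stable_genotype_ids(ids))
```

Filtering first means a gene whose first few genotypes are all unstable can still contribute up to 5 stable ones. Taking the first 5 and then filtering would return fewer.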

frdougal commented 8 years ago

@harryhoch @kshefchek, can you take a look at this again? It is related to another bug in phenogrid: https://github.com/monarch-initiative/phenogrid/issues/253. If I run /compare on genes from multiple species, I can see the taxon name for the Homo sapiens genes, but none of the other species has their taxon data associated (see below). Gene 19090 is a mouse gene and gene 1804 is a human gene.

[Screenshot: screen shot 2016-07-25 at 10 20 42 am]