monarch-initiative / helpdesk

The Monarch Initiative Helpdesk
BSD 3-Clause "New" or "Revised" License

Pombe genes in Monarch #28

Closed: ValWood closed this issue 1 year ago

ValWood commented 3 years ago

Please describe your question, suggestion, or concern.

I noticed by chance that Monarch now imports pombe genes: https://monarchinitiative.org/gene/NCBIGene:2543177#phenotype

However, the phenotype data is unpopulated.

The text says "We are not aware of any authoritative sources of observed human phenotypes or diseases for this gene. We are always improving our knowledgebase; to suggest a new source, please submit a ticket." This seems odd, as this isn't a human gene.

Are you interested in displaying our phenotype data here? (I think that was the plan?) If so, you can download the single-gene annotations (82,000 of them) from https://www.pombase.org/downloads/phenotype-annotations (the format is also described there).
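For anyone wanting a quick look at the downloaded file, here is a minimal sketch that counts annotation rows per gene. It assumes the file is tab-separated with a header row, and that the gene column is named `Gene systematic ID`; that column name is an assumption and should be checked against the format description on the download page. The sample rows and identifiers below are made up for illustration.

```python
import csv
import gzip
import io
from collections import Counter

# Assumed column name; confirm it against the PomBase format description.
GENE_COLUMN = "Gene systematic ID"


def count_annotations_per_gene(lines, gene_column=GENE_COLUMN):
    """Count annotation rows per gene from tab-separated lines with a header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        gene = row.get(gene_column)
        if gene:  # skip rows where the gene column is missing or empty
            counts[gene] += 1
    return counts


def count_from_gzip(path, gene_column=GENE_COLUMN):
    """Same count, reading directly from a gzip-compressed file on disk."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return count_annotations_per_gene(handle, gene_column)


# Made-up sample data (not real PomBase identifiers).
SAMPLE = (
    "Gene systematic ID\tFYPO ID\n"
    "GENE_A\tFYPO:EXAMPLE1\n"
    "GENE_A\tFYPO:EXAMPLE2\n"
    "GENE_B\tFYPO:EXAMPLE1\n"
)

sample_counts = count_annotations_per_gene(io.StringIO(SAMPLE))
print(sample_counts.most_common())
```

`count_from_gzip` takes the local path of the downloaded file; the in-memory sample is only there so the sketch runs without a download.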

As 70% of pombe genes have human orthologs, and 38% of these are disease-associated, it would be useful to link to the human orthologs too, especially since we have lots of phenotypes associated with single amino acid variants that could be informative about disease mechanisms.

CC @cmungall CC @mah11

nlharris commented 2 years ago

Hi Val, sorry no one has responded to your question! @nicolevasilevsky do you have any pointers for Val?

nicolevasilevsky commented 2 years ago

Hi @ValWood and @nlharris. Unfortunately, I am not the best person to help with this. Perhaps @kshefchek can help or redirect this issue to the right person?

ValWood commented 2 years ago

I think @kshefchek is onto it but the conversation wasn't via this ticket. v

kshefchek commented 2 years ago

It's high priority for the ingest rewrites, see https://github.com/monarch-initiative/monarch-ingest/issues/2

nlharris commented 2 years ago

Any news on this?

putmantime commented 2 years ago

Hi @ValWood, apologies for the delay in responding to this. We are working on the next-generation Monarch Graph, and PomBase phenotype associations (phenotype_annotations.pombase.phaf.gz) are being ingested through our new ingest pipeline here: https://github.com/monarch-initiative/monarch-ingest/tree/main/monarch_ingest/pombase.

This includes ~84000 has_phenotype relationships and ~12800 genes.

In addition to the PomBase genes and phenotype associations, we have ingested orthology relationships between PomBase genes and genes from many other species, including human.

This pipeline and the KG are still under construction, but the graph will soon be served by Neo4j and Blazegraph, and it will eventually replace the data behind Monarch's REST API.

ValWood commented 2 years ago

Great news. I guess the number ~12800 is because you are loading all genes, including non-coding RNAs? You might not want to ingest all of the non-coding RNAs at Monarch, as it is likely that many are non-functional/cryptic/noise. It probably isn't a big issue for now, but you might want to consider importing only ncRNAs that have phenotypes attached (especially as most non-coding RNAs are not widely conserved). Otherwise you could end up with a lot of crud and a massive inflation of entities, including annotated ncRNAs for all species (which will affect analyses).
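The selection rule suggested here (keep protein-coding genes, and keep an ncRNA only if it has at least one phenotype annotation) can be sketched roughly as follows. This is an illustration only, not the Monarch ingest code or the Koza API; the `Gene` class, the `gene_type` labels, and the identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str    # hypothetical identifier
    gene_type: str  # hypothetical label, e.g. "protein_coding" or "ncRNA"


def select_genes(genes, genes_with_phenotypes, coding_type="protein_coding"):
    """Keep coding genes, plus any other gene that has a phenotype annotation."""
    return [
        gene
        for gene in genes
        if gene.gene_type == coding_type or gene.gene_id in genes_with_phenotypes
    ]


# Made-up example data.
all_genes = [
    Gene("GENE_A", "protein_coding"),
    Gene("GENE_B", "ncRNA"),  # has a phenotype, so it is kept
    Gene("GENE_C", "ncRNA"),  # no phenotype, so it is dropped
]
annotated_ids = {"GENE_A", "GENE_B"}

kept = select_genes(all_genes, annotated_ids)
print([gene.gene_id for gene in kept])
```

Passing the annotated IDs as a set keeps the membership check cheap even for tens of thousands of annotations.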

I'm in this camp: https://www.annualreviews.org/doi/abs/10.1146/annurev-genom-112921-123710

nlharris commented 1 year ago

Adding @kevinschaper to watch list

putmantime commented 1 year ago

I have implemented a Koza filter to include only coding genes: https://github.com/monarch-initiative/monarch-ingest/pull/376