monarch-initiative / helpdesk

The Monarch Initiative Helpdesk
BSD 3-Clause "New" or "Revised" License

Legacy server down #70

Open nursatranscriptomine opened 1 year ago

nursatranscriptomine commented 1 year ago

Hi

We use the legacy Monarch server for data download features that are not available (that we know of) in the new version.

We get the error below when attempting to access legacy.monarchinitiative.org.

Is the legacy server still available, or has it been retired? If the latter, can you point us to any download functionality in the new version?

Thanks

Neil McKenna

Error: Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds.

kevinschaper commented 1 year ago

Hi Neil,

I restarted the service and it appears to have come back.

We don't have a shutdown date yet, but we do plan on shutting down the legacy service. We're currently developing what is essentially v3 of Monarch, where v1 is the legacy service and v2 is what's running now. Can you let me know your use case, so that we can look at supporting it in the new API?

nursatranscriptomine commented 1 year ago

Kevin

Appreciate it. Use case is, for a particular ontology identifier, downloading a TSV file containing approved symbols that map to that identifier.

For example, for the Mammalian Phenotype identifier MP:0008782, the legacy Monarch page

https://legacy.monarchinitiative.org/phenotype/MP:0008782#genes

contains the option to download as TSV


Again – if this feature is available in the new version it’s not readily apparent.

Thanks

Neil


kevinschaper commented 1 year ago

Interestingly, following that tsv link essentially brings you from the legacy v1 architecture to the public Solr server that's part of the v2 architecture:

https://solr.monarchinitiative.org/solr/golr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=subject,subject_label,subject_taxon,subject_taxon_label,object,object_label,relation,relation_label,evidence,evidence_label,source,is_defined_by,qualifier&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&facet.method=enum&csv.encapsulator=%22&csv.separator=%09&csv.header=true&csv.mv.separator=%7C&fq=subject_category:%22gene%22&fq=object_closure:%22MP:0008782%22&facet.field=subject_taxon_label&q=*:*

It's probably a terrible long-term idea to replace what you have with this URL (substituting in different values for MP:0008782), but in a pinch, for the time being, it should work even if the legacy server freezes again.
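For anyone scripting the substitution mentioned above, here is a minimal Python sketch that rebuilds the same Solr query for an arbitrary phenotype ID. The function name `gene_tsv_url` is hypothetical, and the facet parameters from the original link are left out on the assumption that they do not change the CSV output; treat it as a stopgap, not a supported interface.

```python
from urllib.parse import urlencode

# Endpoint and field list copied from the TSV download link quoted above.
SOLR_SELECT = "https://solr.monarchinitiative.org/solr/golr/select"
FIELDS = [
    "subject", "subject_label", "subject_taxon", "subject_taxon_label",
    "object", "object_label", "relation", "relation_label",
    "evidence", "evidence_label", "source", "is_defined_by", "qualifier",
]


def gene_tsv_url(phenotype_id: str, rows: int = 100000) -> str:
    """Build the tab-separated gene download URL for one phenotype ID.

    Hypothetical helper: facet.* parameters from the original link are
    omitted (assumed irrelevant to the CSV response).
    """
    params = [
        ("defType", "edismax"),
        ("qt", "standard"),
        ("wt", "csv"),
        ("rows", str(rows)),
        ("start", "0"),
        ("fl", ",".join(FIELDS)),
        ("csv.encapsulator", '"'),
        ("csv.separator", "\t"),   # tab, so the "csv" writer emits TSV
        ("csv.header", "true"),
        ("csv.mv.separator", "|"),
        ("fq", 'subject_category:"gene"'),
        ("fq", f'object_closure:"{phenotype_id}"'),
        ("q", "*:*"),
    ]
    # urlencode percent-encodes quotes, tabs, colons and pipes.
    return f"{SOLR_SELECT}?{urlencode(params)}"
```

Calling `gene_tsv_url("MP:0008782")` yields a URL that can be passed to curl or any HTTP client; only the `object_closure` filter changes between phenotype IDs.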

Our existing production API can also return JSON that contains the gene labels.

If you happen to have curl and jq installed, the command below will extract gene symbols as, essentially, a single-column quoted TSV. The URL also works on its own, of course, but it returns JSON. Is TSV output a critical need for you?

curl -sX GET "https://api.monarchinitiative.org/api/bioentity/phenotype/MP:0008782/genes?rows=10000" -H "accept: application/json" | jq '.associations[].object.label'
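If TSV is the goal and jq is not available, the same extraction can be sketched in Python. This assumes the response has the shape implied by the jq filter above (an `associations` list whose items carry `object.label`); the helper name `gene_labels_to_tsv`, the `label` header, and the de-duplication step are illustrative additions, not part of the API.

```python
import csv
import io


def gene_labels_to_tsv(payload: dict) -> str:
    """Turn a parsed API response into a one-column TSV of labels.

    Assumes payload["associations"][i]["object"]["label"], mirroring the
    jq filter '.associations[].object.label'. Missing labels are skipped
    and duplicates are dropped, keeping first-seen order.
    """
    seen = {}
    for assoc in payload.get("associations", []):
        label = (assoc.get("object") or {}).get("label")
        if label:
            seen.setdefault(label, None)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["label"])  # header name chosen for illustration
    for label in seen:
        writer.writerow([label])
    return out.getvalue()
```

Pass it the result of `json.loads()` on the body returned by the curl command, then write the returned string to a `.tsv` file.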

We still have a little bit of work to do before a beta of our v3 API is up and running, but the data artifacts are available (though still subject to change!) and might come in handy.

http://data.monarchinitiative.org/monarch-kg-dev/latest/monarch-kg-denormalized-edges.tsv.gz is the file that we'll be using to populate our new Solr instance. If I download it, unzip it, and query it with q, I get the following:

q "select subject_label from monarch-kg-denormalized-edges.tsv where object ='MP:0008782'"
myeloid cell leukemia sequence 1
nuclear factor of activated T cells, cytoplasmic, calcineurin dependent 1
signal transducer and activator of transcription 6
POU domain, class 2, associating factor 1
B cell CLL/lymphoma 11A (zinc finger protein)
telomerase RNA component
phosphatase and tensin homolog 
...
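The same filter can be run without q by streaming the file in Python. This sketch assumes only the two column names used in the query above (`subject_label` and `object`); the function names are hypothetical, and since the artifact is described as still subject to change, the column names may need adjusting.

```python
import csv
import gzip
from typing import Iterable, Iterator


def subject_labels(lines: Iterable[str], object_id: str) -> Iterator[str]:
    """Yield subject_label for every row whose object equals object_id."""
    # QUOTE_NONE: treat the file as plain tab-separated text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row.get("object") == object_id:
            yield row.get("subject_label") or ""


def subject_labels_from_gz(path: str, object_id: str) -> list[str]:
    """Read the gzipped TSV directly, so no separate unzip step is needed."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(subject_labels(handle, object_id))
```

For example, `subject_labels_from_gz("monarch-kg-denormalized-edges.tsv.gz", "MP:0008782")` would apply the same exact-match condition on `object` as the q query.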

It looks like it's returning gene names rather than symbols, so I think it's not ready for prime time just yet (I'd like that field to be populated with gene symbols). An alternative, using just file artifacts, would be our SQLite database artifact, which wraps the node and edge TSV tables up in a database file: http://data.monarchinitiative.org/monarch-kg-dev/latest/monarch-kg.db.gz

sqlite3 monarch-kg.db "select distinct nodes.symbol from nodes, edges where nodes.id = edges.subject and edges.object = 'MP:0008782' and edges.predicate = 'biolink:has_phenotype'"
Bcl2
Cd22
Ebf1
Ets1
Grb2
Blnk
Myc
Prkcd
Plcg2
Mcl1
Nfatc1
Stat6
Pou2af1
Bcl11a
Pten
Terc
Spib
Smad7
Smarcc1
Ep300
Ikbkb
Faim
Tlr2
Sh3bp2
Adgrg3
Sfn
Huwe1
Peli1
Parp14
Fam72a
Ube2n
Micu1
Fnip1
Pdap1
Mir150
Atmin
Gm614
Nfkbid
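For use from a script, the query above can be wrapped with Python's built-in sqlite3 module. This is a sketch under the same assumptions as the command: tables named `nodes` and `edges` with the columns shown, in an artifact that may still change. The function name `gene_symbols` is hypothetical, and the phenotype ID is bound as a parameter instead of being pasted into the SQL string.

```python
import sqlite3

# Same logic as the sqlite3 command above, written with an explicit JOIN
# and a bound parameter for the phenotype ID.
QUERY = """
SELECT DISTINCT nodes.symbol
FROM edges
JOIN nodes ON nodes.id = edges.subject
WHERE edges.object = ?
  AND edges.predicate = 'biolink:has_phenotype'
"""


def gene_symbols(conn: sqlite3.Connection, phenotype_id: str) -> list[str]:
    """Return distinct gene symbols annotated to the given phenotype ID."""
    rows = conn.execute(QUERY, (phenotype_id,)).fetchall()
    # Skip nodes that have no symbol recorded.
    return [symbol for (symbol,) in rows if symbol]
```

With the unzipped artifact, `gene_symbols(sqlite3.connect("monarch-kg.db"), "MP:0008782")` would run the same selection as the command line.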

The monarch-kg-dev artifacts aren't my suggested solution right now, but they're available for a look.

Is your ideal to continue having an API endpoint that takes the phenotypic feature ID and returns in tsv format?

Also, we're working on a Solr Docker container with pre-populated data that anyone can run in their own stack. Is that appealing to you at all?

nursatranscriptomine commented 1 year ago

Kevin

Thanks for the detailed reply. I’ll discuss options with my programmer. In the meantime the v2 architecture link is helpful.

Sincerely

Neil
