monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Obtaining predicates from data model of discontinued Monarch API version #718

Closed: rosazwart closed this issue 4 months ago.

rosazwart commented 5 months ago

Dear Monarch team,

For a project I finished last year, I used your API to fetch associations for a given set of biological entities. The project relied on predicates that cannot be found in the data model of the newest version of your API, which complies with the Biolink Model. Examples are terms from the Relation Ontology (RO) and the GENO ontology such as "in 1 to 1 orthology relationship with" (RO:HOM0000020) and "is allele of" (GENO:0000408).

Now I want to rerun the fetching step of my project. I had not been following the drastic changes while I was away after finishing the project, so I was not able to rerun the fetcher before the previous API versions were discontinued. My question is whether I can still obtain, in some way, the predicates that were used in the discontinued API version. I did find a way to get the semantically corresponding Biolink Model element for a given IRI (from RO and GENO in my case) using the Biolink Model Toolkit. However, since there is no 1-to-1 mapping between these two sets of predicates, I cannot use this functionality in reverse (getting the RO or GENO predicate for a given Biolink Model element).
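Why the Toolkit lookup cannot simply be run backwards becomes visible once the mapping is inverted. This is a minimal sketch, assuming you have already collected a forward mapping from each RO/GENO CURIE you used to its Biolink predicate (for example with the Biolink Model Toolkit); the function names are hypothetical and no mapping data from the Biolink Model is included. Whenever two source predicates land on the same Biolink predicate, the reverse lookup returns several candidates instead of one answer.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def invert_mapping(forward: Mapping[str, str]) -> Dict[str, List[str]]:
    """Turn {source CURIE: Biolink predicate} into {Biolink predicate: [source CURIEs]}.

    Several RO/GENO terms may map to the same Biolink predicate, so each
    reverse entry is a (sorted) list of candidates, not a single CURIE.
    """
    reverse: Dict[str, List[str]] = defaultdict(list)
    for source_curie, biolink_predicate in forward.items():
        reverse[biolink_predicate].append(source_curie)
    return {predicate: sorted(curies) for predicate, curies in reverse.items()}


def is_one_to_one(forward: Mapping[str, str]) -> bool:
    """True only if every Biolink predicate has exactly one source predicate."""
    return all(len(curies) == 1 for curies in invert_mapping(forward).values())
```

If `is_one_to_one` returns `False` for your mapping, recovering the original RO or GENO predicate needs extra information (such as the archived data itself) rather than the Biolink predicate alone.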

Thank you in advance!

kevinschaper commented 5 months ago

Oh no! I'm sorry we left you hanging here.

I don't think we have 1-to-1 orthology differentiated yet in our new graph (though we should! I'll make a separate issue for that), and in the rebuild we focused on gene-to-phenotype and gene-to-disease associations and are only now getting started on re-introducing variants, alleles, genotypes and their associations.

The final graph build from the old model was in September of 2021, so it's possible that your project was already using the latest version of that data that was available. If not, or if you just need to get it again, everything is still available here:

https://data.monarchinitiative.org/202109/index.html

In particular, the tsv exports will be the most straightforward to work with. Check out:

https://data.monarchinitiative.org/202109/tsv/all_associations/gene_homology.all.tsv.gz for the orthology associations, and maybe https://data.monarchinitiative.org/202109/tsv/all_associations/variant_gene.all.tsv.gz for allele to gene?
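A minimal sketch of reading one of these archived files with the Python standard library and keeping only the rows for a given predicate. It assumes the predicate sits in a column named `relation` and is written as a CURIE such as `RO:HOM0000020`; both are assumptions, so check the header line of the downloaded file first. The helper names are hypothetical.

```python
import csv
import gzip
import io
import urllib.request
from typing import BinaryIO, Iterable, Iterator, Set, Union

# Archived September 2021 export, URL as given in the thread.
GENE_HOMOLOGY_URL = (
    "https://data.monarchinitiative.org/202109/tsv/all_associations/"
    "gene_homology.all.tsv.gz"
)


def iter_tsv_gz(source: Union[str, BinaryIO]) -> Iterator[dict]:
    """Yield one dict per data row of a gzipped TSV, keyed by the header line."""
    with gzip.open(source, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text in tab-separated data.
        yield from csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def filter_by_column(rows: Iterable[dict], column: str, wanted: Set[str]) -> Iterator[dict]:
    """Keep rows whose value in `column` is one of `wanted` (column name is an assumption)."""
    for row in rows:
        if row.get(column) in wanted:
            yield row


def fetch_rows(url: str) -> Iterator[dict]:
    """Download a gzipped TSV fully into memory, then iterate over its rows."""
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    return iter_tsv_gz(io.BytesIO(payload))
```

For example, `filter_by_column(fetch_rows(GENE_HOMOLOGY_URL), "relation", {"RO:HOM0000020"})` would yield the 1-to-1 orthology rows, provided the column layout matches the assumption above. Because `fetch_rows` holds the whole download in memory, saving a large file to disk and passing its path to `iter_tsv_gz` is the lighter option.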

rosazwart commented 5 months ago

Thank you for the response!

I will definitely consider using the archived data files. I remember using the response fields "object_category" and "subject_category" when fetching a list of associations. I looked at the content of some of the tsv files and I don't see these fields anymore. Is that because the associations are already divided into different tsv files based on the categories of the subjects and objects? I need some specific categories such as biological process and cellular component, but I assume that these categories are merged with similar classes of entities in one of the files.

hbcesar commented 5 months ago

Hi @rosazwart, @kevinschaper, all,

Any update in this regard? I am also facing the very same issue.

kevinschaper commented 5 months ago

Oh, we can absolutely include those fields again. I have a PR open right now to speed up and simplify our export process, so I'll bring this issue into our next release.

kevinschaper commented 4 months ago

subject_category & object_category are back in the tsv files at https://data.monarchinitiative.org/monarch-kg/latest/tsv/index.html
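With `subject_category` and `object_category` back in the export, selecting associations for particular categories becomes a plain row filter. A minimal sketch, assuming the category columns hold Biolink CURIEs such as `biolink:BiologicalProcess` and `biolink:CellularComponent` (check the actual values in the file you download from that index); the helper names are hypothetical.

```python
import csv
import gzip
from collections import Counter
from typing import Iterable, Iterator, Optional, Set

# Categories mentioned in the thread; the "biolink:" prefix is an assumption.
WANTED_OBJECT_CATEGORIES = {"biolink:BiologicalProcess", "biolink:CellularComponent"}


def iter_tsv_gz(path: str) -> Iterator[dict]:
    """Yield one dict per data row of a gzipped TSV, keyed by the header line."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def select_by_category(
    rows: Iterable[dict],
    object_categories: Set[str],
    subject_categories: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Keep rows whose object (and optionally subject) category is in the given sets."""
    for row in rows:
        if row.get("object_category") not in object_categories:
            continue
        if subject_categories is not None and row.get("subject_category") not in subject_categories:
            continue
        yield row


def category_pairs(rows: Iterable[dict]) -> Counter:
    """Count associations per (subject_category, object_category) pair."""
    return Counter((row.get("subject_category"), row.get("object_category")) for row in rows)
```

Running `category_pairs` over a file first gives a quick overview of which subject/object combinations it actually contains before choosing the filter sets.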

kevinschaper commented 4 months ago

Feel free to re-open or make a new issue if you run into problems @rosazwart & @hbcesar, and thank you so much for submitting the issue!