oceanproteinportal / api

A Python API module for the Ocean Protein Portal

SPARQL query - models.py:187 and models.py:655 #4

Closed sanjaygovindarajan closed 2 years ago

sanjaygovindarajan commented 3 years ago

models.py:187

get_peptide_profile_data
params: peptideSequence, dataset_id
returns: Array of Stations with array of Depths and spectralCountSums

models.py:655

search_peptides.json
params: peptideSequence, dataset_id (optional)
returns: Array of Stations with array of Depths and spectralCountSums

SELECT DISTINCT STR(?depth) AS ?depth ?station STR(?spectralCount) AS ?spectralCount
WHERE 
  { VALUES ?graph { <urn:opp:graph:metzyme-0.2> }
    GRAPH ?graph { <urn:opp:dataset:metzyme-0.2> opp:storesResultsForSample ?sample .
    ?sample sosa:isSampleOf ?feature .
    ?feature opp:depth ?depth .
    ?feature opp:inVicinityOfStation ?stationInfo .
    ?stationInfo opp:stationName ?station .      
    ?observation opp:observationInSample ?sample .
    ?observation schema:value ?spectralCount . 
    ?observation rdf:type opp:PeptideSpectralCountSum .
    ?peptide opp:samplePeptideProperty ?observation .
    ?peptide opp:describesPeptide ?identifier .
    ?identifier opp:peptideSequence "AAAAELAAFK"^^xsd:token .
    }}
ORDER BY ?station

Snorql OPP query

Note: This query uses the peptide sequence AAAAELAAFK as an example value for peptideSequence and metzyme-0.2 as an example value for dataset_id.
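For illustration, the flat rows this query returns (one row per station, depth, and spectral count) could be grouped into the "array of Stations with array of Depths and spectralCountSums" shape roughly as follows. This is only a sketch: the function name `group_by_station`, the output keys, and the sample rows are made up for the example and are not taken from models.py.

```python
from collections import defaultdict


def group_by_station(rows):
    """Group flat query rows into one entry per station.

    Each row is assumed to be a dict with the keys "station", "depth" and
    "spectralCount", all strings because the query wraps values in STR().
    """
    stations = defaultdict(list)
    for row in rows:
        stations[row["station"]].append(
            {
                "depth": float(row["depth"]),
                "spectralCountSum": float(row["spectralCount"]),
            }
        )
    # One entry per station, ordered by station name; depths shallowest first.
    return [
        {"station": name, "depths": sorted(depths, key=lambda d: d["depth"])}
        for name, depths in sorted(stations.items())
    ]


# Made-up rows, only to show the input shape.
example_rows = [
    {"station": "St3", "depth": "150", "spectralCount": "2"},
    {"station": "St1", "depth": "40", "spectralCount": "5"},
    {"station": "St3", "depth": "20", "spectralCount": "7"},
]
grouped = group_by_station(example_rows)
```

Station names are sorted as plain strings here, which matches ORDER BY ?station on string values but would put "St10" before "St2".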

kaimikacolin commented 2 years ago

Hi @sanjaygovindarajan, I noticed you have urn:opp:graph:metzyme-0.2 and urn:opp:dataset:metzyme-0.2, and I think I only have "urn:opp:dataset:metzyme-0.2" in my system - is there a way you can change this to not use urn:opp:graph:metzyme-0.2?

jaclynsaunders commented 2 years ago

Hi @kaimikacolin, here @sanjaygovindarajan is using a named graph to restrict the search space, which makes the query faster. Here's the same query in snorql without the named-graph restriction.

SELECT DISTINCT STR(?depth) AS ?depth ?station STR(?spectralCount) AS ?spectralCount
WHERE 
  { <urn:opp:dataset:metzyme-0.2> opp:storesResultsForSample ?sample .
    ?sample sosa:isSampleOf ?feature .
    ?feature opp:depth ?depth .
    ?feature opp:inVicinityOfStation ?stationInfo .
    ?stationInfo opp:stationName ?station .      
    ?observation opp:observationInSample ?sample .
    ?observation schema:value ?spectralCount . 
    ?observation rdf:type opp:PeptideSpectralCountSum .
    ?peptide opp:samplePeptideProperty ?observation .
    ?peptide opp:describesPeptide ?identifier .
    ?identifier opp:peptideSequence "AAAAELAAFK"^^xsd:token .
    }
ORDER BY ?station

Let me know if that satisfies the query. If performance suffers greatly, we can add an additional workaround. Also - feel free to tag me for any questions as well. Sorry - just got a notification for this the other day. Thanks!
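The two versions of the query in this thread differ only in the VALUES/GRAPH wrapper, so one builder could produce both. Below is a minimal sketch under a few assumptions: the function name `build_peptide_profile_query` is hypothetical, the graph URN is derived from the dataset id by the same naming pattern the original query uses (which may not hold for every dataset), and the validation patterns are guesses at what legal values look like. The prefixes (opp:, sosa:, schema:, rdf:, xsd:) are assumed to be predefined by the endpoint, as in the queries above.

```python
import re

# Triple patterns copied from the query in this issue; only the dataset id
# and the peptide sequence are substituted.
_PATTERNS = """\
    <urn:opp:dataset:{dataset_id}> opp:storesResultsForSample ?sample .
    ?sample sosa:isSampleOf ?feature .
    ?feature opp:depth ?depth .
    ?feature opp:inVicinityOfStation ?stationInfo .
    ?stationInfo opp:stationName ?station .
    ?observation opp:observationInSample ?sample .
    ?observation schema:value ?spectralCount .
    ?observation rdf:type opp:PeptideSpectralCountSum .
    ?peptide opp:samplePeptideProperty ?observation .
    ?peptide opp:describesPeptide ?identifier .
    ?identifier opp:peptideSequence "{peptide}"^^xsd:token ."""


def build_peptide_profile_query(peptide_sequence, dataset_id, use_named_graph=False):
    """Return the query text, optionally restricted to the dataset's named graph."""
    # Assumption: sequences are plain uppercase letters and dataset ids are
    # simple tokens. Rejecting anything else keeps user input out of the query.
    if not re.fullmatch(r"[A-Z]+", peptide_sequence):
        raise ValueError("unexpected characters in peptide sequence")
    if not re.fullmatch(r"[A-Za-z0-9._-]+", dataset_id):
        raise ValueError("unexpected characters in dataset id")

    body = _PATTERNS.format(dataset_id=dataset_id, peptide=peptide_sequence)
    if use_named_graph:
        where = (
            "  { VALUES ?graph { <urn:opp:graph:%s> }\n"
            "    GRAPH ?graph {\n%s\n    }}" % (dataset_id, body)
        )
    else:
        where = "  {\n%s\n  }" % body
    return (
        "SELECT DISTINCT STR(?depth) AS ?depth ?station "
        "STR(?spectralCount) AS ?spectralCount\n"
        "WHERE\n" + where + "\nORDER BY ?station"
    )


unrestricted = build_peptide_profile_query("AAAAELAAFK", "metzyme-0.2")
restricted = build_peptide_profile_query("AAAAELAAFK", "metzyme-0.2", use_named_graph=True)
```

Keeping the named graph behind a flag means the unrestricted form can be the default now, and the restriction can be switched back on if performance becomes a problem.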

kaimikacolin commented 2 years ago

@jaclynsaunders This looks like it will work. Is there a specific peptide sequence I can use that would return results?

This one returns an empty set, and I can't seem to get any strings to return results on the live site:

https://kg.oceanproteinportal.org/?query=SELECT+DISTINCT+STR%28%3Fdepth%29+AS+%3Fdepth+%3Fstation+STR%28%3FspectralCount%29+AS+%3FspectralCount%0D%0AWHERE+%0D%0A++%7B+%3Curn%3Aopp%3Adataset%3Ametzyme-0.2%3E+opp%3AstoresResultsForSample+%3Fsample+.%0D%0A++++%3Fsample+sosa%3AisSampleOf+%3Ffeature+.%0D%0A++++%3Ffeature+opp%3Adepth+%3Fdepth+.%0D%0A++++%3Ffeature+opp%3AinVicinityOfStation+%3FstationInfo+.%0D%0A++++%3FstationInfo+opp%3AstationName+%3Fstation+.+%09+%0D%0A++++%3Fobservation+opp%3AobservationInSample+%3Fsample+.%0D%0A++++%3Fobservation+schema%3Avalue+%3FspectralCount+.+%0D%0A++++%3Fobservation+rdf%3Atype+opp%3APeptideSpectralCountSum+.%0D%0A++++%3Fpeptide+opp%3AsamplePeptideProperty+%3Fobservation+.%0D%0A++++%3Fpeptide+opp%3AdescribesPeptide+%3Fidentifier+.%0D%0A++++%3Fidentifier+opp%3ApeptideSequence+%22AAAAELAAFK%22%5E%5Exsd%3Atoken+.%0D%0A++++%7D%0D%0AORDER+BY+%3Fstation&graph=urn%3Aopp%3Aviews%3Anunn-bering-sea

jaclynsaunders commented 2 years ago

So peptide sequence 'AAAAELAAFK' should work. Let me know if it's still not working for you

https://kg.oceanproteinportal.org/?query=SELECT+DISTINCT+STR%28%3Fdepth%29+AS+%3Fdepth+%3Fstation+STR%28%3FspectralCount%29+AS+%3FspectralCount%0D%0AWHERE+%0D%0A++%7B+%3Curn%3Aopp%3Adataset%3Ametzyme-0.2%3E+opp%3AstoresResultsForSample+%3Fsample+.%0D%0A++++%3Fsample+sosa%3AisSampleOf+%3Ffeature+.%0D%0A++++%3Ffeature+opp%3Adepth+%3Fdepth+.%0D%0A++++%3Ffeature+opp%3AinVicinityOfStation+%3FstationInfo+.%0D%0A++++%3FstationInfo+opp%3AstationName+%3Fstation+.+%09+%0D%0A++++%3Fobservation+opp%3AobservationInSample+%3Fsample+.%0D%0A++++%3Fobservation+schema%3Avalue+%3FspectralCount+.+%0D%0A++++%3Fobservation+rdf%3Atype+opp%3APeptideSpectralCountSum+.%0D%0A++++%3Fpeptide+opp%3AsamplePeptideProperty+%3Fobservation+.%0D%0A++++%3Fpeptide+opp%3AdescribesPeptide+%3Fidentifier+.%0D%0A++++%3Fidentifier+opp%3ApeptideSequence+%22AAAAELAAFK%22%5E%5Exsd%3Atoken+.%0D%0A++++%7D%0D%0AORDER+BY+%3Fstation

kaimikacolin commented 2 years ago

Yes, I see my query param was setting "graph" to Bering Sea, which caused the null result. This is ready to implement, thank you!!
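For reference, the two links in this thread differ only in the trailing graph=urn:opp:views:nunn-bering-sea parameter. A small sketch of building the URL with and without it is below; the helper names `snorql_url` and `graph_param` are hypothetical, and only the `query` and `graph` parameter names and the host are taken from the links above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://kg.oceanproteinportal.org/"


def snorql_url(query, graph=None):
    """Build a link to the endpoint; add "graph" only when explicitly requested."""
    params = {"query": query}
    if graph is not None:
        params["graph"] = graph
    return ENDPOINT + "?" + urlencode(params)


def graph_param(url):
    """Return the "graph" parameter of a link, or None if the link has none."""
    return parse_qs(urlsplit(url).query).get("graph", [None])[0]


with_graph = snorql_url("SELECT 1", graph="urn:opp:views:nunn-bering-sea")
without_graph = snorql_url("SELECT 1")
```

Checking `graph_param(url)` before sending a request is a cheap way to catch a leftover graph restriction like the one that produced the empty result here.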

jaclynsaunders commented 2 years ago

Awesome. Thanks Colin!

jaclynsaunders commented 2 years ago

> Hi @sanjaygovindarajan , I noticed you have urn:opp:graph:metzyme-0.2 and urn:opp:dataset:metzyme-0.2, and I think I only have "urn:opp:dataset:metzyme-0.2" in my system - is there a way you can change this to not use urn:opp:graph:metzyme-0.2?

@ashepherd - is there a direct way to go from urn:opp:dataset:metzyme-0.2 to urn:opp:graph:metzyme-0.2 within the KG or is just changing the string "dataset" to "graph" the proper solution here?

(For the working solution currently, we dropped the named graph but if performance suffers then good to know which way to proceed with the named graphs.)

Thanks!

ashepherd commented 2 years ago

Ok, just fixed this. Sorry about that! It's important to keep the distinction between what is a graph and what is the dataset inside a graph. They aren't really the same thing, so I created a distinct link between them to make querying easier:

    ?dataset a opp:OPPDataset .
    ?graph schema:dataset ?dataset

https://kg.oceanproteinportal.org/?query=SELECT+DISTINCT+%3Fg+%3Fd%0D%0AWHERE+%7B%0D%0A++%3Fd+a+opp%3AOPPDataset+.%0D%0A++%3Fg+schema%3Adataset+%3Fd+.%0D%0A%7D%0D%0A
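With that link in place, the graph for a given dataset could be looked up rather than derived by editing the URN string. A minimal sketch of generating such a lookup query follows; the function name `graph_for_dataset_query` and the URN validation pattern are assumptions for the example, while the two triple patterns are the ones from the comment above.

```python
import re


def graph_for_dataset_query(dataset_urn):
    """Return a query that finds the named graph linked to one dataset."""
    # Assumption: dataset URNs look like urn:opp:dataset:<simple token>.
    if not re.fullmatch(r"urn:opp:dataset:[A-Za-z0-9._-]+", dataset_urn):
        raise ValueError("unexpected dataset URN")
    return (
        "SELECT DISTINCT ?g\n"
        "WHERE {\n"
        f"  <{dataset_urn}> a opp:OPPDataset .\n"
        f"  ?g schema:dataset <{dataset_urn}> .\n"
        "}"
    )


lookup = graph_for_dataset_query("urn:opp:dataset:metzyme-0.2")
```

The result of this lookup could then be used as the ?graph value in the original query, if the named-graph restriction is reinstated.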

ashepherd commented 2 years ago

Been thinking about the graph label and dataset label. I think it's possible we could use the same value there. We could multi-type the dataset as both a Dataset and a NamedGraph. Probably won't matter. What do others think? Any concerns?

kaimikacolin commented 2 years ago

Okay cool - Let me know if the original query needs to change!