Open kaimikacolin opened 2 years ago
Marking this one "High"
I will develop a template query for each of the five search-value queries from the UI.
REGEX Searches:
SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE {
?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .
# if search term: protein name (ex. 'Fe')
?protein view:productName ?o .
# else if search term: Sequence (ex. 'AA')
?protein view:sequence ?o .
FILTER REGEX(?o, "{search value from UI}", "i")
?protein view:productName ?product_name .
?protein view:identifiers ?ids .
?protein view:datasetName ?dataset_name .
?protein view:dataset ?dataset .
?protein view:expedition ?expedition .
?protein view:spectralCount ?spectral_count .
?protein view:proteinID ?protein_id .
?protein view:peptideCount ?peptide_count .
?protein view:sequence ?sequence .
}
ORDER BY DESC(?spectral_count)
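The if/else comments in the template above could be resolved in application code before the query is sent. Below is a minimal sketch, assuming the UI passes a search type and a raw search value. The names `REGEX_PREDICATES`, `escape_sparql_string`, and `regex_search_patterns` are hypothetical and not part of any existing OPP code. The value is escaped only so that it cannot terminate the string literal; regex metacharacters are left alone because the search is meant to be a regex match.

```python
# Hypothetical mapping from a UI search type to the predicate used in the template.
REGEX_PREDICATES = {
    "protein_name": "view:productName",
    "sequence": "view:sequence",
}


def escape_sparql_string(value: str) -> str:
    """Escape characters that would break a double-quoted SPARQL string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def regex_search_patterns(search_type: str, search_value: str) -> str:
    """Return the two template lines that depend on the UI input."""
    predicate = REGEX_PREDICATES[search_type]  # KeyError on an unknown search type
    literal = escape_sparql_string(search_value)
    return (
        f"?protein {predicate} ?o .\n"
        f'FILTER REGEX(?o, "{literal}", "i")'
    )


if __name__ == "__main__":
    print(regex_search_patterns("protein_name", "Fe"))
```

The returned lines would replace the commented alternatives and the `FILTER` line in the template; the rest of the query stays unchanged.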
Identifier Search:
SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE {
# if search term: Kegg (ex. 'K03404')
?id a opp:KeggOrthologyIdentifier .
# else if search term: Uniprot ID (ex. 'G4E3S1')
?id a opp:UniprotProteinIdentifier .
# else if search term: PFams (ex. 'PF04117')
?id a opp:PFamsIdentifier .
?id opp:identifierValue ?o .
FILTER REGEX(?o, "{search value from UI}", "i") .
?pid opp:identifier ?id .
?protein view:viewOf ?pid .
?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .
?protein view:productName ?product_name .
?protein view:identifiers ?ids .
?protein view:datasetName ?dataset_name .
?protein view:dataset ?dataset .
?protein view:expedition ?expedition .
?protein view:spectralCount ?spectral_count .
?protein view:proteinID ?protein_id .
?protein view:peptideCount ?peptide_count .
?protein view:sequence ?sequence .
}
ORDER BY DESC(?spectral_count)
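Only one of the three `?id a ...` lines should appear in the final identifier query. A minimal sketch of selecting it, assuming the UI reports which identifier type was chosen; the keys `kegg`, `uniprot`, and `pfams` and the function name `identifier_search_patterns` are hypothetical:

```python
# Hypothetical mapping from a UI identifier type to the class used in the template.
IDENTIFIER_CLASSES = {
    "kegg": "opp:KeggOrthologyIdentifier",
    "uniprot": "opp:UniprotProteinIdentifier",
    "pfams": "opp:PFamsIdentifier",
}


def identifier_search_patterns(id_type: str, search_value: str) -> str:
    """Return the identifier-specific lines of the WHERE clause."""
    id_class = IDENTIFIER_CLASSES[id_type]  # KeyError on an unknown identifier type
    # Escape backslashes and quotes so the value stays inside the string literal.
    literal = search_value.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"?id a {id_class} .",
        "?id opp:identifierValue ?o .",
        f'FILTER REGEX(?o, "{literal}", "i") .',
        "?pid opp:identifier ?id .",
        "?protein view:viewOf ?pid .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(identifier_search_patterns("kegg", "K03404"))
```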
To filter either of these queries down to a specific dataset, add the following line inside the WHERE clause:
ex: Metzyme 3.0
?protein view:dataset <urn:opp:dataset:metzyme-3.0> .
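One way the optional dataset restriction might be applied in code is sketched below. It assumes the query text has a single top-level WHERE block whose closing brace is the last `}` in the string, which holds for both templates above. The function names `dataset_filter` and `add_dataset_filter` are hypothetical.

```python
def dataset_filter(dataset_iri: str) -> str:
    """Build the triple pattern that restricts results to one dataset."""
    # Reject characters that are not allowed inside a SPARQL <...> IRI reference.
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in dataset_iri):
        raise ValueError(f"invalid IRI: {dataset_iri!r}")
    return f"?protein view:dataset <{dataset_iri}> ."


def add_dataset_filter(query: str, dataset_iri: str) -> str:
    """Insert the dataset pattern just before the closing brace of the WHERE block."""
    idx = query.rindex("}")  # assumes the last brace closes the WHERE block
    return query[:idx] + dataset_filter(dataset_iri) + "\n" + query[idx:]


if __name__ == "__main__":
    template = (
        "SELECT ?protein\n"
        "WHERE {\n"
        "?protein view:sequence ?sequence .\n"
        "}\n"
        "ORDER BY DESC(?spectral_count)"
    )
    print(add_dataset_filter(template, "urn:opp:dataset:metzyme-3.0"))
```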
@ashepherd I just added the queries for UniprotID, Pfams, and Kegg, and they are looking great. I've combined these with the VALUES directive that sped up the query mentioned in another ticket:
```
SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE {
VALUES ?p { view:productName view:identifiers view:expedition view:proteinID view:sequence }
?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .
FILTER REGEX(?o, "K03404", "i")
?id a opp:KeggOrthologyIdentifier .
?id opp:identifierValue ?o .
?pid opp:identifier ?id .
?protein view:viewOf ?pid .
?protein view:productName ?product_name .
?protein view:identifiers ?ids .
?protein view:datasetName ?dataset_name .
?protein view:dataset ?dataset .
?protein view:expedition ?expedition .
?protein view:spectralCount ?spectral_count .
?protein view:proteinID ?protein_id .
?protein view:peptideCount ?peptide_count .
?protein view:sequence ?sequence .
}
ORDER BY DESC(?spectral_count)
```
Let me know whether merging the queries was the right way to go.
Also, Sanjay had provided a search_peptides query here: https://github.com/oceanproteinportal/api/issues/4. I'm assuming this should be updated to the one you provided above, but should it also include VALUES for performance?
I think that adding VALUES only works if the search is restricted to a specific dataset (graph).
This is the query for searching proteins (similar to #8 but without the filter criteria). How would this query change to search by Uniprot ID, Kegg, or PFams?