oceanproteinportal / api

A Python API module for the Ocean Protein Portal

Protein Search TYPES #9

Open kaimikacolin opened 2 years ago

kaimikacolin commented 2 years ago
SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
        WHERE { 
        ?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .
        ?protein ?p ?o .
        FILTER REGEX(?o, "asd", "i")
        ?protein view:productName ?product_name .
        ?protein view:identifiers ?ids .
        ?protein view:datasetName ?dataset_name .
        ?protein view:dataset ?dataset .
        ?protein view:expedition ?expedition .
        ?protein view:spectralCount ?spectral_count .
        ?protein view:proteinID ?protein_id .
        ?protein view:peptideCount ?peptide_count .
        ?protein view:sequence ?sequence .
        }
        ORDER BY DESC(?spectral_count)

This is the query for searching proteins (similar to #8, but without the filter criteria). How would this query change to search by Uniprot ID, Kegg, or PFams?

kaimikacolin commented 2 years ago

Marking this one "High"

ashepherd commented 2 years ago

I will develop a template query for each of the 5 search types from the UI.

ashepherd commented 2 years ago

REGEX Searches:

SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE { 
        ?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .

        # Use exactly one of the two patterns below, depending on the search type.
        # if search term: protein name (ex. 'Fe')
        ?protein view:productName ?o .
        # else if search term: Sequence (ex. 'AA')
        ?protein view:sequence ?o .

        FILTER REGEX(?o, "{search value from UI}", "i")

        ?protein view:productName ?product_name .
        ?protein view:identifiers ?ids .
        ?protein view:datasetName ?dataset_name .
        ?protein view:dataset ?dataset .
        ?protein view:expedition ?expedition .
        ?protein view:spectralCount ?spectral_count .
        ?protein view:proteinID ?protein_id .
        ?protein view:peptideCount ?peptide_count .
        ?protein view:sequence ?sequence .
}
ORDER BY DESC(?spectral_count)
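Since the API module is Python, one way the REGEX template above could be filled in is sketched below. This is only an illustration: the names `REGEX_SEARCH_PREDICATES`, `escape_sparql_literal`, and `build_regex_search_query`, and the search-type keys `"protein_name"` and `"sequence"`, are hypothetical and not part of the existing API. PREFIX declarations are left out because the thread does not give them.

```python
# Hypothetical helper names; only the predicates and the query shape come from
# the REGEX template in this thread. PREFIX declarations are omitted on purpose.
REGEX_SEARCH_PREDICATES = {
    "protein_name": "view:productName",  # ex. 'Fe'
    "sequence": "view:sequence",         # ex. 'AA'
}

REGEX_SEARCH_TEMPLATE = """\
SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE {{
        ?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .
        ?protein {predicate} ?o .
        FILTER REGEX(?o, "{value}", "i")
        ?protein view:productName ?product_name .
        ?protein view:identifiers ?ids .
        ?protein view:datasetName ?dataset_name .
        ?protein view:dataset ?dataset .
        ?protein view:expedition ?expedition .
        ?protein view:spectralCount ?spectral_count .
        ?protein view:proteinID ?protein_id .
        ?protein view:peptideCount ?peptide_count .
        ?protein view:sequence ?sequence .
}}
ORDER BY DESC(?spectral_count)
"""


def escape_sparql_literal(value: str) -> str:
    """Escape characters that would otherwise end or break a double-quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_regex_search_query(search_type: str, search_value: str) -> str:
    """Pick the single predicate for search_type and substitute the escaped value."""
    try:
        predicate = REGEX_SEARCH_PREDICATES[search_type]
    except KeyError:
        raise ValueError(f"unsupported search type: {search_type!r}") from None
    return REGEX_SEARCH_TEMPLATE.format(
        predicate=predicate, value=escape_sparql_literal(search_value)
    )


print(build_regex_search_query("protein_name", "Fe"))
```

Selecting exactly one predicate matters: if both `?o` patterns were left in the query, `?o` would have to match the product name and the sequence at the same time.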

Identifier Search:

SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE { 

        # Use exactly one of the three type patterns below, depending on the search type.
        # if search term: Kegg (ex. 'K03404')
        ?id a opp:KeggOrthologyIdentifier .
        # else if search term: Uniprot ID (ex. 'G4E3S1')
        ?id a opp:UniprotProteinIdentifier .
        # else if search term: PFams (ex. 'PF04117')
        ?id a opp:PFamsIdentifier .

        ?id opp:identifierValue ?o .
        FILTER REGEX(?o, "{search value from UI}", "i") .

        ?pid opp:identifier ?id .
        ?protein view:viewOf ?pid .
        ?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .
        ?protein view:productName ?product_name .
        ?protein view:identifiers ?ids .
        ?protein view:datasetName ?dataset_name .
        ?protein view:dataset ?dataset .
        ?protein view:expedition ?expedition .
        ?protein view:spectralCount ?spectral_count .
        ?protein view:proteinID ?protein_id .
        ?protein view:peptideCount ?peptide_count .
        ?protein view:sequence ?sequence .
}
ORDER BY DESC(?spectral_count)
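
The identifier template could be filled in the same way, by mapping the UI search type to one identifier class. The sketch below is an assumption about how that might look in Python: `IDENTIFIER_CLASSES`, `build_identifier_search_query`, and the keys `"kegg"`, `"uniprot"`, and `"pfams"` are hypothetical names, and PREFIX declarations for `opp:` and `view:` are omitted because the thread does not give them.

```python
# Hypothetical names; the three opp: classes and the query shape are taken from
# the identifier template in this thread. PREFIX declarations are omitted.
IDENTIFIER_CLASSES = {
    "kegg": "opp:KeggOrthologyIdentifier",      # ex. 'K03404'
    "uniprot": "opp:UniprotProteinIdentifier",  # ex. 'G4E3S1'
    "pfams": "opp:PFamsIdentifier",             # ex. 'PF04117'
}

IDENTIFIER_SEARCH_TEMPLATE = """\
SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE {{
        ?id a {identifier_class} .
        ?id opp:identifierValue ?o .
        FILTER REGEX(?o, "{value}", "i") .
        ?pid opp:identifier ?id .
        ?protein view:viewOf ?pid .
        ?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .
        ?protein view:productName ?product_name .
        ?protein view:identifiers ?ids .
        ?protein view:datasetName ?dataset_name .
        ?protein view:dataset ?dataset .
        ?protein view:expedition ?expedition .
        ?protein view:spectralCount ?spectral_count .
        ?protein view:proteinID ?protein_id .
        ?protein view:peptideCount ?peptide_count .
        ?protein view:sequence ?sequence .
}}
ORDER BY DESC(?spectral_count)
"""


def build_identifier_search_query(search_type: str, search_value: str) -> str:
    """Select one identifier class for search_type and substitute the escaped value."""
    if search_type not in IDENTIFIER_CLASSES:
        raise ValueError(f"unsupported identifier type: {search_type!r}")
    # Escape backslashes first, then double quotes, so the value cannot
    # terminate the string literal inside FILTER REGEX early.
    escaped = search_value.replace("\\", "\\\\").replace('"', '\\"')
    return IDENTIFIER_SEARCH_TEMPLATE.format(
        identifier_class=IDENTIFIER_CLASSES[search_type], value=escaped
    )


print(build_identifier_search_query("kegg", "K03404"))
```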

ashepherd commented 2 years ago

To filter either of these queries down to a specific dataset, add a line like the following inside the WHERE clause:

ex: Metzyme 3.0

?protein view:dataset <urn:opp:dataset:metzyme-3.0> .
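
If the queries are assembled in Python, the optional dataset restriction could be appended with a small helper like the one below. This is a sketch under assumptions: `add_dataset_filter` is a hypothetical name, and it relies on the last `}` in the query string being the one that closes the WHERE clause, which holds for the two templates in this thread but not for arbitrary SPARQL.

```python
def add_dataset_filter(query: str, dataset_uri: str) -> str:
    """Insert '?protein view:dataset <uri> .' just before the final closing brace.

    Assumes the final '}' closes the WHERE clause (true for the templates
    in this thread, where only ORDER BY follows it).
    """
    # Reject characters that are not allowed inside a SPARQL <...> IRI reference.
    if any(ch in '<>"{}|^`\\' or ch.isspace() for ch in dataset_uri):
        raise ValueError(f"invalid dataset URI: {dataset_uri!r}")
    close = query.rfind("}")
    if close == -1:
        raise ValueError("query has no closing brace")
    triple = f"        ?protein view:dataset <{dataset_uri}> .\n"
    return query[:close] + triple + query[close:]


# Shortened example query, only to show where the line ends up.
example = (
    "SELECT DISTINCT ?protein\n"
    "WHERE {\n"
    "        ?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .\n"
    "}\n"
    "ORDER BY DESC(?spectral_count)\n"
)
print(add_dataset_filter(example, "urn:opp:dataset:metzyme-3.0"))
```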

kaimikacolin commented 2 years ago

@ashepherd I just added the queries for UniprotID, Pfams, and Kegg, and they are looking great. I've combined these with the VALUES directive that sped up the query mentioned in another ticket:

SELECT DISTINCT ?protein ?product_name ?dataset ?dataset_name ?expedition ?spectral_count ?protein_id ?peptide_count ?sequence ?ids
WHERE {
        VALUES ?p { view:productName view:identifiers view:expedition view:proteinID view:sequence }
        ?protein a <http://schema.oceanproteinportal.org/v2/views/ProteinFound> .

        FILTER REGEX(?o, "K03404", "i")

        ?id a opp:KeggOrthologyIdentifier .
        ?id opp:identifierValue ?o .
        ?pid opp:identifier ?id .
        ?protein view:viewOf ?pid .

        ?protein view:productName ?product_name .
        ?protein view:identifiers ?ids .
        ?protein view:datasetName ?dataset_name .
        ?protein view:dataset ?dataset .
        ?protein view:expedition ?expedition .
        ?protein view:spectralCount ?spectral_count .
        ?protein view:proteinID ?protein_id .
        ?protein view:peptideCount ?peptide_count .
        ?protein view:sequence ?sequence .
}
ORDER BY DESC(?spectral_count)

Let me know if merging the queries was the right way to go.

kaimikacolin commented 2 years ago

Also, Sanjay had provided a search_peptides query (https://github.com/oceanproteinportal/api/issues/4). I'm assuming this should be updated to the one you provided above, but should that also have a VALUES included for performance?

jaclynsaunders commented 2 years ago

I think that adding VALUES only works if the search is restricted to a specific dataset (graph).