oceanproteinportal / api

A Python API module for the Ocean Protein Portal

Peptides missing from Query? #7

Closed kaimikacolin closed 1 year ago

kaimikacolin commented 2 years ago

@jaclynsaunders This query provides a "peptideCount" but no peptides... is there any way this can be included per-result as an array?

https://kg.oceanproteinportal.org/?query=+SELECT+DISTINCT+%3Fprotein+%3Fproduct_name+%3Fdataset+%3Fdataset_name+%3Fexpedition+%3Fspectral_count+%3Fprotein_id+%3Fpeptide_count+%3Fsequence+%3Fids%0D%0A++++++++WHERE+%7B+%0D%0A++++++++%3Fprotein+a+%3Chttp%3A%2F%2Fschema.oceanproteinportal.org%2Fv2%2Fviews%2FProteinFound%3E+.%0D%0A++++++++%3Fprotein+%3Fp+%3Fo+.%0D%0A++++++++FILTER+REGEX%28%3Fo%2C+%22FE%22%2C+%22i%22%29%0D%0A++++++++%0D%0A++++++++%3Fprotein+view%3AproductName+%3Fproduct_name+.%0D%0A++++++++%3Fprotein+view%3Aidentifiers+%3Fids+.%0D%0A++++++++%3Fprotein+view%3AdatasetName+%3Fdataset_name+.%0D%0A++++++++%3Fprotein+view%3Adataset+%3Fdataset+.%0D%0A++++++++%3Fprotein+view%3Aexpedition+%3Fexpedition+.%0D%0A++++++++%3Fprotein+view%3AspectralCount+%3Fspectral_count+.%0D%0A++++++++%3Fprotein+view%3AproteinID+%3Fprotein_id+.%0D%0A++++++++%3Fprotein+view%3ApeptideCount+%3Fpeptide_count+.%0D%0A++++++++%3Fprotein+view%3Asequence+%3Fsequence+.%0D%0A++++++++%7D%0D%0A++++++++ORDER+BY+DESC%28%3Fspectral_count%29%0D%0A&graph=urn%3Aopp%3Aviews%3Anunn-bering-sea

jaclynsaunders commented 2 years ago

@kaimikacolin - do you want basically all the peptides found within a protein along with the spectral counts for those peptides? (I'm also noticing an issue here with the data that we'll need to dig into.)

For example, this protein has a spectral count of 212 but a peptide count of 0?

| "Succinate dehydrogenase flavoprotein subunit"^^xsd:string | | "Arctic-Bering Sea (Nunn)"^^xsd:string | "HLY1301"^^xsd:string | 212.0 | "MOCAT.samples_revised_C16119679_gene1_3"^^xsd:token | 0

Let me get back to you on this one, Colin. Thanks!

kaimikacolin commented 2 years ago

The ES engine was providing peptides like so: 'peptideSequence': ['GFYDDNYTTSPEK', 'GATDLQAADQEITAVYAQLLHK'], so I don't think spectral counts are necessary.

Those values are then sent to this view: [image]

kaimikacolin commented 2 years ago

Marking this one "medium" priority

jaclynsaunders commented 2 years ago

Want to return an array of peptide sequence strings per protein. Return as JSON?

kaimikacolin commented 2 years ago

Yes, a JSON object would be fine.

jaclynsaunders commented 2 years ago

@kaimikacolin - Here's a query that can plug into the other protein view queries. Here's a working example of finding just the peptides per protein - returned as a string with comma-separated values for each peptide.

SELECT DISTINCT ?protein_name (GROUP_CONCAT(DISTINCT ?peptide_sequence; SEPARATOR=',') AS ?peptide_list)
WHERE {
  ?protein view:viewOf ?protein_id .
  ?protein_id opp:proteinIdentifier ?protein_name .
  ?pep_b opp:forProteinIdentification ?protein_id .
  ?peptide_id opp:suggestsProtein ?pep_b .
  ?peptide_id opp:describesPeptide ?pep_id .
  ?pep_id opp:peptideSequence ?peptide_sequence .
}
GROUP BY ?protein_name
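
On the client side, the comma-separated ?peptide_list could be split back into a per-protein array before it is sent to the view. The following is only a minimal sketch, assuming the endpoint returns the standard SPARQL 1.1 JSON results format. The function name and the sample protein identifier are hypothetical; the two peptide sequences are the ones quoted earlier in this thread.

```python
import json


def peptides_by_protein(sparql_json, sep=","):
    """Map each ?protein_name to a list of peptide sequence strings."""
    result = {}
    for binding in sparql_json["results"]["bindings"]:
        name = binding["protein_name"]["value"]
        # A missing or empty ?peptide_list becomes an empty list
        raw = binding.get("peptide_list", {}).get("value", "")
        result[name] = [p for p in raw.split(sep) if p]
    return result


# Hypothetical response shaped like SPARQL 1.1 Query Results JSON
sample = {
    "head": {"vars": ["protein_name", "peptide_list"]},
    "results": {"bindings": [
        {
            "protein_name": {"type": "literal", "value": "example_protein_1"},
            "peptide_list": {
                "type": "literal",
                "value": "GFYDDNYTTSPEK,GATDLQAADQEITAVYAQLLHK",
            },
        },
        {
            "protein_name": {"type": "literal", "value": "example_protein_2"},
            "peptide_list": {"type": "literal", "value": ""},
        },
    ]},
}

print(json.dumps(peptides_by_protein(sample), indent=2))
```

Splitting on a comma is safe here only because peptide sequences are plain amino-acid letters; a different separator could be passed via `sep` if the query is changed.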

jaclynsaunders commented 2 years ago

@kaimikacolin - Here it is with the peptide list coming from the main query. It would probably be best to have it as a view, but this query works for now.

Here it is as a query with filters on depth & date with regex. However, I'm still stuck on filtering out samples where the spectral count is 0.
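
Until the zero-count filter works inside the query, one possible stopgap is to drop those rows on the client. This is a sketch only, assuming the result rows have already been flattened into dicts keyed by the SELECT variable names (for example spectral_count). The helper name and the sample protein IDs are hypothetical.

```python
def drop_zero_spectral_counts(rows, key="spectral_count"):
    """Keep only rows whose spectral count parses to a number above zero."""
    kept = []
    for row in rows:
        try:
            count = float(row.get(key, 0))
        except (TypeError, ValueError):
            continue  # unparseable count: treat as missing and drop the row
        if count > 0:
            kept.append(row)
    return kept


# Hypothetical rows; counts arrive as strings such as "212.0"
rows = [
    {"protein_id": "example_a", "spectral_count": "212.0"},
    {"protein_id": "example_b", "spectral_count": "0.0"},
    {"protein_id": "example_c"},
]
print(drop_zero_spectral_counts(rows))
```

Rows with no spectral count at all are dropped along with the zeros, which may or may not be the desired behavior for the portal.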