vloux / ProteoRE

GNU General Public License v3.0

Add version of the databases #84

Closed vloux closed 6 years ago

vloux commented 6 years ago
yvandenb commented 6 years ago
NguyenLien commented 6 years ago

ID Converter: The files used to create the reference file for ID Converter are updated each month (as in Véro's email), but I don't think the output changes often. So how often do you think we should update the ToolShed tool? We also have to put the reference file in the package.

yvandenb commented 6 years ago

For neXtProt: once a year (released in March).
For UniProt: twice a year would be more than enough.

NguyenLien commented 6 years ago

@yvandenb in the neXtProt news of 21/02/18 there is this piece of information:

This release, neXtProt data release 2018-01-17, is the 2018 reference for HUPO. This release incorporates the latest PeptideAtlas release (Human 2018-01). A total of 51 PeptideAtlas data sets, which include data from cancer tissues and cell lines, with over 1.4 million peptides detected by mass spectrometry have been integrated

I haven't checked whether they have already included the information we need from PeptideAtlas, but here is the news.

yvandenb commented 6 years ago

"I haven't checked whether they have already included the information we need from PeptideAtlas, but here is the news." Indeed, I received the announcement from NP last week. There are several ways to query NP: either via their API using SPARQL (see https://snorql.nextprot.org/, which has many examples) or via their advanced search interface using pre-encoded queries (see for instance https://www.nextprot.org/proteins/search?mode=advanced&queryId=NXQ_00226). For instance, this one could do the job with regard to PeptideAtlas:

Proteins with at least 2 validating peptides >=9aa found in blood plasma, urine or cerebrospinal fluid (criteria for biomarker).

    select distinct ?entry where {
      values ?pepsources {
        source:PeptideAtlas_human_Cerebrospinal_Fluid
        source:PeptideAtlas_human_Blood_Plasma
        source:PeptideAtlas_human_Urine
      }
      ?entry :isoform ?iso.
      ?iso :peptideMapping ?pm .
      ?pm :evidence / :assignedBy ?pepsources .
      ?pm :proteotypic true .
      ?pm :start ?p1 ; :end ?p2 .
      filter(?p2-?p1 >= 8) # peptide length >= 9
    }
    group by ?entry having(count (distinct ?pm) > 1) # at least two such peptides
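For a local test, a query like this one could be sent from a script rather than through the web interface. The following is only a minimal sketch, assuming the endpoint accepts standard SPARQL 1.1 Protocol GET requests and returns the standard JSON results format. The `ENDPOINT` value and the function names are placeholders, not something confirmed in this thread. The query text would probably also need its own PREFIX declarations for `:` and `source:`, since the web interface likely adds them for you.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request, urlopen

# Assumption: placeholder endpoint URL, to be replaced by the real SPARQL endpoint.
ENDPOINT = "https://sparql.nextprot.org/"


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL 1.1 Protocol GET request (query passed as the 'query' parameter)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def entries_from_results(payload):
    """Extract the ?entry values from a SPARQL JSON results document (a string)."""
    doc = json.loads(payload)
    return [row["entry"]["value"] for row in doc["results"]["bindings"] if "entry" in row]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over the network and return the list of ?entry values."""
    with urlopen(build_request(query, endpoint), timeout=60) as resp:
        return entries_from_results(resp.read().decode("utf-8"))
```

Only `run_query` touches the network; `build_request` and `entries_from_results` can be checked offline against a saved response.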

This is why we should discuss this together on the basis of local tests... opinions?

NguyenLien commented 6 years ago

If the PeptideAtlas database can now be queried from neXtProt, that could be a good way to retrieve the information in case the flat file is too big, needs to be updated often, and so on.

yvandenb commented 6 years ago

I agree! But what we have to check by querying NP is:

  1. whether all the information we need is actually present (e.g. the number of peptides observed by mass spec for each protein in a given tissue; if peptides have been observed, what their sequences and predicted detectability scores are)
  2. if yes, how to build the query (much simpler, as we can ask NP! ;-))
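The first check could be done on the result of a test query. Here is a rough sketch, assuming the results come back in the standard SPARQL JSON format, where a variable that is unbound for a row is simply absent from that row. The variable names in the usage example (`peptideCount`, `peptideSequence`) are hypothetical, chosen only for illustration.

```python
def field_coverage(bindings, required):
    """Count, for each required variable, how many result rows actually bind it.

    `bindings` is the list found under results.bindings in a SPARQL JSON
    results document; `required` is a list of variable names (without '?').
    """
    return {name: sum(1 for row in bindings if name in row) for name in required}


# Usage with two made-up rows; the variable names are hypothetical.
rows = [
    {"entry": {"type": "uri", "value": "http://example.org/NX_TEST1"},
     "peptideCount": {"type": "literal", "value": "12"}},
    {"entry": {"type": "uri", "value": "http://example.org/NX_TEST2"}},
]
coverage = field_coverage(rows, ["entry", "peptideCount", "peptideSequence"])
```

A count lower than `len(rows)` would mean the information is missing for some proteins, and a count of zero would mean the query does not return it at all.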
NguyenLien commented 6 years ago

I am closing this issue as the version has been added to the components. neXtProt and PeptideAtlas are to be discussed in issue #90.