yvandenb closed this issue 6 years ago.
@yvandenb It's not complicated to extract the info from the source files (home-made by Yv). But based on your query to NP in #84, we can get the entry for each ID, which avoids downloading the whole Peptide Atlas. However, I haven't understood how to extract the information from the result entry. Do you want me to first build a component based on your home-made source files and then investigate the NP query, or to investigate the NP query directly?
A very good question you raised, Lien... By the way, I discussed this matter with Lydie Lane (NP's PI) last Monday. Obviously it would be easier and more advantageous to work with information from NP, for many reasons: curated data, high content, data richness, advanced queries using SPARQL via the API... and a very good relationship! This is actually what we did with Lisa when she prototyped "Protein features", and it is still of interest for updating the NP info we need. BUT for the MS-based information needed for UC2 (i.e. the "nbr of PSMs observed" in which tissue (in fact, in which "build")), Lydie confirmed that NP does not integrate this info in its RDF model. This is why we still need to consider info from PA, and the simplest way to retrieve it. Yesterday I sent a message to the PA manager and got an answer (which I am going to forward to you). Thus, for the moment, my suggestion would be to first build a tool based on my home-made source files...
The first version of this component is now available in the dev instance!
Let's have a look :+1:
By the way, below are the emails I exchanged with the Peptide Atlas staff:
Hi Yves, What you can do is run the query for each tissue type you are interested in. The link below is for brain.
The brain is specified as sample_category_id=2 in the link. You can get the full list of sample_category_id values here:
https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/ManageTable.cgi?TABLE_NAME=AT_sample_category
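Zhi's suggestion boils down to running the same query once per tissue, changing only `sample_category_id`. A minimal sketch of deriving the per-tissue links from the brain link, assuming that link is an ordinary HTTP query string. The link itself is not reproduced in this thread, so the caller supplies it, and the function name is hypothetical. Only brain = 2 comes from the email; the other IDs have to be looked up in the AT_sample_category table.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_sample_category(query_url: str, sample_category_id: int) -> str:
    """Return query_url with sample_category_id set to the given value.

    Other query parameters keep their original order (values are re-encoded
    by urlencode); the parameter is appended if the link does not have it.
    """
    parts = urlsplit(query_url)
    new_value = str(sample_category_id)
    params = []
    found = False
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "sample_category_id":
            value, found = new_value, True
        params.append((key, value))
    if not found:
        params.append(("sample_category_id", new_value))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

Calling it with the brain link and each ID from AT_sample_category gives one query per tissue, which is one way to approximate the batch mode Yves asks about below.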
Zhi
-----Original Message-----
From: VANDENBROUCK Yves 206108 [mailto:yves.vandenbrouck@cea.fr]
Sent: Wednesday, March 14, 2018 9:38 AM
To: Zhi Sun
Cc: Eric Deutsch
Subject: RE: Human PeptideAtlas download
Dear Zhi, Dear Eric,
Thank you for your answer. I need to retrieve MS-based information related to a list of human proteins, such as the "nbr of PSMs observed" in a given tissue/sample of interest... I actually did this by parsing the available .xml files corresponding to older builds (as reported), and I would now like to update this info using the most recent version of the human PA build... I'm not sure it would be feasible via the query interface in batch mode, would it? Regards, Yves
Yves Vandenbrouck, PhD
Etude de la Dynamique des Protéomes (EDyP)
Laboratoire Biologie à Grande Echelle (BGE)
U1038 INSERM/CEA/UGA
Biosciences and Biotechnology Institute of Grenoble (BIG)
CEA/Grenoble
-----Original Message-----
From: Zhi Sun [mailto:zsun@systemsbiology.org]
Sent: Wednesday, March 14, 2018 5:22 PM
To: VANDENBROUCK Yves 206108 yves.vandenbrouck@cea.fr
Cc: Eric Deutsch edeutsch@systemsbiology.org
Subject: RE: Human PeptideAtlas download
Hi Yves, The XML file was not generated. Can you let me know what you need? Maybe we can get the information through the PeptideAtlas query interface.
Thanks,
Zhi
-----Original Message-----
From: Yves VANDENBROUCK yves.vandenbrouck@cea.fr
Dear colleagues, I tried to download the latest version of the Human build (Jan 2018, XML file) via this web page: http://www.peptideatlas.org/builds/ and was redirected to http://www.peptideatlas.org/builds/human/201712/atlas_build_472.xml.gz with the following error message: "Not Found The requested URL /builds/human/201712/atlas_build_472.xml.gz was not found on this server." Could you please help me with that and provide the right link?
OK Lien, this new tool works fine! Bravo... just two points now need to be improved:
[ ] Following my exchanges with the PA staff, we have to update the source files. They differ a bit from those you used for this first version, which will consequently affect the submission form (see point 2 below). As we have now agreed on what we are manipulating in terms of PA data (and how to retrieve it via a PA query), I created a new issue describing the procedure to create them: #93
[ ] The submission form for this tool needs a few enhancements, just to keep track: specify the required ID type (Uniprot accession number) and add the user doc section (assigned to me as usual ;-))
User doc for: "Retrieve MS-based information at the peptide level add MS-based annotation to your protein list from Peptide Atlas"
New title => "Retrieve MS-based information at the peptide level (from Peptide Atlas)"
Given a list of Uniprot accession numbers, the tool retrieves MS-based information for each peptide identified for a given protein. This could be of interest to people who wish to select peptides for further targeted MS-based experiments (i.e. if the protein is detectable in the sample, it will be detected via that peptide).
Input required: A list of Uniprot accession numbers (e.g. Q12860), provided either as a file (if you choose a file, you must specify the column containing your Uniprot accession numbers) or in copy/paste mode. If your input file or list contains other types of IDs, please use the ID_Converter tool to convert them into Uniprot accession numbers.
Output: An output is returned for each selected proteomics sample (indicated by the name of the output in the history panel), containing the list of peptides identified for each requested protein with the following additional information:
Data were retrieved from the Peptide Atlas release of Jan 2018.
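The Input section asks users to convert non-Uniprot IDs with ID_Converter first. A minimal sketch of a pre-check that separates well-formed accessions from other IDs, using the accession-number pattern published in the UniProt documentation. The function name is hypothetical, and isoform suffixes such as "-2" are deliberately not accepted here.

```python
import re
from typing import Iterable, List, Tuple

# Accession-number pattern given in the UniProt documentation
# (isoform suffixes such as "-2" are not accepted here).
UNIPROT_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def split_ids(ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate well-formed Uniprot accessions from IDs that need ID_Converter."""
    accepted: List[str] = []
    to_convert: List[str] = []
    for raw in ids:
        ident = raw.strip()
        if not ident:
            continue  # ignore empty entries
        if UNIPROT_AC.fullmatch(ident):
            accepted.append(ident)
        else:
            to_convert.append(ident)
    return accepted, to_convert
```

The second list could be reported back to the user with a pointer to ID_Converter instead of silently dropping those IDs.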
Next "user doc" (protein level) coming soon ;-)
User doc: "Retrieve MS-based information at the protein level add MS-based annotation to your protein list from Peptide Atlas"
New title => "Number of MS/MS observations in sample (from Peptide Atlas)"
Given a list of Uniprot accession numbers, this tool indicates the number of times a protein has been observed in a given sample using an LC-MS/MS proteomics approach. This could be of interest to people who want to know to what extent a protein is detectable (and to roughly estimate its level) in a given sample using proteomics. The available human biological samples are: brain, heart, kidney, liver, plasma, urine and cerebrospinal fluid (CSF). Data were retrieved from the Peptide Atlas release of Jan 2018.
Input required: A list of Uniprot accession numbers (e.g. Q12860), provided either as a file (if you choose a file, you must specify the column containing your Uniprot accession numbers) or in copy/paste mode. If your input file or list contains other types of IDs, please use the ID_Converter tool to convert them into Uniprot accession numbers.
Output: Additional columns are created for each selected proteomics sample, reporting the number of times all peptides corresponding to a protein have been observed by LC-MS/MS according to Peptide Atlas. “NA” means that no information has been reported, suggesting that this protein has not been observed in the sample of interest.
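The "additional columns, NA when absent" behaviour amounts to a left join of the user's table against one count table per selected sample. A minimal sketch, assuming each sample's counts are already loaded as a dict from accession to n_observations; the function name and the `<sample>_n_observations` column naming are hypothetical.

```python
from typing import Dict, List, Tuple


def add_sample_columns(
    header: List[str],
    rows: List[List[str]],
    ac_column: int,
    samples: Dict[str, Dict[str, int]],
) -> Tuple[List[str], List[List[str]]]:
    """Append one n_observations column per selected sample.

    ac_column is the 0-based index of the accession column; samples maps a
    sample name to {accession: n_observations} for that sample.
    """
    new_header = header + [f"{name}_n_observations" for name in samples]
    new_rows = []
    for row in rows:
        accession = row[ac_column]
        # "NA": the sample's PA table has no entry for this accession.
        extra = [str(counts[accession]) if accession in counts else "NA"
                 for counts in samples.values()]
        new_rows.append(row + extra)
    return new_header, new_rows
```

Keeping the user's original columns untouched and only appending new ones matches the Output description above.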
Done!
Specification: Goal: to allow end users to add MS-based info (from PeptideAtlas) to their protein list and to check whether or not their proteins have been experimentally observed at the protein level in a given human tissue/sample.
For each table (PA source file in tab-separated format deposited in bioproj), the information to be used (for now) is:
Col. A: the Uniprot accession number (“biosequence_name” in the PA source file)
Col. F: an integer (“n_observations” in the PA source file)
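Reading those two columns can be sketched with Python's csv module. Assumptions: the file is tab-separated with a header row carrying the names biosequence_name and n_observations (columns A and F in the files described above), and each accession appears at most once per file, so a repeated accession would overwrite the earlier value. The function name is hypothetical.

```python
import csv
from typing import Dict, Iterable


def load_observations(lines: Iterable[str]) -> Dict[str, int]:
    """Map biosequence_name (col. A) to n_observations (col. F) for one PA table."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Locate the columns by name rather than by letter, assuming the header row
    # uses the PA column names quoted in the specification.
    ac_idx = header.index("biosequence_name")
    obs_idx = header.index("n_observations")
    observations: Dict[str, int] = {}
    for row in reader:
        if len(row) <= max(ac_idx, obs_idx) or not row[ac_idx].strip():
            continue  # skip blank or truncated lines
        observations[row[ac_idx].strip()] = int(row[obs_idx])
    return observations
```

With a file on disk, `open(path, newline="")` can be passed directly, since an open text file is an iterable of lines.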
Submission form:
• Input
  • Copy/paste protein IDs (Uniprot accession numbers) or a tabular file (with a column-number option indicating the Uniprot accession numbers required as IDs, plus a header yes/no option)
• Options* (select the proteomics dataset (sample); the names below are organized as a radio-button menu)
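The two input modes of the form can be sketched as two small parsers. Assumptions: pasted IDs are separated by whitespace, commas or semicolons; the column option is 1-based; the tabular file is tab-separated. Function names are hypothetical.

```python
import csv
import re
from typing import Iterable, List


def ids_from_paste(text: str) -> List[str]:
    """Split a pasted list of IDs (assumed separators: whitespace, ',' or ';')."""
    return [token for token in re.split(r"[\s,;]+", text) if token]


def ids_from_table(lines: Iterable[str], column: int, has_header: bool) -> List[str]:
    """Read IDs from a tab-separated file; column is 1-based (assumption)."""
    if column < 1:
        raise ValueError("column numbers start at 1")
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row when the form says "yes"
    ids: List[str] = []
    for row in reader:
        if len(row) >= column and row[column - 1].strip():
            ids.append(row[column - 1].strip())
    return ids
```

Both functions return a plain list of IDs, so the rest of the tool does not need to know which input mode was used.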
User doc section: will follow... For any further details, feel free to call me
N.B.: source files from Peptide Atlas (PA) usually come as an XML file called a "build", to which an ID is assigned (build_id); see http://www.peptideatlas.org/builds/ for a complete picture of what is available. As each XML file is (very) large, the current idea would be either to post-process the XML once downloaded from PeptideAtlas, or to retrieve the info with a query via the neXtProt API (which also gathers info from PeptideAtlas). I suggest discussing this aspect afterwards, as the only thing we need at the moment is to prototype the behavior and the GUI to better figure out what should be improved for Use Case 2 (see issue #84).
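If the post-processing route is taken, the main constraint is size: a build XML should be streamed rather than loaded whole. A minimal sketch using ElementTree's XMLPullParser, which consumes the decompressed file chunk by chunk. The element and attribute names (protein, biosequence_name, n_observations) are placeholders borrowed from the tab-separated files; the actual build XML schema has not been checked here and may nest these values differently or use an XML namespace, which would change the tag names.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator


def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield decompressed chunks of a .xml.gz file without loading it whole."""
    with gzip.open(path, "rb") as handle:
        while chunk := handle.read(size):
            yield chunk


def observations_from_xml(
    chunks: Iterable[bytes],
    tag: str = "protein",               # hypothetical element name
    id_attr: str = "biosequence_name",  # hypothetical attribute names
    count_attr: str = "n_observations",
) -> Dict[str, int]:
    """Stream the XML and map each accession to its observation count."""
    parser = ET.XMLPullParser(events=("end",))
    counts: Dict[str, int] = {}

    def drain() -> None:
        # Handle every element completed so far, then free its content.
        for _event, elem in parser.read_events():
            if elem.tag != tag:
                continue
            accession = elem.get(id_attr)
            count = elem.get(count_attr)
            if accession is not None and count is not None:
                counts[accession] = int(count)
            elem.clear()

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()
    return counts
```

Usage would look like `observations_from_xml(read_chunks("atlas_build.xml.gz"))`, where the file name is only an example; the result has the same shape as the one built from the tab-separated source files, so either route could feed the same tool.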