kusterlab / prosit

Prosit offers high-quality predicted MS2 spectra for any organism and protease, as well as iRT prediction. If Prosit is helpful for your research, please cite "Gessulat, Schmidt et al. 2019", DOI 10.1038/s41592-019-0426-7
https://www.proteomicsdb.org/prosit/
Apache License 2.0

intensities_raw in .hdf5 file is different from msms.txt in .zip file #77

Closed. yxwang97 closed this issue 2 years ago

yxwang97 commented 2 years ago

Hello, I downloaded the .hdf5 file you provided. It has a column, intensities_raw, whose values appear to be between 0 and 1, which looks similar to the msms.txt in the .zip file downloaded from the PRIDE website. What is the relationship between intensities_raw and the intensities in the msms.txt file?

WassimG commented 2 years ago

The hdf5 files contain the data we used for training/validation/testing. In general, we used the top 3 highest-scoring spectra per peptide, modification, charge, mass analyzer and collision energy setting. The identification data was extracted from MaxQuant (msms.txt), but the intensities were extracted from the raw data itself, since the msms.txt files contain de-charged and deisotoped intensities. So the two sets of intensities are related for most, but not all, fragment peaks. The hdf5 files contain only plain y- and b-fragment ions (no neutral losses or other ion types) up to charge 3, which is another difference from msms.txt, as that file contains fragment peaks with neutral losses. In addition, we used a score cutoff of 70, if I recall correctly, so some peptide identifications you find in msms.txt will not be present in the hdf5 files. The intensities in the hdf5 files are scaled to the most intense fragment, hence the 0-1 range.
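
To make the filtering and scaling steps described above more concrete, here is a minimal Python sketch. It is not the actual Prosit preprocessing code. The dictionary keys (`modified_sequence`, `charge`, `analyzer`, `collision_energy`, `score`) and both function names are hypothetical, and whether the cutoff is inclusive is an assumption.

```python
from collections import defaultdict


def normalize_to_base_peak(intensities):
    """Scale intensities so the most intense fragment equals 1.0."""
    base = max(intensities, default=0.0)
    if base <= 0:
        # No signal at all: return zeros instead of dividing by zero.
        return [0.0 for _ in intensities]
    return [x / base for x in intensities]


def select_top_spectra(psms, min_score=70.0, top_n=3):
    """Keep the top_n highest-scoring spectra per group above a score cutoff.

    A group is one combination of modified peptide, charge, mass analyzer
    and collision energy. All field names are hypothetical, and the
    inclusive cutoff (>=) is an assumption.
    """
    groups = defaultdict(list)
    for psm in psms:
        if psm["score"] >= min_score:
            key = (
                psm["modified_sequence"],
                psm["charge"],
                psm["analyzer"],
                psm["collision_energy"],
            )
            groups[key].append(psm)
    return {
        key: sorted(members, key=lambda p: p["score"], reverse=True)[:top_n]
        for key, members in groups.items()
    }


if __name__ == "__main__":
    # Raw fragment intensities for one spectrum (made-up numbers).
    print(normalize_to_base_peak([2.0, 4.0, 1.0]))
```

Because the scaling is relative to the most intense annotated fragment in each spectrum, values in intensities_raw are only comparable within a spectrum, not across spectra or against the absolute intensities in msms.txt.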