yxwang97 closed this issue 2 years ago.
The hdf5 files contain the data we used for training, validation and testing. In general, we used the top 3 highest-scoring spectra per combination of peptide, modification, charge, mass analyzer and collision energy setting. The identifications were extracted from MaxQuant (msms.txt), but the intensities were extracted from the raw data itself, because the msms.txt files contain de-charged and de-isotoped intensities. So the intensities in the two sources are related for most, but not all, fragment peaks. The hdf5 files contain only plain y- and b-fragment ions (no neutral losses or other ion types) up to charge 3, which is another difference from the msms.txt, as that file also contains fragment peaks with neutral losses. In addition, we used a score cutoff of 70 (if I recall correctly), so some peptide identifications you will find in the msms.txt are not present in the hdf5 files. The intensities in the hdf5 files are scaled to the most intense fragment, so they range from 0 to 1.
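To make the filtering steps described above concrete, here is a minimal sketch in plain Python. It is not the authors' actual pipeline: the `PSM` record, its field names, and the helpers `select_top_spectra` and `annotate_and_scale` are all hypothetical. It assumes the score cutoff is applied as "score >= 70" (the reply only recalls 70 approximately), and that scaling to the most intense fragment is done over the kept b/y ions only.

```python
from collections import defaultdict
from dataclasses import dataclass

SCORE_CUTOFF = 70.0  # assumption: the reply recalls 70 "if I recall correctly"
TOP_N = 3            # top 3 highest-scoring spectra per group


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (not the real hdf5 schema)."""
    sequence: str
    modifications: str
    charge: int
    mass_analyzer: str
    collision_energy: float
    score: float
    # (ion_type, ion_number, fragment_charge, neutral_loss or None) -> raw intensity
    peaks: dict


def select_top_spectra(psms, cutoff=SCORE_CUTOFF, top_n=TOP_N):
    """Drop PSMs below the cutoff, then keep the top_n by score per group."""
    groups = defaultdict(list)
    for psm in psms:
        if psm.score < cutoff:  # assumption: cutoff is inclusive (>= 70 kept)
            continue
        key = (psm.sequence, psm.modifications, psm.charge,
               psm.mass_analyzer, psm.collision_energy)
        groups[key].append(psm)
    selected = []
    for members in groups.values():
        members.sort(key=lambda p: p.score, reverse=True)
        selected.extend(members[:top_n])
    return selected


def annotate_and_scale(peaks, max_fragment_charge=3):
    """Keep plain b/y ions up to the given charge and scale them to the maximum."""
    kept = {
        (ion, number, z): intensity
        for (ion, number, z, loss), intensity in peaks.items()
        if ion in ("b", "y") and loss is None and z <= max_fragment_charge
    }
    if not kept:
        return {}
    top = max(kept.values())
    if top <= 0:
        # Avoid dividing by zero when every kept peak has zero intensity.
        return {k: 0.0 for k in kept}
    return {k: v / top for k, v in kept.items()}


if __name__ == "__main__":
    peaks = {("y", 1, 1, None): 200.0, ("b", 2, 1, None): 50.0,
             ("y", 3, 1, "H2O"): 500.0, ("y", 4, 4, None): 900.0}
    # The water-loss ion and the charge-4 ion are dropped before scaling.
    print(annotate_and_scale(peaks))
```

Because the neutral-loss and high-charge peaks are removed before scaling, the base peak used for normalization can differ from the one in msms.txt. This is one reason why scaled values from the two sources need not match peak for peak.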
Hello, I downloaded the .hdf5 file you provided. It has an intensities_raw column whose values appear to lie between 0 and 1, similar to the intensities in the msms.txt from the zip file I downloaded from the PRIDE website. What is the relationship between these intensities and the ones in the msms.txt file?