statisticalbiotechnology / maracluster

Matthew The's implementation of MaRaCluster
Apache License 2.0

Problem parsing scan numbers in mgf files #7

Closed MatthewThe closed 7 years ago

MatthewThe commented 8 years ago

MGF files can be read, but MaRaCluster does not seem to parse the scan numbers and instead outputs an index number. Below is an example MGF spectrum.

BEGIN IONS
TITLE=File3233 Spectrum1 scans: 955
PEPMASS=419.31415 142643.59375
CHARGE=2+
RTINSECONDS=1519
SCANS=955
100.78580 389.208
121.02799 1375.67
141.32939 348.489
148.97514 371.168
149.02254 80548.1
167.03293 3469.45
209.89798 363.966
248.89998 505.242
305.88007 480.797
322.83035 1847.01
326.83221 422
378.72043 539.584
378.73157 472.819
379.72842 2027.64
396.68686 1624.04
396.85779 606.745
414.69812 3246.26
418.99304 1784.92
740.50848 427.397
END IONS

ProteoWizard seems to store the SCANS field in a variable that exists only for MGF files. We could parse this variable before checking the id field, since ProteoWizard sets the id to index=<index_number> when it reads an MGF file.
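A minimal sketch of the proposed lookup order, written in Python for illustration only and not taken from the MaRaCluster or ProteoWizard code. The function name `resolve_scan_number` and its two parameters are hypothetical: `scans_field` stands for whatever value was read from the SCANS line (or None if absent), and `native_id` stands for the id string, assumed here to look like `index=<n>` or `scan=<n>`.

```python
import re
from typing import Optional


def resolve_scan_number(scans_field: Optional[str], native_id: str) -> Optional[int]:
    """Return a scan number, preferring the MGF SCANS value over the id field.

    Both arguments are hypothetical stand-ins for values obtained from the
    spectrum reader; this is not the ProteoWizard API.
    """
    # 1. Prefer the explicit SCANS value when it is present and numeric.
    #    If SCANS holds a range such as "955-960", the first number is used.
    if scans_field:
        match = re.match(r"\s*(\d+)", scans_field)
        if match:
            return int(match.group(1))

    # 2. Otherwise fall back to the id field ("scan=<n>" or "index=<n>").
    match = re.search(r"(?:scan|index)=(\d+)", native_id)
    if match:
        return int(match.group(1))

    # 3. Nothing usable was found.
    return None
```

For the example spectrum above, `resolve_scan_number("955", "index=0")` would give 955 rather than the index, while a spectrum without a SCANS line would still fall back to the number in the id field.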