statisticalbiotechnology / quandenser

QUANtification by Distillation for ENhanced Signals with Error Regulation
Apache License 2.0

Linking output to results from other tools #11

Closed andrewjmc closed 4 years ago

andrewjmc commented 4 years ago

Hello,

Thanks for all your help so far. I now have pilot feature groups for 52 samples (consensus spectra awaited) and am analysing.

Ideally, I would now like to link with MaxQuant identifications, so confident human features can be ignored.

The MaxQuant evidence.txt table gives MS/MS IDs which can be linked with the msms.txt table to give Scan numbers ("the RAW-file derived scan number of the MS/MS spectrum") and Scan indices ("the consecutive index of the MS/MS spectrum").
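
For anyone following along, the evidence-to-msms link described above can be sketched in Python roughly as follows. This is only a sketch under assumptions: the column names (`id` and `MS/MS IDs` in evidence.txt; `id`, `Raw file`, `Scan number` and `Scan index` in msms.txt) and the semicolon-separated ID list should be checked against your own MaxQuant output. The function name `link_evidence_to_scans` and the toy rows are made up for illustration.

```python
import csv
import io


def link_evidence_to_scans(evidence_file, msms_file):
    """Yield (evidence id, raw file, scan number, scan index) per linked MS/MS.

    Assumes tab-separated MaxQuant tables with the column names used below.
    """
    # Index msms.txt rows by their "id" column for constant-time lookup.
    msms_by_id = {row["id"]: row for row in csv.DictReader(msms_file, delimiter="\t")}
    for ev in csv.DictReader(evidence_file, delimiter="\t"):
        # "MS/MS IDs" is assumed to be a semicolon-separated list; skip empty entries.
        for msms_id in filter(None, ev["MS/MS IDs"].split(";")):
            m = msms_by_id.get(msms_id)
            if m is not None:
                yield ev["id"], m["Raw file"], int(m["Scan number"]), int(m["Scan index"])


# Toy data (invented values), standing in for evidence.txt and msms.txt.
evidence_txt = "id\tMS/MS IDs\n0\t10;11\n1\t\n"
msms_txt = (
    "id\tRaw file\tScan number\tScan index\n"
    "10\tFebrile4532\t2501\t1200\n"
    "11\tFebrile4532\t2610\t1255\n"
)

links = list(link_evidence_to_scans(io.StringIO(evidence_txt), io.StringIO(msms_txt)))
for link in links:
    print(link)
```

With real files, the two `io.StringIO` objects would be replaced by open file handles for evidence.txt and msms.txt.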

Can you give any pointers on how I can use the final column of the feature groups file together with the MaRaCluster.clusters_p10 file (from either the maracluster or the maracluster_extra_features directory) to work out the correspondence, if that is possible?

The first thing I need to understand is which value from the mzML is used as the scan index in column 2 of the Maracluster clusters:

<spectrum **index="0"** id="controllerType=0 controllerNumber=1 **scan=1**" defaultArrayLength="381">
          ...
          <cvParam cvRef="MS" accession="MS:1000796" name="spectrum title" value="Febrile4532.1.1. File:&quot;Febrile4532.raw&quot;, NativeID:&quot;controllerType=0 controllerNumber=1 **scan=1**&quot;"/>

Will it be the spectrum index (0) or the scan (1)?

Secondly, I need to understand how to link the indices from the final column of feature groups to the MaRaCluster cluster number. It looks like the -S flag is set (https://github.com/statisticalbiotechnology/maracluster/wiki/FAQ).

Thanks for your advice,

Andrew

andrewjmc commented 4 years ago

I may have just answered my questions, in which case this is here for anyone else who needs it!

(1) The MaRaCluster scan numbers from an mzML with "scan=N" titles should be N. (2) The MS2 indices given in the final column of the feature groups file are in the format described here:

https://github.com/statisticalbiotechnology/maracluster/wiki/FAQ

if the -S flag is set and/or the mgf output format is used, it is this same one-based index as above, but multiplied by 100 plus an one-based precursor identifier. E.g. SCANS=2303 means it is the 3rd precursor of the 23rd cluster.

So floor(index / 100) should give the one-based cluster index in the MaRaCluster clusters file, and the remainder (index mod 100) is the one-based precursor identifier.
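
As a minimal Python sketch of that decoding, assuming the index really is cluster * 100 + precursor as the FAQ describes (the function name `decode_ms2_index` is just for illustration):

```python
def decode_ms2_index(index):
    """Split a combined MS2 index into (cluster, precursor), both one-based.

    Assumes index = cluster * 100 + precursor, as described in the FAQ.
    """
    cluster, precursor = divmod(index, 100)
    return cluster, precursor


# The FAQ example: SCANS=2303 is the 3rd precursor of the 23rd cluster.
print(decode_ms2_index(2303))  # (23, 3)
```

Because the cluster number is one-based, it refers to the 23rd cluster in the clusters file, i.e. position 22 in a zero-based Python list of clusters.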

I'll leave this open in case the authors have anything to add before closing.

andrewjmc commented 4 years ago

Having worked through this, it all works fine, and I can link the feature groups with the MaxQuant data.