michalsta / opentims

Open-source C++ and Python module for opening binary timsTOF data files.

In search of the precursorMZ values for MS2 #16

Open chufz opened 1 year ago

chufz commented 1 year ago

Hi, I am looking for a slot holding the precursorMZ values for MS2 data, but I could not find one. Is there any chance to read out these values for the scans? They would be necessary information for any spectral library application.

Best, Carolin

MatteoLacki commented 1 year ago

Good day there,

Do I understand correctly that you are working with DDA data?

Best,

chufz commented 1 year ago

Hi Matteo,

Yes, it is PASEF DDA data, and I will need to read out the precursor masses. See also issue https://github.com/rformassspectrometry/MsBackendTimsTof/issues/18

Would be happy to have this implemented in opentims

MatteoLacki commented 1 year ago

OK, but I have to contact my collaborator: I don't yet know from which language you want to access this information or exactly what you want the output format to look like. In fact, this looks like a pretty simple thing to code oneself in Python to meet your needs exactly: do you need some help with that? If so, what exactly would you like your output to look like?

MatteoLacki commented 1 year ago

I now have some DDA data to play with and am in contact with people from Bruker to ask them whether I get the format right. Stay tuned.

chufz commented 1 year ago

Thanks a lot, Matteo :)

MatteoLacki commented 1 year ago

I think it is not so simple.

The problem is that the PasefFrameMsMsInfo table describes only the positions of the fragments in the raw data. The precursors that these correspond to have been fragmented. Obviously, one expects that a frame or two before the fragmentation was triggered, the MS1 precursor data was acquired and analyzed by the instrument, as this is the underlying principle of online Parallel Accumulation-Serial Fragmentation (PASEF). But the fragmentation scheduling algorithm is apparently more complicated than that.

```
           MS2Frame  ScanNumBegin  ScanNumEnd
Precursor
1                66           770         796
1                67           770         796
1                68           770         796
1                70           770         796  # See, there was a hole in one precursor.
1                71           770         796
1                72           770         796
1                73           770         796
2                98           814         840  # Here there was some break for switching or something
2                99           814         840
2               100           814         840
```
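The hole pointed out in the table above can also be found programmatically. This is only a minimal sketch: it assumes the table rows are available as `(Precursor, MS2Frame)` pairs, and `find_frame_gaps` is a hypothetical helper, not part of opentims.

```python
from collections import defaultdict

# (Precursor, MS2Frame) pairs copied from the table above.
ROWS = [
    (1, 66), (1, 67), (1, 68), (1, 70), (1, 71), (1, 72), (1, 73),
    (2, 98), (2, 99), (2, 100),
]


def find_frame_gaps(rows):
    """Map each precursor to the frame numbers missing from its MS2 run.

    Hypothetical helper: a "gap" is any frame number lying strictly between
    the first and the last MS2 frame of a precursor that is not listed for it.
    """
    frames_by_precursor = defaultdict(set)
    for precursor, frame in rows:
        frames_by_precursor[precursor].add(frame)

    gaps = {}
    for precursor, frames in frames_by_precursor.items():
        expected = set(range(min(frames), max(frames) + 1))
        gaps[precursor] = sorted(expected - frames)
    return gaps


if __name__ == "__main__":
    print(find_frame_gaps(ROWS))  # {1: [69], 2: []}
```

From this excerpt alone one cannot tell what the instrument did in frame 69; the sketch only reports that precursor 1 has no MS2 entry there.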

I could imagine that you would be interested not in just a few frames with MS1 data but, likely, in all of them in the current dataset. This would call for some form of clustering, which is likely performed by MaxQuant and co.

I will organize one more meeting to clear this up with the Bruker guys.

MatteoLacki commented 1 year ago

I mean, the main problem is that without clustering I do not know how many frames I should extract from the MS1 signals to give the best possible answer to the question about the identity of the precursors that were fragmented. It could be that while the signal was still rising, the algorithm on board the instrument did not schedule it for fragmentation. Likely, the same could happen again while it was going down. Also, the answer would vary even more in the presence of coeluting ions that the algorithm might find more interesting to fragment (marvelous DDA at its full capacity).

So, the big question to you is: if you are happy with any estimates of MS1 precursors, then you already have them in the table (you can translate frames to retention times with the .frame_to_retention_time method and scans to inverse ion mobilities with the .scan_to_inv_ion_mobility method of the OpenTIMS object). These are merely statistics, but correct ones, and obviously the precursor signals themselves cannot be observed in these frames, as they were fragmented. If you want more, you should include raw data from the neighbourhood, and for that one needs either clustering or, at the very least, extraction of the very close-by sections of data.
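To make the "extraction of the very close-by sections of data" idea concrete, here is a rough sketch of how one might pick the MS1 frames around a precursor's MS2 frames. Everything in it is an assumption for illustration: `nearby_ms1_frames` is a hypothetical helper, the list of MS1 frame numbers has to come from elsewhere (the values in the example are made up), and `margin` has no principled value, which is exactly the problem described above.

```python
from bisect import bisect_left, bisect_right


def nearby_ms1_frames(ms1_frames, ms2_frames, margin=5):
    """Return the MS1 frame numbers lying close to a precursor's MS2 frames.

    ms1_frames: sorted list of frame numbers known to hold MS1 data.
    ms2_frames: frame numbers in which the precursor was fragmented.
    margin:     hypothetical window half-width, in frames, added on both
                sides of the span covered by the MS2 frames.
    """
    lowest = min(ms2_frames) - margin
    highest = max(ms2_frames) + margin
    start = bisect_left(ms1_frames, lowest)    # first MS1 frame >= lowest
    stop = bisect_right(ms1_frames, highest)   # one past last MS1 frame <= highest
    return ms1_frames[start:stop]


if __name__ == "__main__":
    # MS2 frames of precursor 1 from the table; MS1 frame numbers are invented.
    ms2 = [66, 67, 68, 70, 71, 72, 73]
    made_up_ms1 = [58, 69, 80, 91, 102]
    print(nearby_ms1_frames(made_up_ms1, ms2, margin=10))
```

The selected frames, together with the ScanNumBegin and ScanNumEnd range of the precursor, would delimit the region of raw MS1 data to inspect. A wider `margin` catches more of the elution profile but also more coeluting signal.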

timosachsenberg commented 10 months ago

If one would like to replicate the conversion done by, e.g., msconvert, how would you proceed? For example, with a filter like https://user-images.githubusercontent.com/5803621/155004773-b72aac33-107c-4546-aeca-4d2fe9f7424e.png that generates approximate precursor values (including m/z).

MatteoLacki commented 10 months ago

I don't think I know what MSConvertGui does. Where can we find it?

timosachsenberg commented 10 months ago

MSConvert is available here: https://proteowizard.sourceforge.io/download.html but I did not find good documentation for that part. Browsing the source code suggests that there might be some pointers: https://github.com/search?q=repo%3AProteoWizard%2Fpwiz+scanSumming&type=code https://github.com/ProteoWizard/pwiz/blob/55889be8e5f48ba44640bf0d93f00be3f4b0824a/pwiz_aux/msrc/utility/vendor_api/Bruker/timsdata_cpp_pwiz.h

MatteoLacki commented 10 months ago

I think I am a bit lost:

You are writing in a post about precursor positions for MS2 fragments. Does this filter have anything to do with it?

andzajan commented 10 months ago

Precursor summing has been broken in msconvert for a while now, and no one has replied to the issue I reported: https://github.com/ProteoWizard/pwiz/issues/2566.

@MatteoLacki, if you want to implement summing of spectra of the "same" precursor, the latest Bruker API includes methods for extracting a spectrum across all frames for the same precursor ID. There is also a method to get a "quasi-profile" spectrum, which returns the intensities of MS/MS peaks on a fixed m/z grid, so that spectral summing can be done externally.

But if I understand @timosachsenberg's question correctly, they don't want a summed spectrum but just "aggregated" precursor information; in that case, it can all be done from the information present in the SQLite tables.
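As a rough illustration of aggregating precursor information from the SQLite tables, the sketch below collapses the per-frame rows into one row per precursor. It assumes that the `analysis.tdf` file contains a `Precursors` table with `Id`, `MonoisotopicMz` and `Charge` columns and a `PasefFrameMsMsInfo` table with `Frame`, `ScanNumBegin`, `ScanNumEnd`, `IsolationMz` and `Precursor` columns; check these names against your own file. `aggregated_precursors` is a hypothetical helper, not part of opentims or the Bruker API.

```python
import sqlite3

# Assumed schema: see the column names listed in the text above.
QUERY = """
SELECT p.Id                AS precursor,
       p.MonoisotopicMz    AS monoisotopic_mz,
       p.Charge            AS charge,
       AVG(m.IsolationMz)  AS isolation_mz,
       MIN(m.Frame)        AS first_ms2_frame,
       MAX(m.Frame)        AS last_ms2_frame,
       COUNT(*)            AS n_ms2_frames,
       MIN(m.ScanNumBegin) AS scan_begin,
       MAX(m.ScanNumEnd)   AS scan_end
FROM PasefFrameMsMsInfo AS m
JOIN Precursors AS p ON p.Id = m.Precursor
GROUP BY p.Id
ORDER BY p.Id
"""


def aggregated_precursors(con):
    """Return one dict per precursor from an open SQLite connection."""
    cursor = con.execute(QUERY)
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/data.d/analysis.tdf
    connection = sqlite3.connect(sys.argv[1])
    try:
        for record in aggregated_precursors(connection):
            print(record)
    finally:
        connection.close()
```

The frame numbers returned here can then be translated to retention times, and the scan numbers to inverse ion mobilities, as described earlier in the thread.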

timosachsenberg commented 10 months ago

Thanks for the insight. I have to admit that I still need to catch up on the details; for my MS/MS identification use case, the simple summing was sufficient.