odea-project / qAlgorithms

GNU General Public License v3.0

Retention time bounds in output of qPeaks #7

Open LeonSaal opened 2 weeks ago

LeonSaal commented 2 weeks ago

Hi there,

when running qAlgorithms.exe ... -pp, the produced output ..._peaks.csv contains a single retention time at the apex. Is there a way to also export the peak bounds in separate columns?

Kind regards,

Leon

dahoehn commented 2 weeks ago

Hello Leon, such a feature is planned for future releases; I should have a working implementation ready next month. If you have other wishes regarding the output files, feel free to add them to this thread. Greetings, Daniel

dahoehn commented 1 week ago

I have just uploaded a new release that reports the lowest and highest retention time of a peak (only retention times that were not interpolated are considered). You can access this information in the peak tables generated through the -e output option. Please let me know if the changes provide what you were looking for. Kind regards, Daniel

Edit: After testing this feature a bit, I believe that the assignment of peaks to bins is occasionally incorrect. Most retention time bounds should be accurate, but I would not rely on this feature until the problem is resolved.
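
Until the bin assignment problem is resolved, one simple plausibility check is to confirm that each reported apex lies between its reported bounds. The following is only a minimal sketch: the column names `retentionTime`, `lowestRetentionTime` and `highestRetentionTime` are hypothetical placeholders (check the header of your own peak table), and the two data rows are made up for illustration.

```python
import csv
import io

# Hypothetical column names; replace them with the header names
# found in your own peak table.
RT_APEX = "retentionTime"
RT_LOW = "lowestRetentionTime"
RT_HIGH = "highestRetentionTime"


def split_by_bound_consistency(rows):
    """Split peak rows into (consistent, suspicious) lists.

    A row counts as consistent if low <= apex <= high holds for its
    retention time columns.
    """
    consistent, suspicious = [], []
    for row in rows:
        low = float(row[RT_LOW])
        apex = float(row[RT_APEX])
        high = float(row[RT_HIGH])
        if low <= apex <= high:
            consistent.append(row)
        else:
            suspicious.append(row)
    return consistent, suspicious


# Made-up example data standing in for a real peak table.
example_csv = """retentionTime,lowestRetentionTime,highestRetentionTime
120.5,118.0,123.1
300.2,301.0,305.4
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
ok, bad = split_by_bound_consistency(rows)
print(f"{len(ok)} consistent, {len(bad)} suspicious")
```

A check like this only catches bounds that contradict the apex; it cannot confirm that bounds which pass are actually correct.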

LeonSaal commented 1 week ago

Hi Daniel,

thanks for the new release! Unfortunately, I get an Assertion-Error:

> ..\qAlgorithms.exe -i "... .mzML" -o "." -ps -pb -pp -e
Warning: the processing log has been overwritten
printing output to: "."
reading file 1 of 1:
"... .mzML"
...  file ok

Processing positive peaks
    produced 1538791 centroids from 1448 spectra in 39643906100 ns
    assembled 16208 bins in 3023812800 ns
Assertion failed: tmpEndVal > peaks[i][j].idxPeakStart, file C:/Users/unisys/Documents/Studium/Analytik-Praktikum/qAlgorithms/src/qalgorithms_main.cpp, line 897

Kind regards,

Leon

LeonSaal commented 1 week ago

And I have a question regarding the m/z-output: Say I want to extract a chromatogram with the results from qAlgorithms. Besides the retention time bounds I also need a m/z-range. Is $mz \pm mzUncertainty$ or $\min(mz_{bin})$ to $\max(mz_{bin})$ what I'm looking for? If not, additional columns for the mz-range would be great!

Thanks

Leon

dahoehn commented 1 week ago

Hi Leon,

as mentioned, there is still a fundamental error in the program which we are working to resolve at this time. The generated results would be incorrect, so the program exits. This also affects all results past centroiding, so even if the program ran to completion, no peak would be reliable. I will post an additional release once this problem has been resolved.

> Say I want to extract a chromatogram with the results from qAlgorithms. Besides the retention time bounds I also need a m/z-range. Is mz +/- mzUncertainty or min(mz_bin) to max(mz_bin) what I'm looking for? If not, additional columns for the mz-range would be great!

Generally, you will want to use the given uncertainty. However, as peaks become less reliable, the accuracy of our error estimation decreases. This is because the m/z estimation uses the peak apex as determined by the regression. If the regression is very inaccurate, the resulting mass and mass error become less applicable to the underlying data. You could instead use the minimum and maximum m/z within the regression (not the bin: the bin could have a significant mass drift and is even more error-prone!), but that way you lose precision, scaling with the noise level and peak broadness. The effect starts to show around a DQSpeak of 0.55; the uncertainty will start decreasing at that point. Sadly, the effect is more or less dependent on the actual centroids in a peak. Even given a precise m/z and RT region, you have the problem that the m/z range is not identical for every mass spectrum in the RT range and as such cannot be accurately determined for a peak.
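
To illustrate the first option (mz +/- mzUncertainty combined with the retention time bounds), here is a minimal sketch of the window selection. It is not part of qAlgorithms: the `Centroid` type, the function name and all numbers are made up for illustration, and in practice the centroids would come from your own mzML reader.

```python
from dataclasses import dataclass


@dataclass
class Centroid:
    """Hypothetical container for one centroided data point."""
    rt: float         # retention time
    mz: float         # mass-to-charge ratio
    intensity: float


def extract_chromatogram(centroids, mz, mz_uncertainty, rt_start, rt_end):
    """Return all centroids inside the m/z and RT window, sorted by RT.

    The m/z window is mz +/- mz_uncertainty; both windows are inclusive.
    """
    mz_low = mz - mz_uncertainty
    mz_high = mz + mz_uncertainty
    selected = [
        c for c in centroids
        if mz_low <= c.mz <= mz_high and rt_start <= c.rt <= rt_end
    ]
    return sorted(selected, key=lambda c: c.rt)


# Made-up example data.
centroids = [
    Centroid(rt=101.0, mz=200.002, intensity=1500.0),  # inside both windows
    Centroid(rt=100.0, mz=199.996, intensity=900.0),   # inside both windows
    Centroid(rt=100.5, mz=200.010, intensity=700.0),   # outside m/z window
    Centroid(rt=150.0, mz=200.001, intensity=800.0),   # outside RT window
]

xic = extract_chromatogram(
    centroids, mz=200.0, mz_uncertainty=0.005, rt_start=99.0, rt_end=105.0
)
for c in xic:
    print(c.rt, c.mz, c.intensity)
```

Note that this applies one fixed m/z window to every spectrum in the RT range, which is exactly the simplification described above: the true m/z range differs from spectrum to spectrum.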

What we are working on to address this problem is a means of reconstructing the process that led to a peak and then giving you the profile data, with some margin of error. Since the current algorithm-breaking problem is directly related to this, you can expect such a feature in the near future.

I hope this helps you.

Greetings, Daniel

dahoehn commented 1 week ago

I have just uploaded a new release, sidestepping the bug that caused the failed assert. If your results are swarming with DQSpeaks of -10, it will still take some time for a proper fix.

dahoehn commented 1 week ago

The release I uploaded on Friday contained an infinite loop under some conditions, so make sure to use the updated executable.

LeonSaal commented 1 week ago

Hi @dahoehn,

thank you for the detailed explanation on the error and the m/z-range as well as the new release!

Kind regards,

Leon