Retention Time Alignment Poorly Regularized Under Some Conditions

jmmitc06 commented 1 year ago

Here is the retention time alignment plot from a recent experiment (RP - pellets, Morphic):

And another with HILIC+ pellets:

Most of the zig-zaggy, chaotic regression lines belong to blanks, but the same behavior occasionally occurs with 'real' samples as well. I suspect peak density is a factor here. It is unclear whether this has a major impact on results, especially with filtering.

jmmitc06 commented 1 year ago

Some initial thoughts:

The anchoring at the beginning and end of the chromatograms does prevent "blowing up", but a more regularized model should not have this problem anyway.
I'm surprised we get the zig-zag behavior with LOWESS; I need to find some examples and investigate further. Maybe some sets of peaks are shifted more severely while the overall chromatogram is shifted less. If that occurs, there is no easy fix.
I think it is unlikely, but maybe the rules for selecting anchors are not strict enough, so we sometimes pick up noise and align junk peaks in some samples.
Currently, the criteria for selecting reference peaks consider m-selectivity, but I did not see an explicit check on c-selectivity. Maybe we need both.
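The two-criteria idea in the last point could look roughly like the sketch below. This is not asari's code: the dictionary keys `m_selectivity` and `c_selectivity`, the function name, and both thresholds are hypothetical placeholders chosen only to illustrate requiring both checks before a peak is accepted as an alignment anchor.

```python
def select_anchor_peaks(peaks, min_m_selectivity=0.99, min_c_selectivity=0.9):
    """Keep only peaks that pass BOTH selectivity checks.

    `peaks` is a list of dicts. The keys and default thresholds are
    hypothetical and do not reflect asari's actual data model.
    """
    return [
        p for p in peaks
        if p["m_selectivity"] >= min_m_selectivity
        and p["c_selectivity"] >= min_c_selectivity
    ]


candidates = [
    {"id": "a", "m_selectivity": 1.00, "c_selectivity": 0.95},
    {"id": "b", "m_selectivity": 1.00, "c_selectivity": 0.40},  # crowded in RT
    {"id": "c", "m_selectivity": 0.80, "c_selectivity": 0.99},  # crowded in m/z
]
anchors = select_anchor_peaks(candidates)
```

With a check on only one of the two values, either "b" or "c" would slip through as an anchor.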

@amnahsiddiqa, if you have ideas for better options than LOWESS, I would like to discuss them with you. LOWESS would be my 'go-to' method as well, but maybe if we put our heads together we can think of something.

shuzhao-li commented 1 year ago

Were the plots based on chromatograms.rt_lowess_calibration_debug or different code?

The RT mapping dictionaries only record values that differ between two samples. I think rt_lowess_calibration_debug produces the right plot.

jmmitc06 commented 1 year ago

These plots were generated using the asari dashboard.

I can run it in debug mode and see if the same behavior occurs.

jmmitc06 commented 1 year ago

This issue, while not completely solved, has been partially mitigated by adding a threshold on the maximum RT delta between peaks used for alignment and by adding the option to run multiple LOWESS iterations. The threshold prevents severe outliers, while multiple iterations improve regularization.
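For readers following along, here is a minimal, standard-library-only sketch of the two mitigations: drop anchor pairs whose RT shift exceeds a threshold, then fit LOWESS with extra robustness iterations (bisquare reweighting of residuals). It is an illustration under assumptions, not the code in asari; the function names, the `(sample_rt, ref_rt)` pair layout, and the defaults for `frac` and `iterations` are all hypothetical.

```python
import statistics


def filter_by_max_delta(pairs, max_delta):
    """Keep (sample_rt, ref_rt) anchor pairs whose RT shift is at most max_delta."""
    return [(s, r) for s, r in pairs if abs(s - r) <= max_delta]


def _local_line(xs, ys, ws, x0):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(ws)
    if sw == 0:  # every neighbour was down-weighted to zero
        return sum(ys) / len(ys)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx <= 1e-12 * sw:  # effectively a single x value: no slope to estimate
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def robust_lowess(xs, ys, frac=0.5, iterations=3):
    """One plain LOWESS pass followed by `iterations` robustness passes."""
    n = len(xs)
    k = max(2, min(n, int(round(frac * n))))  # neighbours per local fit
    robust = [1.0] * n  # robustness weights, all equal on the first pass
    fitted = list(ys)
    for _ in range(iterations + 1):
        for i, x0 in enumerate(xs):
            nearest = sorted((abs(x - x0), j) for j, x in enumerate(xs))[:k]
            h = nearest[-1][0] or 1.0  # bandwidth = distance to k-th neighbour
            ws = [(1 - (d / h) ** 3) ** 3 * robust[j] for d, j in nearest]  # tricube
            fitted[i] = _local_line(
                [xs[j] for _, j in nearest], [ys[j] for _, j in nearest], ws, x0
            )
        residuals = [abs(y - f) for y, f in zip(ys, fitted)]
        scale = 6.0 * statistics.median(residuals)
        if scale == 0:
            break
        # Bisquare weights: points with large residuals count less next pass.
        robust = [(1 - (r / scale) ** 2) ** 2 if r < scale else 0.0 for r in residuals]
    return fitted
```

The threshold handles anchors that are wildly wrong, while the reweighting passes handle moderate outliers that survive the threshold. In practice an existing implementation such as the `lowess` function in statsmodels (which has an `it` parameter for the number of robustness iterations) would be preferable to hand-rolled code.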

Alignment of blanks remains problematic but likely has no easy solution at this time.

When misalignments occur, the solution so far has been to select a reference manually. Future efforts could seek to improve reference selection. A more outlier-tolerant algorithm, such as RANSAC, may also be a future option.
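To make the RANSAC idea concrete, here is a small sketch that fits a straight-line RT mapping while ignoring outlying anchor pairs. It assumes the drift is roughly linear, which real chromatograms may not satisfy; one possibility would be to use the RANSAC consensus set only to pick anchors and still fit LOWESS on them. The function name, parameters, and defaults are hypothetical, and nothing here is part of asari.

```python
import random


def ransac_line(xs, ys, n_trials=200, inlier_tol=2.0, seed=0):
    """Fit y = slope * x + intercept with RANSAC.

    Returns (slope, intercept, inlier_indices). `inlier_tol` is in the same
    units as ys; all defaults are illustrative only.
    """
    rng = random.Random(seed)
    indices = range(len(xs))
    best = []
    for _ in range(n_trials):
        i, j = rng.sample(indices, 2)  # minimal sample: two anchor pairs
        if xs[i] == xs[j]:
            continue
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        inliers = [
            k for k in indices
            if abs(ys[k] - (slope * xs[k] + intercept)) <= inlier_tol
        ]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no consensus set found")
    # Refit by ordinary least squares on the consensus set only.
    n = len(best)
    mx = sum(xs[k] for k in best) / n
    my = sum(ys[k] for k in best) / n
    sxx = sum((xs[k] - mx) ** 2 for k in best)
    sxy = sum((xs[k] - mx) * (ys[k] - my) for k in best)
    slope = sxy / sxx
    return slope, my - slope * mx, best
```

Unlike LOWESS, the fit is determined only by points that agree with the consensus model, so a cluster of junk anchors cannot bend the curve. scikit-learn also ships a `RANSACRegressor` that would avoid maintaining this by hand.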

shuzhao-li-lab / asari

Retention Time Alignment Poorly Regularized Under Some Conditions #49