shuzhao-li-lab / asari

asari, metabolomics data preprocessing

Cluster RT Alignment #97

Open: jmmitc06 opened this issue 2 months ago

jmmitc06 commented 2 months ago

Different types of samples (blanks, QCs, etc.) should first be aligned within their type, and the resulting groups then aligned to one another. The grouping can be done with clustering.
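A minimal sketch of the two-level idea follows. It assumes each sample already has RTs for a shared set of anchor peaks (same peaks, same order), uses the per-anchor median as each group's reference, and treats a linear least-squares fit as an adequate alignment model. Samples are grouped by known type labels; a clustering step could produce the same groups instead. `fit_linear` and `two_level_align` are hypothetical names, not asari functions.

```python
import numpy as np


def fit_linear(rt_obs, rt_ref):
    """Least-squares line mapping observed anchor RTs onto reference anchor RTs."""
    slope, intercept = np.polyfit(rt_obs, rt_ref, 1)
    return slope, intercept


def two_level_align(anchor_rts, sample_types):
    """Align samples within each type group, then align the groups to each other.

    anchor_rts: dict sample name -> 1D array of RTs for the same anchor peaks, same order.
    sample_types: dict sample name -> group label (e.g. "blank", "qc"); the labels could
    come from metadata or from a clustering step (hypothetical inputs, not asari objects).
    Returns: dict sample name -> function mapping that sample's RT onto a shared RT axis.
    """
    groups = {}
    for name, label in sample_types.items():
        groups.setdefault(label, []).append(name)

    # Level 1: each group's reference is the per-anchor median RT of its members.
    group_refs, within = {}, {}
    for label, names in groups.items():
        ref = np.median([anchor_rts[n] for n in names], axis=0)
        group_refs[label] = ref
        for n in names:
            within[n] = fit_linear(anchor_rts[n], ref)

    # Level 2: align each group reference to the median of all group references.
    global_ref = np.median(list(group_refs.values()), axis=0)
    between = {label: fit_linear(ref, global_ref) for label, ref in group_refs.items()}

    def make_map(name):
        a1, b1 = within[name]
        a2, b2 = between[sample_types[name]]
        # Compose the two linear maps: sample -> group reference -> shared axis.
        return lambda rt: a2 * (a1 * rt + b1) + b2

    return {name: make_map(name) for name in sample_types}
```

Keeping the two levels as separate fits, rather than fitting every sample directly to one global reference, means a systematic offset between, say, blanks and QCs is absorbed once at the group level instead of distorting each within-type fit.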

jmmitc06 commented 1 month ago

This will require expanding how RT alignments are stored in the objects. Currently there is only one alignment per sample. If each sample had a list of alignments, we could mix and match the alignment steps, though I'm not sure a plain list will be sufficient.
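One way the list-of-alignments storage could look is sketched below. It assumes each alignment is a monotone piecewise-linear map defined by matched knot RTs and carries a label so individual steps can be selected; `RTAlignment` and `SampleAlignments` are hypothetical classes, not part of asari's current objects.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class RTAlignment:
    """Hypothetical piecewise-linear RT mapping defined by matched knot pairs."""
    label: str
    src: np.ndarray  # knot RTs in the input space, strictly increasing
    dst: np.ndarray  # matching knot RTs in the output space, strictly increasing

    def forward(self, rt):
        # np.interp clamps to the end knots outside [src[0], src[-1]].
        return np.interp(rt, self.src, self.dst)


@dataclass
class SampleAlignments:
    """A sample that stores an ordered list of alignments instead of a single one."""
    sample_name: str
    alignments: list = field(default_factory=list)

    def add(self, alignment):
        self.alignments.append(alignment)

    def apply(self, rt, labels=None):
        """Apply stored alignments in order, optionally only those whose label is in `labels`."""
        for a in self.alignments:
            if labels is None or a.label in labels:
                rt = a.forward(rt)
        return rt


s = SampleAlignments("sample_1")
s.add(RTAlignment("within_type", np.array([0.0, 100.0]), np.array([10.0, 110.0])))
s.add(RTAlignment("between_type", np.array([0.0, 200.0]), np.array([0.0, 400.0])))
full = s.apply(50.0)                              # both steps
partial = s.apply(50.0, labels={"within_type"})   # first step only
```

The labels are what make mixing and matching possible; a bare list only records order, which hints at why a plain list alone may not be enough.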

jmmitc06 commented 1 month ago

More thoughts on refactoring this.

Right now, mass tracks exist at the sample level, while composite mass tracks exist at the experiment / sub-experiment level. Yet the methods that operate on mass tracks and on composite mass tracks are very similar; the distinction is more organizational than logical. We can view a sample's mass tracks as a composite mass track built from a single sample, which unifies these data structures. Composite maps can then support various alignment methods that can be applied to one another, allowing for complex alignment strategies. This also works with a one-alignment-per-composite-mass-track paradigm: samples are mapped to a composite mass track using one alignment, composites are aligned to one another using another alignment, and so on. To calculate integration regions in the original data, we then chain through all the alignments in reverse order.
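The unified structure and the reverse chaining can be sketched as a tree in which a sample is just a composite of one. This assumes each node holds exactly one invertible, monotone piecewise-linear alignment into its parent's RT axis; `Alignment` and `CompositeTrack` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class Alignment:
    """Hypothetical monotone piecewise-linear map from a child's RT axis to its parent's."""
    src: np.ndarray
    dst: np.ndarray

    def forward(self, rt):
        return np.interp(rt, self.src, self.dst)

    def inverse(self, rt):
        return np.interp(rt, self.dst, self.src)


@dataclass(eq=False)
class CompositeTrack:
    """One node type for both samples (a composite of one) and composites of composites."""
    name: str
    parent: Optional["CompositeTrack"] = None
    to_parent: Optional[Alignment] = None  # exactly one alignment per node
    children: list = field(default_factory=list)

    def attach(self, child, alignment):
        child.parent, child.to_parent = self, alignment
        self.children.append(child)

    def path_to_root(self):
        """Alignments from this node up to the root, ordered leaf -> root."""
        node, chain = self, []
        while node.parent is not None:
            chain.append(node.to_parent)
            node = node.parent
        return chain

    def to_root_rt(self, rt):
        for a in self.path_to_root():
            rt = a.forward(rt)
        return rt

    def from_root_rt(self, rt):
        # Chain through the alignments in reverse order (root -> leaf).
        for a in reversed(self.path_to_root()):
            rt = a.inverse(rt)
        return rt


root = CompositeTrack("experiment")
group = CompositeTrack("qc_composite")
sample = CompositeTrack("qc_sample_1")
root.attach(group, Alignment(np.array([0.0, 200.0]), np.array([0.0, 400.0])))
group.attach(sample, Alignment(np.array([0.0, 100.0]), np.array([5.0, 105.0])))
start, end = sample.from_root_rt(30.0), sample.from_root_rt(50.0)  # region back in sample RTs
```

Because every node has the same type, a sample, a sub-experiment composite, and a composite of composites all answer `from_root_rt` the same way, regardless of how many alignment levels sit above them.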

For instance, in GC data, all samples in a run are mapped to a composite mass track using their RI standard. We track the index-to-retention-time and scan mappings for this level of alignment, then align the various composite tracks with a second-level alignment between the RI standards of each composite map.
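A rough sketch of the GC case follows, assuming the RI standard is an n-alkane ladder whose peaks are detected in every sample and composite, and that linear interpolation between bracketing standards (the van den Dool and Kratz form of the retention index) is an acceptable alignment model. Both levels then reduce to matching standard peak RTs. The function names and all RT values are hypothetical.

```python
import numpy as np


def linear_ri(rt, std_rts, carbon_numbers):
    """Linear retention index by interpolating between bracketing alkane standards.

    np.interp clamps outside the standard range, so RTs beyond the ladder are not extrapolated.
    """
    return np.interp(rt, std_rts, 100.0 * np.asarray(carbon_numbers, dtype=float))


def standard_alignment(std_src, std_dst):
    """Map RTs from one axis to another by matching RI standard peak RTs piecewise-linearly."""
    std_src = np.asarray(std_src, dtype=float)
    std_dst = np.asarray(std_dst, dtype=float)
    return lambda rt: np.interp(rt, std_src, std_dst)


carbons = [10, 11, 12, 13]
# Hypothetical standard peak RTs (minutes).
sample_std = [5.0, 7.0, 9.0, 11.0]      # one sample in run 1
composite1_std = [5.2, 7.1, 9.3, 11.2]  # run 1 composite
composite2_std = [6.0, 8.0, 10.0, 12.0] # run 2 composite

sample_to_c1 = standard_alignment(sample_std, composite1_std)  # first-level alignment
c1_to_c2 = standard_alignment(composite1_std, composite2_std)  # second-level alignment
rt_in_c2 = c1_to_c2(sample_to_c1(8.0))
```

Since both levels are anchored on the same standards, a peak keeps the same linear RI as it moves from the sample to its run composite and on to another run's composite.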