shuzhao-li-lab / asari

asari, metabolomics data preprocessing

Implement Join Function/Workflow #57

Closed jmmitc06 closed 1 year ago

jmmitc06 commented 1 year ago

Join should take the results of two asari runs on similar data and yield a single feature table. This requires aligning the two datasets. I expect to get a dataset for testing and developing this.

jmmitc06 commented 1 year ago

The branch 'implement_join' has a script that prototypes the join command. It is currently standalone but will be integrated into the program properly in the near future.

Given two feature tables T1 and T2, the algorithm maps T1 onto T2 as follows. First, every feature F1 in T1 is checked for uniqueness within T1 using a 5 ppm m/z window and a 5% relative retention time error. Each unique F1 is then checked against the features F2 in T2 using the same criteria. All resulting (F1, F2) pairs are used with LOWESS to build a mapping function from rt1 to rt2. Unlike base asari, which has hardcoded parameters for this regression, the LOWESS fraction parameter is decreased step by step, and the lowest-error fit that has no discontinuities across the whole rt range is kept. Finally, the predicted rt2 of every feature in T1 is calculated; if the predicted rt2 of a given F1 falls within the rtime_left to rtime_right range of an F2 with a similar m/z (again a 5 ppm cutoff), F1 and F2 are considered the same feature.
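A minimal Python sketch of the anchor-selection and LOWESS steps described above, assuming each feature is a simple record with `id`, `mz`, and `rtime` (hypothetical names, not asari's own classes). Several choices here are assumptions: the relative RT error is measured against the larger of the two retention times, a hand-rolled LOWESS (local linear, tricube weights, no robustness iterations) stands in for whatever regression the branch uses, "no discontinuities" is approximated by requiring the fitted curve to be non-decreasing, and the candidate fractions in `fracs` are arbitrary illustrative values.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Feature:  # hypothetical minimal feature record
    id: str
    mz: float
    rtime: float


def close(f, g, ppm=5.0, rt_tol=0.05):
    """m/z within `ppm`, RT within `rt_tol` relative error (vs. the larger RT)."""
    mz_ok = abs(f.mz - g.mz) / g.mz * 1e6 <= ppm
    rt_ok = abs(f.rtime - g.rtime) <= rt_tol * max(f.rtime, g.rtime)
    return mz_ok and rt_ok


def anchor_pairs(t1, t2, ppm=5.0, rt_tol=0.05):
    """Pair each F1 that is unique in T1 with its single candidate F2 in T2."""
    pairs = []
    for f1 in t1:
        if any(g is not f1 and close(f1, g, ppm, rt_tol) for g in t1):
            continue  # F1 has a near-duplicate inside T1, so skip it
        hits = [f2 for f2 in t2 if close(f1, f2, ppm, rt_tol)]
        if len(hits) == 1:  # keep only unambiguous matches
            pairs.append((f1, hits[0]))
    return pairs


def lowess_fit(xs, ys, frac):
    """Plain LOWESS: local linear fit, tricube weights, no robustness steps."""
    k = min(len(xs), max(2, round(frac * len(xs))))  # points per local window
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)  # >= 1, since x0 itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def fit_rt_mapping(pairs, fracs=(0.8, 0.6, 0.4, 0.3, 0.2, 0.1)):
    """Return (frac, predict) for the lowest-error non-decreasing LOWESS fit."""
    pts = sorted((f1.rtime, f2.rtime) for f1, f2 in pairs)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    best = None
    for frac in fracs:  # decreasing fraction = more local fit
        fit = lowess_fit(xs, ys, frac)
        if any(b < a for a, b in zip(fit, fit[1:])):
            continue  # stand-in for "discontinuous": the fit runs backwards
        err = sum((f - y) ** 2 for f, y in zip(fit, ys))
        if best is None or err < best[0]:
            best = (err, frac, fit)
    if best is None:
        raise ValueError("no admissible LOWESS fit found")
    _, frac, fit = best

    def predict(rt1):
        """Interpolate the fit linearly; constant offset outside the range."""
        if rt1 <= xs[0]:
            return rt1 + (fit[0] - xs[0])
        if rt1 >= xs[-1]:
            return rt1 + (fit[-1] - xs[-1])
        i = bisect_right(xs, rt1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], fit[i - 1], fit[i]
        return y0 if x1 == x0 else y0 + (y1 - y0) * (rt1 - x0) / (x1 - x0)

    return frac, predict
```

With real tables the pairs would come from `anchor_pairs(t1, t2)` and feed straight into `fit_rt_mapping`. Trying every fraction and keeping the lowest-error admissible fit is only one reading of "decreased until the lowest error solution is generated".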

F1s and F2s that fail to get mapped are treated as singletons.
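The final matching step and the singleton bookkeeping might look like the sketch below. `predict_rt2` is any rt1-to-rt2 function (for example, one built from a LOWESS fit); the `Feature` fields reuse the `rtime_left`/`rtime_right` names from the comment above but are otherwise hypothetical. Enforcing one-to-one matches and breaking ties by the closest apex retention time are assumptions, not something the issue specifies.

```python
from dataclasses import dataclass


@dataclass
class Feature:  # hypothetical feature record with peak boundaries
    id: str
    mz: float
    rtime: float
    rtime_left: float
    rtime_right: float


def map_features(t1, t2, predict_rt2, ppm=5.0):
    """Match F1 -> F2 when predicted rt2 lies inside F2's peak and m/z agrees.

    Returns (matches, T1 singletons, T2 singletons).
    """
    matches, used = [], set()
    for f1 in t1:
        rt2 = predict_rt2(f1.rtime)
        candidates = [
            f2 for f2 in t2
            if f2.id not in used  # assumption: each F2 is matched at most once
            and f2.rtime_left <= rt2 <= f2.rtime_right
            and abs(f1.mz - f2.mz) / f2.mz * 1e6 <= ppm
        ]
        if candidates:
            # assumption: break ties by distance to the F2 apex RT
            best = min(candidates, key=lambda f2: abs(f2.rtime - rt2))
            matches.append((f1.id, best.id))
            used.add(best.id)
    matched1 = {a for a, _ in matches}
    singletons1 = [f.id for f in t1 if f.id not in matched1]
    singletons2 = [f.id for f in t2 if f.id not in used]
    return matches, singletons1, singletons2
```

For example, with `predict_rt2 = lambda rt: rt + 2.0`, a T1 feature at rt 30.0 is matched to a T2 feature whose peak spans 31.2 to 32.8 if their m/z values agree within 5 ppm; everything left over is reported as a singleton.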

jmmitc06 commented 1 year ago

This is now implemented as of [implement_join 44685b3]. The solution is currently very rough and requires that the pickles from the samples are available. In essence, we map the cmaps of two experiments (A, B) to one another. This yields an mz_A to mz_B and an rt_A to rt_B mapping. We can then map "through" each cmap back to the original samples and extract the feature intensities. Right now we cannot join multiple samples, or joined samples, to one another, but that will be fixed soon.
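Once features from experiments A and B are matched, the joined feature table can be assembled roughly as below. This sketch assumes the per-sample intensities have already been pulled out of the pickles into plain dicts of the form `{feature_id: {sample_name: intensity}}`; the `join_two` helper, the `A:`/`B:` column prefixes, and the `fa|fb` row keys are all hypothetical. Unmatched features keep their own row, with `None` for the other experiment's samples.

```python
def join_two(matches, table_a, table_b):
    """Build one feature table from matched features of experiments A and B.

    matches: list of (feature_id_a, feature_id_b) pairs
    table_a, table_b: {feature_id: {sample_name: intensity}}
    Returns (columns, rows) where rows maps a joined id to a list of values.
    """
    # Prefix sample names so columns from A and B cannot collide.
    cols_a = sorted({f"A:{s}" for row in table_a.values() for s in row})
    cols_b = sorted({f"B:{s}" for row in table_b.values() for s in row})
    columns = cols_a + cols_b

    def prefixed(tag, row):
        return {f"{tag}:{s}": v for s, v in row.items()}

    joined = {}
    for fa, fb in matches:  # matched features share one row
        joined[f"{fa}|{fb}"] = {**prefixed("A", table_a[fa]), **prefixed("B", table_b[fb])}
    matched_a = {fa for fa, _ in matches}
    matched_b = {fb for _, fb in matches}
    for fa, row in table_a.items():  # A-only singletons
        if fa not in matched_a:
            joined[f"{fa}|"] = prefixed("A", row)
    for fb, row in table_b.items():  # B-only singletons
        if fb not in matched_b:
            joined[f"|{fb}"] = prefixed("B", row)
    # Samples without a value for a feature become None.
    return columns, {fid: [row.get(c) for c in columns] for fid, row in joined.items()}
```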

jmmitc06 commented 1 year ago

We can now merge N experiments together as of [implement_join fe2e2a2].
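For N experiments, one possible approach (not necessarily what the branch does) is to treat every pairwise feature match as an edge and group features with union-find, so that matches chain transitively across experiments. `merge_experiments` and its input formats are hypothetical. Groups that end up holding two features from the same experiment are returned separately as conflicts, because a transitive chain can link features that a direct comparison would have kept apart.

```python
def merge_experiments(features_by_exp, pairwise_matches):
    """Group features across N experiments from pairwise matches.

    features_by_exp: {experiment: [feature_id, ...]}
    pairwise_matches: [((exp_i, fid_i), (exp_j, fid_j)), ...]
    Returns (groups, conflicts); each group is a sorted list of (exp, fid).
    """
    parent = {(e, f): (e, f) for e, fids in features_by_exp.items() for f in fids}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in pairwise_matches:  # union the two features' groups
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    groups = sorted(sorted(g) for g in groups.values())
    # A group containing two features from one experiment is ambiguous.
    conflicts = [g for g in groups if len({e for e, _ in g}) < len(g)]
    return groups, conflicts
```

Each resulting group corresponds to one row of the merged table; singletons simply form groups of size one.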

Previously merged experiments cannot be merged yet, and the functionality is not "baked in" to the objects, but that is in progress.

jmmitc06 commented 1 year ago

This is now complete as of [implement_join 3dc7ebe]. I will open a PR to master soon, but I am closing this issue.