shuzhao-li-lab / asari

asari, metabolomics data preprocessing

How to pick the reference sample for alignment? #58

Closed: amnahsiddiqa closed this issue 1 year ago

amnahsiddiqa commented 1 year ago

For a regular (underivatized) metabolomics dataset that I analyzed recently (some QC results attached), a blank sample was picked as the reference sample. It would be helpful if we could choose our reference from, say, the top ten samples with the most landmark peaks, or in some other way skip samples that the user doesn't want used as the reference, such as blanks. rpneg_share.zip

jmmitc06 commented 1 year ago

Thanks, that's a good suggestion. I think that we need two enhancements here:

The first is to display the list of possible reference samples ranked by the existing criterion (number of landmark peaks). The second is an option to provide a list of files that cannot be used as the reference. Together these should allow both automated and semi-automated processing. Both are on my to-do list.
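A minimal sketch of how the two enhancements could fit together, assuming the number of landmark peaks per sample has already been computed into a dict. The function name `rank_reference_candidates` and its parameters are hypothetical and not part of asari's actual code; the sample names and counts in the usage line are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def rank_reference_candidates(
    landmark_counts: Dict[str, int],
    exclude: Optional[Iterable[str]] = None,
    top_n: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (sample, landmark_count) pairs, best first.

    Samples listed in `exclude` (e.g. blanks) are never offered as the reference.
    Ties are broken by sample name so the ordering is deterministic.
    """
    excluded = set(exclude or ())
    candidates = [
        (name, count)
        for name, count in landmark_counts.items()
        if name not in excluded
    ]
    # Most landmark peaks first; alphabetical order among equal counts.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates if top_n is None else candidates[:top_n]


# Made-up counts, for illustration only.
counts = {"blank_01": 2150, "qc_01": 1980, "sample_07": 1875, "sample_02": 1875}
print(rank_reference_candidates(counts, exclude=["blank_01"], top_n=2))
```

Without an exclusion list the blank would rank first here, which mirrors the situation reported in this issue; passing it in `exclude` lets the next-best sample be chosen automatically.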

The big-picture question, though, is why blanks get selected for alignment, and whether that causes issues downstream. Aligning on a blank could mean aligning samples on common background ions, which may still yield a good alignment.

amnahsiddiqa commented 1 year ago

@jmmitc06: Yes, as we discussed before, if a blank is being picked, it must have the highest number of landmark peaks, which should be resolvable. I haven't yet looked in detail at why there is trouble with RT alignment (I will make some query plots later in the day), but the quick routine QC results I attached clearly indicate that there could be problems with retention time alignment.

Also, it would be nice to have a "feature request" label. You can keep the enhancement label for your own records, while other people can use "feature request" to propose features, which you can then approve or not.

jmmitc06 commented 1 year ago

This has been implemented as of [implement_join 846133a].

By default, the old behavior is kept: the sample with the largest number of anchor pairs is selected as the reference without user input. If you now pass the parameter '-f manual', you will get a terminal menu listing the sample names and their numbers of anchors, from which you can select the one to use. As before, you can still specify a sample by name by passing it to '-f'.
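A rough sketch of the three selection modes described above (automatic, '-f manual', and '-f' with a sample name), using only the standard library. This is not the code in commit 846133a: the function `choose_reference`, its parameters, and the menu format are hypothetical, and the anchor counts are assumed to be precomputed. The `ask` parameter defaults to `input` but can be swapped out so the menu can be tested without a terminal.

```python
from typing import Callable, Dict, Optional


def choose_reference(
    anchor_counts: Dict[str, int],
    f_option: Optional[str] = None,
    ask: Callable[[str], str] = input,
) -> str:
    """Hypothetical helper mirroring the three modes described in the thread.

    f_option is None     -> automatic: sample with the most anchor pairs
    f_option == "manual" -> numbered terminal menu; user types a number
    anything else        -> treated as a sample name
    """
    if f_option is None:
        # Old default behavior: no user input, most anchor pairs wins.
        return max(anchor_counts, key=lambda name: anchor_counts[name])
    if f_option == "manual":
        ranked = sorted(anchor_counts.items(), key=lambda kv: -kv[1])
        # Prints every sample, as the current implementation reportedly does.
        for i, (name, n) in enumerate(ranked):
            print(f"{i}\t{name}\t{n} anchors")
        while True:
            reply = ask("Select reference sample by number: ").strip()
            if reply.isdigit() and int(reply) < len(ranked):
                return ranked[int(reply)][0]
            print("Invalid choice, try again.")
    if f_option not in anchor_counts:
        raise ValueError(f"unknown sample: {f_option}")
    return f_option
```

Keeping the menu to `print` and `input` matches the stated goal of not adding dependencies, at the cost of the full sample list being dumped to the terminal.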

The way to make a selection in the menu is pretty ugly, but it did not require any additional dependencies; this will be cleaner in the GUI. Currently the entire list of samples is printed to the terminal, which may be problematic for studies with many samples.

This is not in the main branch yet, but will be pushed there soonish.

jmmitc06 commented 1 year ago

Does this seem like a good resolution for you? If so, I'm going to close this issue.

amnahsiddiqa commented 1 year ago

Thanks, that works.