yezhengSTAT / ADTnorm

ADTnorm normalizes the cell surface protein measurements of CITE-seq data, facilitating data integration across batches and across studies.
https://yezhengstat.github.io/ADTnorm/articles/ADTnorm-tutorial.html
GNU General Public License v3.0

missing value where TRUE/FALSE needed #6

Closed: yi6kim closed this issue 2 weeks ago

yi6kim commented 1 year ago

Dear Ye Zheng,

First of all, thank you for creating this amazing package. I have ADT data from single-cell DNA-antibody sequencing experiments (rather than CITE-seq), and I want to (1) correct the batch effects across timepoints (within the same patient) and (2) correct for the mouse background signals (the IgGs) for better accuracy.

I have two questions: (1) I keep getting the error message

Error in if (length(y_valley[x_valley > real_peak[1]]) == 0 || (y_valley[x_valley > : missing value where TRUE/FALSE needed

after I run the following:

    cell_x_adt_norm <- ADTnorm(
      cell_x_adt = my_adt,
      cell_x_feature = my_feat,
      save_outpath = save_outpath,
      study_name = run_name2,
      marker_to_process = c("CD3", "CD4", "CD8"),
      save_intermediate_fig = TRUE
    )

Here my_adt is a matrix of 48 antibodies (including three IgG isotypes), and my_feat is a matrix in which 'sample', 'batch', and 'study_names' are all set to the timepoints. Since I didn't have all the information present in the demo 'cell_x_feature' data, I had to arbitrarily make up some of the variables in my_feat.

(2) Since we have the exact IgG reads as part of our ADT data, I was wondering whether it is compatible to run dsb (denoised and scaled by background) first and then run ADTnorm on the same dataset (so the data fed into ADTnorm is already normalized). I'm not too familiar with the math behind it, so if this practice is not recommended, please let me know!

yezhengSTAT commented 1 year ago

Sorry for the delay in replying! I am in the middle of a conference.

I have seen this error message from other users as well, and I will update this part to provide more informative error or warning feedback whenever it is triggered. For now, please set "shoulder_valley = TRUE" and see if the error can be bypassed.
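For readers wondering about the message itself: in R, "missing value where TRUE/FALSE needed" is raised whenever an `if` condition evaluates to `NA`. The sketch below is a generic base R illustration, not the ADTnorm source; `peak_height` and the 0.01 cutoff are hypothetical stand-ins for whatever value ended up missing during peak and valley detection.

```r
# Hypothetical value that came back missing from an earlier step.
peak_height <- NA_real_

# FALSE || NA evaluates to NA, and `if (NA)` is an error in R.
msg <- tryCatch(
  {
    if (length(peak_height) == 0 || peak_height > 0.01) "peak kept" else "peak dropped"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Wrapping the comparison in isTRUE() treats NA as "condition not met".
safe <- if (length(peak_height) == 0 || isTRUE(peak_height > 0.01)) {
  "peak kept"
} else {
  "peak dropped"
}
print(safe)
```

Whether a guard like `isTRUE()` is the right fix inside the package depends on what a missing value should mean at that step, so this only shows where the message comes from.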

As for the IgG part, if I understand correctly, your goal is to separate the protein enrichment part (signal) from the antibody nonspecific binding part (background). If so, you may use the IgG density distribution to set a threshold: cells below it belong to the background peak, and cells above it carry real signal for a given protein. ADTnorm and DSB are both normalization methods, so each should be run on its own rather than one after the other. You can try both and go with the one that gives you the most reasonable results. ;)
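The IgG-based threshold suggested above could be sketched in base R roughly as follows. Everything here is an assumption made for illustration: the counts are simulated, the `asinh(x / 5 + 1)` transform is inferred from the form of the constants in the package documentation, and the 95% quantile is just one possible choice of "upper bound" for the IgG distribution.

```r
set.seed(42)

# Simulated raw counts (hypothetical data, not from the thread).
igg_counts <- rnbinom(2000, mu = 3, size = 2)        # IgG isotype control
cd3_counts <- c(rnbinom(1200, mu = 3, size = 2),     # background-like cells
                rnbinom(800, mu = 300, size = 5))    # cells with real signal

# Assumed arcsinh transform applied to both channels.
arcsinh_transform <- function(x) asinh(x / 5 + 1)
igg_t <- arcsinh_transform(igg_counts)
cd3_t <- arcsinh_transform(cd3_counts)

# Use the upper 95% quantile of the IgG distribution as the cutoff.
threshold <- unname(quantile(igg_t, probs = 0.95))

# Cells above the cutoff are called positive for the marker.
is_positive <- cd3_t > threshold
print(table(is_positive))
```

Because nonspecific binding varies from antibody to antibody, a cutoff derived from the IgG channel is only a starting point and is worth checking against each marker's own density plot.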

yi6kim commented 1 year ago

No worries, thank you so much for your reply!

The error message disappeared when I added "shoulder_valley = TRUE"! (If you don't mind, may I ask for a brief explanation of what this does?)
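For anyone landing here with the same error, the call with the extra argument might look like the sketch below. The argument names are the ones used in this thread; the toy `my_adt` and `my_feat` objects and the `"run_name2"` string are hypothetical placeholders, and the assumption that `cell_x_feature` carries `sample` and `batch` factor columns should be checked against the package documentation. The actual call is left commented out so the snippet runs without ADTnorm installed.

```r
set.seed(1)
n_cells <- 6

# Toy stand-ins for the real inputs (hypothetical data).
my_adt <- matrix(
  rpois(n_cells * 3, lambda = 20),
  nrow = n_cells,
  dimnames = list(NULL, c("CD3", "CD4", "CD8"))
)
my_feat <- data.frame(
  sample = factor(rep(c("T1", "T2"), each = 3)),  # timepoint used as sample
  batch  = factor(rep(c("T1", "T2"), each = 3))   # timepoint used as batch
)

# Same arguments as the original call, plus shoulder_valley = TRUE.
adtnorm_args <- list(
  cell_x_adt = my_adt,
  cell_x_feature = my_feat,
  save_outpath = tempdir(),
  study_name = "run_name2",
  marker_to_process = c("CD3", "CD4", "CD8"),
  save_intermediate_fig = TRUE,
  shoulder_valley = TRUE
)

# Uncomment once ADTnorm is installed and real data are loaded:
# cell_x_adt_norm <- do.call(ADTnorm::ADTnorm, adtnorm_args)
```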

For IgG, yes, my goal is to separate and discard the background fluorescence from IgG. (It's great to know that ADTnorm can also "denoise" the mouse background!)

Ideally, I would like to subtract the exact IgG reads, but so far with ADTnorm I could only find a way to set such thresholds as preset values (e.g. the following parameters from the documentation only allow setting constants globally, applied to the entire dataset). And I'm guessing that, by setting the 'threshold', the algorithm removes any data point below the threshold and normalizes only those above it?

Thanks!

neg_candidate_thres: The upper bound for the negative peak. Users can refer to their IgG samples to obtain the minimal upper bound of the IgG sample peak. It can be one of the values of asinh(4/5+1), asinh(6/5+1), or asinh(8/5+1) if the right 95% quantile of IgG samples is large.

lower_peak_thres: The minimal ADT marker density height of calling it a real peak. Set it to 0.01 to avoid a suspicious positive peak. Set it to 0.001 or smaller to include some small but tend to be real positive peaks, especially for markers like CD19.
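As a quick worked example of the three suggested `neg_candidate_thres` values: if they are read as an `asinh(count / 5 + 1)` transform of a raw count (an assumption based only on how the constants are written), they correspond to raw counts of 4, 6, and 8. The variable names below are illustrative.

```r
# The three candidate upper bounds quoted from the documentation.
neg_candidates <- c(asinh(4 / 5 + 1), asinh(6 / 5 + 1), asinh(8 / 5 + 1))
print(round(neg_candidates, 3))

# Inverting the assumed transform recovers the raw counts.
raw_equivalent <- 5 * (sinh(neg_candidates) - 1)
print(raw_equivalent)
```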

yezhengSTAT commented 1 year ago

Hello, you can refer to https://yezhengstat.github.io/ADTnorm/articles/ADTnorm-tutorial.html#definition-of-the-peak for more reading about the shoulder peak (heavy right tail of the density distribution).

neg_candidate_thres and lower_peak_thres are used to better identify the peaks and normalize the ADT counts. They are not used for separating the background population from the real signal population. I am confused about "subtract the exact IgG reads." My understanding of the IgG antibody's nonspecific binding is that such binding also varies. Namely, the IgG count and the nonspecific binding (the background signal) of other antibodies (CD3, CD4, etc.) are unlikely to have exactly the same value. That is why we have a negative peak, and the values in the negative peak describe the variation in nonspecific binding. Therefore, I suggest that you run ADTnorm first, which will give you the location of the negative peak and the valley that separates the negative peak from the positive peaks. You may also use the upper bound of the IgG density distribution as the threshold to separate the negative peak from the positive peaks.
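To make the "negative peak" and "valley" idea concrete, here is a minimal base R sketch that locates them on a simulated bimodal marker. It is not how ADTnorm detects landmarks; the data, the use of `density()` with its default bandwidth, and the 0.01 minimal peak height (borrowed loosely from the `lower_peak_thres` description) are all assumptions for illustration.

```r
set.seed(7)

# Simulated transformed ADT values: a negative and a positive population.
x <- c(rnorm(3000, mean = 1.5, sd = 0.3),   # background (negative peak)
       rnorm(2000, mean = 4.5, sd = 0.5))   # real signal (positive peak)

dens <- density(x, n = 512)

# A local maximum is where the slope switches from rising to falling.
slope_sign <- sign(diff(dens$y))
peak_idx <- which(diff(slope_sign) < 0) + 1

# Ignore tiny bumps below a minimal density height.
peak_idx <- peak_idx[dens$y[peak_idx] > 0.01]

# Keep the two tallest peaks, ordered from left to right.
top2 <- sort(peak_idx[order(dens$y[peak_idx], decreasing = TRUE)][1:2])
neg_peak <- dens$x[top2[1]]
pos_peak <- dens$x[top2[2]]

# The valley is the lowest density point between the two peaks.
between <- top2[1]:top2[2]
valley <- dens$x[between][which.min(dens$y[between])]

print(c(neg_peak = neg_peak, valley = valley, pos_peak = pos_peak))
```

Cells to the left of `valley` would be treated as background and cells to the right as positive, which is the same split an IgG-derived upper bound is meant to approximate.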

Thanks, Ye

yi6kim commented 1 year ago

Oh! I did not realize this page existed. Thank you for providing this link! I had seen that there were two external sites linked in the README: the first leads to the documentation for ADTnorm, https://yezhengstat.github.io/ADTnorm/reference/ADTnorm.html, and the second, described as a 'more detailed tutorial', leads to a page displaying the same contents as the README: https://yezhengstat.github.io/ADTnorm/index.html. (Maybe I should have navigated this website a little more!) But this new page provides the materials I was looking for. :)