yezhengSTAT / ADTnorm

ADTnorm normalizes the cell surface protein measurement of CITE-seq data, facilitating across batches and across studies data integration.
https://yezhengstat.github.io/ADTnorm/articles/ADTnorm-tutorial.html
GNU General Public License v3.0

What is ADTseqDepth? #7

Closed: yi6kim closed this issue 1 year ago

yi6kim commented 1 year ago

I am a bit unclear about what the 'ADTseqDepth' column in the 'cell_x_feature' demo data is. The documentation describes ADTseqDepth as 'total UMI per cell', but does this simply mean the sum of all antibody reads for each cell? For instance, suppose I ran a 5-antibody panel sequencing experiment and obtained a row (representing one cell) with the following raw reads:

| CD3 | CD4 | CD8 | CD14 | CD19 |
|-----|-----|-----|------|------|
| 18  | 138 | 13  | 491  | 3    |

Is the 'ADTseqDepth' for this cell then simply 18 + 138 + 13 + 491 + 3 = 663?

Currently I don't have separate 'ADTseqDepth' data. The only output I receive from the experiment is the raw read matrix for each antibody, just like the 'cell_x_adt' demo data. (I also get a column with the unique barcode of each cell, e.g. AAGTTGTCTAC for row 1, ATTCTTTCGTTT for row 2, etc., but I don't think this is relevant.) So I was wondering how to substitute for 'ADTseqDepth' in the cell_x_feature parameter.

Furthermore, what can be done if I don't have 'sample_status' or 'cell_type_l1' data? I could construct surrogates for these, but they would not clearly distinguish healthy cells from tumor cells. Are all 7 columns in the demo 'cell_x_feature' necessary (and equally important) when running ADTnorm, or is it okay to provide only the 'sample' and 'batch' columns in 'cell_x_feature'?

Thank you!

yezhengSTAT commented 1 year ago

Hello, yes, you are correct! ADTseqDepth is the total number of reads for the protein (ADT) part. In fact, you can run ADTnorm without this column: "sample" and "batch" are the only two essential columns. The rest are used only for UMAP coloring. I will add this instruction to the next release of the software and vignette.
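A minimal sketch of the answer above, assuming `cell_x_adt` is a cells × markers matrix of raw ADT counts: ADTseqDepth is just the per-cell row sum, and a minimal `cell_x_feature` needs only `sample` and `batch`. The first row reproduces the example from the question; the second cell and the sample/batch labels are made-up placeholders for illustration. Rows of `cell_x_feature` are kept in the same order as `cell_x_adt`.

```r
# Toy raw ADT count matrix: rows are cells, columns are antibody markers.
# Row 1 is the 5-antibody example from the question; row 2 is a made-up cell.
cell_x_adt <- matrix(
  c(18, 138,  13, 491, 3,
     5,  40, 220,   2, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("cell_1", "cell_2"), c("CD3", "CD4", "CD8", "CD14", "CD19"))
)

# ADTseqDepth as described in the thread: total ADT counts per cell (row sums).
adt_seq_depth <- rowSums(cell_x_adt)
adt_seq_depth  # cell_1: 663, cell_2: 268

# Minimal cell_x_feature: only 'sample' and 'batch' are required.
# The labels below are hypothetical placeholders; use your own sample/batch IDs.
cell_x_feature <- data.frame(
  sample = c("sample_1", "sample_1"),
  batch = c("batch_1", "batch_1"),
  ADTseqDepth = adt_seq_depth,  # optional, only used for UMAP coloring
  row.names = rownames(cell_x_adt)
)
```

Dropping the `ADTseqDepth` line gives the two-column version that the maintainer says is sufficient.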

yi6kim commented 1 year ago

The code runs successfully when I provide just a 2-column data frame with 'sample' and 'batch' as 'cell_x_feature'. Thank you!

Does ADTnorm also generate a UMAP? I am only getting the density plots for each antibody marker before and after normalization (e.g. 'ArcsinhTransform_CD3.pdf' and 'ADTnorm_CD3.pdf').

Is there a way to automatically generate grouped density plots showing all of the antibodies (CD3 through CD19), like option 1 or option 2 in the README? Comparing the plots of individual antibodies is quite hard, since it involves a lot of going back and forth.

yezhengSTAT commented 1 year ago

ADTnorm does not automatically generate a UMAP. You can generate one from the normalized count matrix.

You can use plot_adt_density_with_peak_valley (https://yezhengstat.github.io/ADTnorm/reference/plot_adt_density_with_peak_valley.html) to generate density plots for multiple ADT markers at once. FYI, a more comprehensive tutorial and manual is available at https://yezhengstat.github.io/ADTnorm/index.html.
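For a quick look at all markers on one page, a rough base-R sketch like the one below could also work on any cells × markers matrix (for example, the arcsinh-transformed or the normalized one). This is a stand-in, not ADTnorm's `plot_adt_density_with_peak_valley()`: it draws no peak or valley landmarks, and `plot_marker_densities` and its `group` argument are hypothetical names chosen for this example. Passing a per-cell sample vector as `group` overlays one density curve per sample in each marker panel.

```r
# Hypothetical helper: one density panel per marker, all panels in a single PDF.
# mat:   numeric matrix, rows = cells, columns = markers (colnames used as titles)
# group: optional per-cell labels (e.g. sample); one curve per label per panel.
#        Each group needs at least 2 cells for density() to work.
plot_marker_densities <- function(mat, out_pdf, group = NULL, ncol = 4) {
  markers <- colnames(mat)
  if (is.null(group)) group <- rep("all cells", nrow(mat))
  group <- as.character(group)
  groups <- unique(group)
  cols <- grDevices::hcl.colors(length(groups), "Dark 3")
  n_rows <- ceiling(length(markers) / ncol)

  grDevices::pdf(out_pdf, width = 3 * ncol, height = 2.5 * n_rows)
  on.exit(grDevices::dev.off())
  graphics::par(mfrow = c(n_rows, ncol), mar = c(3, 3, 2, 1))

  for (m in markers) {
    # Kernel density estimate of this marker within each group
    dens <- lapply(groups, function(g) stats::density(mat[group == g, m], na.rm = TRUE))
    x_range <- range(unlist(lapply(dens, function(d) d$x)))
    y_max <- max(vapply(dens, function(d) max(d$y), numeric(1)))
    plot(x_range, c(0, y_max), type = "n", main = m, xlab = "", ylab = "")
    for (k in seq_along(dens)) graphics::lines(dens[[k]], col = cols[k])
    if (m == markers[1]) {
      graphics::legend("topright", legend = groups, col = cols, lty = 1,
                       cex = 0.7, bty = "n")
    }
  }
  invisible(out_pdf)
}
```

With 48 markers, `ncol = 6` gives an 8 × 6 grid on a single page.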

yi6kim commented 1 year ago

Actually, is there an easy way to get the matrices passed to the 'peak_landmark_list' and 'valley_landmark_list' parameters of 'plot_adt_density_with_peak_valley' as part of the ADTnorm output? In particular, can 'peak_mode_norm_res' and 'valley_location_norm_res' in the example usage below be returned as ADTnorm output?

```r
library(ADTnorm)

# peak_mode_norm_res and valley_location_norm_res are the landmark objects asked about
plot_adt_density_with_peak_valley(
  cell_x_adt,
  cell_x_feature,
  adt_marker_select = c("CD3", "CD4", "CD8", "CD19"),
  peak_landmark_list = peak_mode_norm_res,
  valley_landmark_list = valley_location_norm_res,
  brewer_palettes = "Set1",
  parameter_list = list(bw = 0.1, run_label = "ADTnorm")
)
```

I see that the RDS file automatically created for each marker contains 'peak_location' and 'valley_location' in its 'plot_env' variable. But since each file covers only one antibody, I would need to read in each of the 48 RDS files separately (I have 48 antibodies) and combine the peak locations into a matrix, and likewise for the valley locations. Is there an easier way to do this?

yezhengSTAT commented 1 year ago

Currently you will need to write a for loop to read in the individual RDS files.
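A minimal sketch of such a loop, based on the structure described earlier in the thread (each per-marker RDS object holds `peak_location` and `valley_location` under `plot_env`). The file-name pattern `ADTnorm_<marker>.rds` and the helper name `collect_landmarks` are assumptions; adjust the pattern to whatever your run actually wrote. The result is a named list keyed by marker. Whether `plot_adt_density_with_peak_valley()` expects exactly this shape or a combined matrix should be checked against its reference page.

```r
# Hypothetical helper: gather per-marker peak/valley landmarks from ADTnorm's RDS files.
# ASSUMPTIONS: files are named like "ADTnorm_<marker>.rds", and readRDS() returns an
# object whose $plot_env holds peak_location and valley_location (as described above).
collect_landmarks <- function(rds_dir, pattern = "^ADTnorm_(.+)\\.rds$") {
  files <- list.files(rds_dir, pattern = pattern, full.names = TRUE)
  markers <- sub(pattern, "\\1", basename(files))  # marker name from the file name

  peak_list <- list()
  valley_list <- list()
  for (i in seq_along(files)) {
    obj <- readRDS(files[i])
    # Single-bracket assignment with list() keeps an entry even if the value is NULL
    peak_list[markers[i]] <- list(obj$plot_env$peak_location)
    valley_list[markers[i]] <- list(obj$plot_env$valley_location)
  }
  list(peak = peak_list, valley = valley_list)
}
```

Point `rds_dir` at the folder where your run saved the per-marker RDS files, then pass `landmarks$peak` and `landmarks$valley` (after any reshaping the plotting function requires) as the landmark arguments.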