niaid / dsb

Normalize CITEseq Data

(denoise.counts=TRUE, use.isotype.control=TRUE) very similar (denoise.counts=FALSE, use.isotype.control=FALSE) #37

Closed domi84 closed 1 year ago

domi84 commented 1 year ago

Hello, thanks for this interesting package. I am running the code from https://cran.r-project.org/web/packages/dsb/vignettes/end_to_end_workflow.html on one of my datasets to see what difference it makes. I noticed that the options (denoise.counts = TRUE, use.isotype.control = TRUE) and (denoise.counts = FALSE, use.isotype.control = FALSE) return very similar results, which surprises me.

This

cells.dsb.norm1 = DSBNormalizeProtein(
  cell_protein_matrix = cell.adt.raw,
  empty_drop_matrix = background.adt.mtx,
  denoise.counts = TRUE,
  use.isotype.control = TRUE,
  isotype.control.name.vec = isotype.controls
  )

takes longer to run and returns:

[1] "correcting ambient protein background noise"
[1] "some proteins with low background variance detected check raw and normalized distributions. protein stats can be returned with return.stats = TRUE"
[1] "Hu.CD48" "Hu.CD11c" "Hu.CD31" "Hu.CD62L" "Hu.CD36"
[1] "fitting models to each cell for dsb technical component and removing cell to cell technical noise"

|   | Hu.CD86 | Hu.CD274 | Hu.CD270 | Hu.CD155 | Hu.CD112 | Hu.CD47 | Hu.CD48 |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 1.944590 | 1.092919 | 1.101433 | 0.8200972 | 0.9423016 | 7.043330 | 1.31645800 |
| 1 | 4.206515 | 1.143181 | 1.104835 | 0.9502019 | 0.9075726 | 10.947148 | -0.32764956 |
| 2 | 2.777990 | 1.221543 | 1.247028 | 0.8633896 | 1.0350072 | 7.846072 | -0.06536306 |

while this:

cells.dsb.norm2 = DSBNormalizeProtein(
  cell_protein_matrix = cell.adt.raw,
  empty_drop_matrix = background.adt.mtx,
  denoise.counts = FALSE,
  use.isotype.control = FALSE
  )

is very quick and returns:

[1] "Not running dsb step II (removal of cell to cell technical noise) Setting use.isotype.control and isotype.control.name.vec to FALSE and NULL"
[1] "potential isotype controls detected: "
[1] "Isotype_MOPC.21" "Isotype_MOPC.173" "Isotype_MPC.11" "Isotype_RTK4530"
[5] "Isotype_RTK2071" "Isotype_G0114F7" "Isotype_RTK2758" "Isotype_RTK4174"
[9] "Isotype_HTK888"
[1] "correcting ambient protein background noise"
[1] "some proteins with low background variance detected check raw and normalized distributions. protein stats can be returned with return.stats = TRUE"
[1] "Hu.CD48" "Hu.CD11c" "Hu.CD31" "Hu.CD62L" "Hu.CD36"

|   | Hu.CD86 | Hu.CD274 | Hu.CD270 | Hu.CD155 | Hu.CD112 | Hu.CD47 | Hu.CD48 |
| -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 1.582482 | 1.052647 | 1.120445 | 0.9469014 | 0.9177784 | 7.094805 | 1.4910953 |
| 1 | 4.205012 | 1.052647 | 1.120445 | 0.9469014 | 0.9177784 | 10.995484 | -0.4915249 |
| 2 | 2.981426 | 1.306219 | 1.324305 | 0.9469014 | 0.9177784 | 8.029984 | -0.4915249 |

The results are different, of course, but surprisingly similar; even the heatmaps look identical. The FeaturePlot/VlnPlot outputs also look very similar.

Am I doing anything wrong? Thanks

MattPM commented 1 year ago

Hi @domi84, apologies for my delayed response. The correction in step II is often a small adjustment. A heatmap will not really show the difference between dsb with and without step II, because step II adjusts cell-to-cell variation and the mean values will probably be similar. To get a feel for how this step changes cell-to-cell variation, you could look at protein-protein correlations, or at the values of a single protein within a single cell type at the single-cell level. In this particular dataset, step II does appear to have a relatively small effect, based on the few cells you showed. The values are different, though, and those differences, along with the larger differences you might see in other cells, are due to the removal of technical cell-to-cell variation.
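
The comparison suggested above could be sketched roughly as follows. This is only an illustration in base R on simulated data, not output from dsb: `norm_step1_only` and `norm_step1_and_2` are hypothetical stand-ins for `cells.dsb.norm2` and `cells.dsb.norm1`, assumed to be matrices with proteins in rows and cells in columns, and the "technical noise" here is simply a made-up per-cell offset shared by all proteins.

```r
# Simulated stand-ins for the two normalized matrices (proteins x cells).
# These values are invented for illustration; they are not dsb output.
set.seed(1)
n_cells  <- 200
proteins <- c("Hu.CD86", "Hu.CD47", "Hu.CD48")
signal <- matrix(
  rnorm(length(proteins) * n_cells, mean = 3),
  nrow = length(proteins),
  dimnames = list(proteins, paste0("cell", seq_len(n_cells)))
)

# Assumed technical component: one offset per cell, shared by every protein.
tech <- rnorm(n_cells, sd = 0.5)

norm_step1_only  <- sweep(signal, 2, tech, "+")  # stand-in for denoise.counts = FALSE
norm_step1_and_2 <- signal                       # stand-in for denoise.counts = TRUE

# Per-protein means, i.e. roughly what a heatmap of averages summarizes.
mean_shift <- rowMeans(norm_step1_only) - rowMeans(norm_step1_and_2)

# Protein-protein correlations across cells, with and without the offset.
cor_without_step2 <- cor(t(norm_step1_only))
cor_with_step2    <- cor(t(norm_step1_and_2))

# Cell-level differences between the two normalizations.
per_cell_diff <- norm_step1_only - norm_step1_and_2

print(round(mean_shift, 3))
print(round(cor_without_step2 - cor_with_step2, 3))
print(summary(as.vector(per_cell_diff)))
```

In this toy setup the per-protein means move by the same small amount (the average offset), while the shared per-cell offset is expected to change the protein-protein correlations, which is why averages and heatmaps can look alike even when cell-level values differ. With real data, the two simulated matrices would be replaced by the actual normalized matrices, ideally subset to the cells of a single cell type.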