niaid / dsb

Normalize CITEseq Data

AB isotypes needed? #7

Closed: bio-la closed this issue 4 years ago

bio-la commented 4 years ago

Hi, cool method! Can you clarify what criteria you use to select the AB isotype controls? In your paper these are the mouse ABs in the human-cell experiment, so would any AB from an unrelated species, one not expected to bind human cells, do? I'm planning a CITE-seq experiment and I want to use your normalization strategy. How effective is it for batch correction without isotypes? (I know these are two separate problems, I'm just jumping to the conclusion here!) Thanks!

MattPM commented 4 years ago

Hi @bio-la, we used all the isotype controls that covered the isotypes used in our main staining panel, which happened to be 4 isotype controls. Say you are staining cells with a single antibody that happens to be an IgG1 isotype. You would only need 1 isotype control: an IgG1 that has the exact same isotype as your antibody but a different "Fab" (fragment antigen binding), the part of the antibody with specificity for the target protein of interest. You could use any species: a mouse, rat, or goat IgG1 isotype control is a monoclonal IgG1 antibody specific for a target from another species, so it has the same isotype as the antibody you are interested in quantifying but a Fab that should not recognize your cells.
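The rule above (one isotype control per distinct isotype in the staining panel) can be sketched in a few lines. The panel below is hypothetical: the antibody names and their isotypes are made up for illustration.

```python
# Hypothetical staining panel: antibody -> isotype. Names and isotypes are made up.
PANEL = {
    "Ab_01": "IgG1",
    "Ab_02": "IgG1",
    "Ab_03": "IgG2a",
    "Ab_04": "IgG1",
    "Ab_05": "IgG2b",
}


def isotype_controls_needed(panel):
    """One isotype control per distinct isotype used in the panel, sorted by name."""
    return sorted(set(panel.values()))


if __name__ == "__main__":
    print(isotype_controls_needed(PANEL))  # ['IgG1', 'IgG2a', 'IgG2b']
```

A single-antibody IgG1 panel, as in the example above, reduces to one control: `isotype_controls_needed({"Ab_01": "IgG1"})` gives `['IgG1']`.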

The reason people use isotype controls in flow cytometry (and now CITE-seq) is that some cell types have "Fc receptors" that can nonspecifically bind the "Fc" part of the antibody, which is basically the part that determines the isotype, instead of the "Fab", the part that is specific for a target of interest. That is also why one uses an Fc blocking reagent in antibody staining experiments: it blocks the Fc receptors on cells, preventing antibodies from staining nonspecifically. Fc blocking is not perfect, though, which is the reason for isotype controls. In flow cytometry, isotype controls are designed to give a non-quantitative assessment of the level of background staining in the experiment. Here we use them for data denoising.

The isotypes actually do more than batch correction in the DSB method: they are also used to control for technical, non-biological sources of variation, such as differences in protein library size between cells within a batch. It is hard to know how well the method will work with or without them, because every experiment is different with respect to potential background staining; for that reason, it is good practice to include them.
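If you want to see whether isotype signal tracks technical depth in your own data, one rough diagnostic is to correlate, across cells, each cell's mean log isotype count with its log total protein count. The sketch below assumes per-cell counts stored as plain dicts; the function names are hypothetical and the toy cells are made up. Because the toy cells are one profile scaled to three depths, the high correlation there is built in: the example only shows the calculation, not a result.

```python
import math


def log_library_size(cell_counts):
    """Natural log of the total protein counts in one cell (all antibodies)."""
    return math.log(sum(cell_counts.values()))


def mean_log_isotype(cell_counts, isotype_names):
    """Mean of log(count + 1) over the isotype-control antibodies in one cell."""
    return sum(math.log(cell_counts[n] + 1) for n in isotype_names) / len(isotype_names)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up toy cells: the same profile sequenced at 1x, 2x and 4x depth.
    base = {"Ab_01": 200, "Ab_02": 5, "Iso_IgG1": 3, "Iso_IgG2a": 2}
    cells = [{k: v * depth for k, v in base.items()} for depth in (1, 2, 4)]
    isotypes = ["Iso_IgG1", "Iso_IgG2a"]
    r = pearson([log_library_size(c) for c in cells],
                [mean_log_isotype(c, isotypes) for c in cells])
    print(f"correlation of isotype signal with library size: {r:.3f}")
```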

M

bio-la commented 4 years ago

Hey @MattPM, thanks for the swift reply and the info, it makes perfect sense. It's worth including the extra ABs in my next experiment. At the moment, however, I'm analysing a dataset that didn't have controls, and I'm DSB-normalizing it channel-wise without them. It's 3 channels/lanes of 10x, each with HTO and CITE-seq, so I'm normalizing each separately. https://github.com/kotliary/baseline/issues/1 I'm trying to get my head around it: how would I then correct for batch when aggregating the 3 channels into one object? Using limma, and at which stage? Thanks again!

sorjuela commented 4 years ago

As far as I understand, the limma function to remove batch effects is already implemented within the dsb method. If the dsb normalization "worked", you shouldn't see a batch effect anymore (I think; I am also testing this).

MattPM commented 4 years ago

@bio-la yes, as @sorjuela mentioned, you do not need to use limma; it is already part of the dsb function and is used to "denoise" each cell. It regresses out variation that is correlated with the isotypes and the mean of the non-staining proteins in that cell.
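Here is a minimal Python sketch of the per-cell quantity described above, under stated assumptions. It assumes the cell's protein values are already log-transformed and background-scaled. It takes "the mean of the non-staining proteins" as the mean of the lower component of a two-component Gaussian mixture fitted to that one cell (our reading of the approach, fitted here with a hand-rolled EM loop). It combines that background mean with the isotype values by a plain average. The plain average is a simplification for brevity and not necessarily how dsb combines them; `background_mean` and `technical_covariate` are hypothetical names, not dsb functions.

```python
import math


def _normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def background_mean(values, n_iter=100, min_sd=0.05):
    """Fit a 2-component 1-D Gaussian mixture to one cell's protein values by EM
    and return the mean of the lower ("non-staining") component."""
    xs = sorted(values)
    half = len(xs) // 2
    # Initialise the two components from the lower and upper halves of the values.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    sd = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [w[k] * _normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            sd[k] = max(math.sqrt(var), min_sd)
            w[k] = nk / len(values)
    return min(mu)


def technical_covariate(cell_values, isotype_names):
    """Per-cell covariate: plain average of the background mean and the isotype values.
    Simplification for illustration; not necessarily how dsb combines them."""
    bg = background_mean(list(cell_values.values()))
    iso = [cell_values[name] for name in isotype_names]
    return (bg + sum(iso)) / (1 + len(iso))
```

Each cell is handled on its own (the mixture uses only that cell's values), which matches the "per cell" framing in the comment above; the resulting covariate is what would then be regressed out.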

From what we have seen, batch effects in CITE-seq come primarily from separate staining reactions (the same as in flow) or from differences in sequencing depth between groups of 10X lanes sequenced on different chips. Each experiment is different, but I would not consider separate 10X lanes to be separate batches unless each lane was sequenced on a separate chip and/or came from a separate staining reaction.

If you hash-barcoded samples, stained them all together in the same tube, then split that tube across three 10x lanes, the background "soup" comes from the same staining reaction. I would consider this a single batch and merge all of that data, running DSB on all 3 lanes together. You can check the log protein counts from each lane; if this was the experimental setup, my prediction is that there would be little or no lane-to-lane variation.
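A rough way to run that check, sketched in Python on the assumption that you already have each cell's lane label and total protein count: compute the median log10 total protein count per lane and look at the spread between lanes. The function names are hypothetical, and there is no established cutoff; what counts as "little or no difference" is a judgment call.

```python
import math
import statistics


def per_lane_median_log_counts(cells):
    """cells: iterable of (lane, total_protein_counts) pairs.
    Returns {lane: median log10 total protein count}, ordered by lane name."""
    by_lane = {}
    for lane, total in cells:
        by_lane.setdefault(lane, []).append(math.log10(total))
    return {lane: statistics.median(v) for lane, v in sorted(by_lane.items())}


def max_lane_difference(medians):
    """Largest gap between any two lane medians (in log10 units)."""
    return max(medians.values()) - min(medians.values())
```

A difference near 0 is consistent with one staining reaction split across lanes; a large one suggests treating the lanes with more care.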

If each 10X lane was stained in a separate staining reaction, especially with a separate mixture of antibodies, and/or sequenced on a separate chip, you might consider those to be separate batches. But if they were sequenced on the same chip, I would try merging them together first and running a single dsb call.

For the data in the Nature Medicine paper, there were 12 total 10X lanes: 6 were run on one day (the same multiplexed staining reaction split across 6 lanes) and another 6 were run the next day, and each group was sequenced on a separate Illumina HiSeq chip. We considered these groups of 6 to be separate batches since both the sequencing and the staining reactions were separate. We normalized the batches separately with dsb, which worked well (preprint Fig. S1).
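Here is a minimal sketch of "normalize each batch separately, then combine", restricted to the background-scaling part of the idea. Within each batch, each protein's log counts in cells are centered by the mean and divided by the SD of log counts in that batch's own empty droplets. The per-batch results are then concatenated with no further adjustment. The pseudocount of 10 reflects our understanding of the dsb default and is an assumption. The per-cell denoising step is omitted, and all function names are hypothetical.

```python
import math
import statistics

PSEUDOCOUNT = 10  # assumption: our understanding of dsb's default pseudocount


def background_scale(cells, empties, pseudocount=PSEUDOCOUNT):
    """cells, empties: lists of {protein: raw count} from ONE batch (needs >= 2 empties).
    Per protein, returns (log(x + pseudocount) - mean_bg) / sd_bg for every cell,
    where mean_bg and sd_bg come from that batch's empty droplets."""
    proteins = list(cells[0].keys())
    bg_stats = {}
    for p in proteins:
        bg = [math.log(e[p] + pseudocount) for e in empties]
        bg_stats[p] = (statistics.mean(bg), statistics.stdev(bg))
    return [
        {p: (math.log(c[p] + pseudocount) - bg_stats[p][0]) / bg_stats[p][1] for p in proteins}
        for c in cells
    ]


def normalize_batches_separately(batches):
    """batches: {batch_id: (cells, empties)}. Scale each batch against its own
    background, then concatenate the rows as (batch_id, {protein: value})."""
    combined = []
    for batch_id, (cells, empties) in batches.items():
        for row in background_scale(cells, empties):
            combined.append((batch_id, row))
    return combined
```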

bio-la commented 4 years ago

Hi @MattPM, thanks for the detailed explanation. I agree with your interpretation of batches as defined by staining reaction. Unfortunately, I do have real batches: cells stained in different reactions and run on different days, channels, and sequencing runs, with different concentrations and cocktails of ABs. I'm testing the DSB method on this data, so I'd be happy to share some plots/data with you if I make interesting progress.

igordot commented 4 years ago

We normalized the batches separately with dsb which worked well

If the batches are normalized with dsb separately, do you then need to use limma to remove the batch effect, or can you just combine values from the separate dsb matrices without further adjustment?

MattPM commented 4 years ago

Hi @igordot, I'd take a look at the values; if the batches overlap well, you're good. In the data published in the Nature Medicine paper and preprint, the batches overlapped perfectly when we normalized them separately and combined the matrices without further adjustment. In some unpublished data, we combined all batches (background and cell-containing drops) and normalized them together at the same time. This is less ideal, but it might be necessary if the background is really different between the 2 batches (say, if fewer wash steps were used in one batch). If the resulting batches are still really different, it might be necessary to use some type of batch integration method à la Seurat, scPopCorn, Harmony, mnnCorrect, etc. We haven't needed to do this yet.
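Here is one crude way to "take a look at the values". It assumes the separately normalized matrices have been combined into (batch, {protein: value}) rows. It compares per-batch medians for each protein and flags proteins whose medians differ by more than a threshold. Medians only summarize location, so this complements plotting the distributions rather than replacing it. The 1.0 threshold and the function names are arbitrary, hypothetical choices, and the toy rows in the usage line are made up.

```python
import statistics


def batch_median_gaps(rows):
    """rows: list of (batch_id, {protein: normalized value}).
    For each protein, returns max - min of its per-batch medians."""
    per_protein = {}
    for batch, values in rows:
        for protein, v in values.items():
            per_protein.setdefault(protein, {}).setdefault(batch, []).append(v)
    gaps = {}
    for protein, by_batch in per_protein.items():
        medians = [statistics.median(v) for v in by_batch.values()]
        gaps[protein] = max(medians) - min(medians)
    return gaps


def flag_proteins(gaps, threshold=1.0):
    """Proteins whose per-batch medians differ by more than `threshold`, sorted."""
    return sorted(p for p, g in gaps.items() if g > threshold)


if __name__ == "__main__":
    toy_rows = [("A", {"CD3": 1.0, "CD4": 0.0}), ("A", {"CD3": 3.0, "CD4": 0.0}),
                ("B", {"CD3": 2.0, "CD4": 2.5})]
    print(flag_proteins(batch_median_gaps(toy_rows)))  # ['CD4']
```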

We wrote about the removeBatchEffect function in the preprint, which might be confusing; it's just the name of a function from limma that we use as a convenience. We didn't use that function for batch correction. It is used to do an ANOVA-style fit during the denoising step, which is run on each cell independently if denoise.counts is set to TRUE.
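To illustrate what "regressing out" a continuous per-cell covariate means, here is a hedged sketch in plain Python. The covariate is one number per cell, computed for each cell on its own. For one protein, the sketch fits an ordinary least-squares line of that protein's values against the covariate and subtracts the fitted covariate effect. This is our illustration of the idea, not limma's removeBatchEffect code and not dsb's code. Keeping each protein's mean by centering the covariate is our own choice; `remove_covariate` and `denoise_matrix` are hypothetical names.

```python
def remove_covariate(values, covariate):
    """Remove the linear effect of a per-cell covariate from one protein's values.

    values[i] and covariate[i] belong to cell i. Fits values ~ a + b * covariate by
    ordinary least squares and returns values - b * (covariate - mean(covariate)),
    so the protein's mean is kept.
    """
    n = len(values)
    mx = sum(covariate) / n
    my = sum(values) / n
    sxx = sum((c - mx) ** 2 for c in covariate)
    if sxx == 0:
        return list(values)  # constant covariate: nothing to remove
    slope = sum((c - mx) * (v - my) for c, v in zip(covariate, values)) / sxx
    return [v - slope * (c - mx) for v, c in zip(values, covariate)]


def denoise_matrix(matrix, covariate):
    """Apply remove_covariate to every protein in {protein: [value per cell]}."""
    return {protein: remove_covariate(vals, covariate) for protein, vals in matrix.items()}


if __name__ == "__main__":
    # Toy values that rise exactly with the covariate are flattened to their mean.
    print(remove_covariate([1.0, 2.0, 3.0], [0.0, 1.0, 2.0]))  # [2.0, 2.0, 2.0]
```

Note that no batch label appears anywhere in the fit: the only input besides the protein values is the per-cell technical covariate, which is why the limma function name is a convenience rather than a batch correction.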