niaid / dsb

Normalize CITEseq Data

Selecting Cut-Off Value For ADT Expression #43

Closed: codeneeded closed this issue 1 year ago

codeneeded commented 1 year ago

In the vignette and the publication, you recommend setting a single threshold across all proteins to call positivity. In your publication, you selected 3.5.

I'm guessing you take your isotype controls and call anything above them positive?


To justify my threshold, how would I generate a plot like the attached one, showing the threshold and which proteins are expressed vs. unexpressed?

  1. How would this threshold be applied in differential expression? Do I need to subtract some value during differential expression, or should I set the values of all proteins deemed unexpressed (below the threshold) to 0 and then perform differential expression? If so, how would we do that?
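A minimal sketch of the second option in the question (setting every value below the cutoff to 0 before differential expression), assuming the dsb-normalized values are held in a NumPy array of shape proteins × cells. The function name `zero_below_cutoff` is hypothetical, and this only shows the mechanics of zeroing, not whether zeroing is appropriate for a given differential expression test.

```python
import numpy as np

def zero_below_cutoff(dsb_values, cutoff=3.5):
    """Return a copy of a proteins x cells matrix of dsb-normalized values in
    which every entry not strictly above `cutoff` is set to 0 (unexpressed)."""
    values = np.asarray(dsb_values, dtype=float)
    # np.where builds a new array, so the input matrix is left unchanged
    return np.where(values > cutoff, values, 0.0)

# Example: 2 proteins x 3 cells (values are made up for illustration)
dsb = np.array([[0.8, 4.2, 12.0],
                [3.4, 3.5, -1.0]])
print(zero_below_cutoff(dsb))
```

Zeroing discards the continuous information below the cutoff: a cell just under 3.5 and a cell deep in the background end up with the same value.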

Thanks so much for your help!

MattPM commented 1 year ago

Hi @codeneeded, I'd check out the first section of the paper for more info on that threshold. After you normalize with dsb, a value above 3.5 means the protein is more than 3.5 standard deviations above ambient noise, ± the adjustment for the cell-intrinsic technical component.
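To make the "standard deviations above ambient noise" interpretation concrete, here is a minimal NumPy sketch of the ambient-background standardization idea only: each protein is log-transformed, then centered and scaled by that protein's mean and standard deviation in empty droplets. It is not the dsb package. The pseudocount of 10 is an assumption here, the function name `ambient_standardize` is hypothetical, and dsb's second step (removing the cell-intrinsic technical component estimated from isotype controls together with each cell's background level) is omitted.

```python
import numpy as np

def ambient_standardize(cell_counts, empty_counts, pseudocount=10.0):
    """Standardize ADT counts against empty-droplet (ambient) background.

    cell_counts:  raw counts, proteins x cells (cell-containing droplets)
    empty_counts: raw counts, proteins x droplets (empty droplets)
    Returns, per protein, how many ambient standard deviations each cell
    lies above the ambient mean on the log scale.
    """
    log_cells = np.log(np.asarray(cell_counts, dtype=float) + pseudocount)
    log_empty = np.log(np.asarray(empty_counts, dtype=float) + pseudocount)
    # Per-protein ambient mean and sample standard deviation (ddof=1, like R's sd())
    mu = log_empty.mean(axis=1, keepdims=True)
    sd = log_empty.std(axis=1, ddof=1, keepdims=True)
    return (log_cells - mu) / sd

# One protein: four empty droplets and two cells (made-up counts)
empty = np.array([[5, 10, 15, 20]])
cells = np.array([[10, 5000]])
z = ambient_standardize(cells, empty)
positive = z > 3.5  # one cutoff, read as "3.5 ambient SDs above background"
print(z.round(2), positive)
```

Because every protein is expressed in the same units (ambient standard deviations), one cutoff such as 3.5 can be applied across all proteins.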

Any threshold you use has the same interpretation: the number of standard deviations above ambient background noise, with the correction for isotype controls already baked in. A cutoff of 3.5 SD above ambient background, applied across all proteins, worked well on the datasets in our paper and in some other projects, but you could use another value.
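One way to show a single cutoff applied across all proteins is to tabulate, for each protein, the fraction of cells above it. A minimal sketch, assuming a proteins × cells array of dsb-normalized values and a list of protein names. The function name `summarize_positivity`, the protein names, and the rule of calling a protein "expressed" when at least `min_fraction` of cells exceed the cutoff are all hypothetical, chosen for illustration rather than taken from the paper or the dsb package.

```python
import numpy as np

def summarize_positivity(dsb_values, protein_names, cutoff=3.5, min_fraction=0.01):
    """Per protein: fraction of cells above `cutoff`, plus an expressed/unexpressed
    call under the hypothetical rule fraction >= min_fraction."""
    values = np.asarray(dsb_values, dtype=float)
    frac_above = (values > cutoff).mean(axis=1)  # share of cells above the cutoff
    return {
        name: (float(frac), "expressed" if frac >= min_fraction else "unexpressed")
        for name, frac in zip(protein_names, frac_above)
    }

# Made-up values: protein_A has a positive population, protein_B never passes 3.5
dsb = np.array([[5.1, 7.9, 0.2, 6.3],
                [0.4, -0.3, 1.2, 2.9]])
for protein, (frac, call) in summarize_positivity(dsb, ["protein_A", "protein_B"]).items():
    print(f"{protein}: {frac:.2f} of cells above cutoff -> {call}")
```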

More explained in section 1: https://www.nature.com/articles/s41467-022-29356-8

codeneeded commented 1 year ago

Thanks for your response. To give some background:

1. I have a stim and an unstim condition. The isotype background is much higher in the stim samples than in the unstim samples. I have attached a plot below (S is stim, M is unstim):

[Image: Isotype_Threshold]

This is after dsb normalization. What I wanted to do was, for each sample, subtract from each protein the 99th percentile of its matching isotype control, then set all negative values to 0 (unexpressed). Since this is done per sample, does it make sense as a way to control for this kind of sample-specific disparity in isotype control expression?
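The per-sample adjustment proposed in the question can be sketched as follows, assuming dsb-normalized values in a proteins × cells NumPy array, a per-cell sample label, and a dictionary mapping each protein to its matching isotype control. The function name `subtract_isotype_quantile` and the `isotype_for` mapping are hypothetical. This implements the proposal as stated, not a dsb function.

```python
import numpy as np

def subtract_isotype_quantile(dsb_values, protein_names, sample_labels,
                              isotype_for, q=0.99):
    """For each sample separately, subtract from each mapped protein the
    q-quantile of its matched isotype control within that sample, then set
    every negative value (in any row) to 0."""
    values = np.asarray(dsb_values, dtype=float)
    labels = np.asarray(sample_labels)
    row = {name: i for i, name in enumerate(protein_names)}
    adjusted = values.copy()
    for sample in np.unique(labels):
        cols = labels == sample  # cells belonging to this sample
        for protein, isotype in isotype_for.items():
            # q-quantile (99th percentile by default) of the isotype in this sample
            iso_q = np.quantile(values[row[isotype], cols], q)
            adjusted[row[protein], cols] = values[row[protein], cols] - iso_q
    # Negative values are treated as unexpressed
    return np.clip(adjusted, 0.0, None)

# Made-up example: isotype background is higher in S (stim) than in M (unstim)
names = ["protein_A", "isotype_X"]
labels = ["S", "S", "M", "M"]
dsb = np.array([[5.0, 2.0, 5.0, 2.0],
                [3.0, 3.0, 0.0, 0.0]])
out = subtract_isotype_quantile(dsb, names, labels, {"protein_A": "isotype_X"})
print(out)
```

The same raw dsb value (5.0) ends up lower in the stim sample than in the unstim sample, because the stim isotype background is higher.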