t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

Background TC counts #74

Closed: BrianLohman closed this issue 4 years ago

BrianLohman commented 4 years ago

Hi Tobias,

I have looked through the issues, and a few people have asked about background levels, but only for very specific cases. I'd like to ask more generally: what level of background TC counts do you expect? Is there a suggested way to deal with this background noise? For example, in #52 you suggested subtracting the TC counts of the control sample.

I have run samples from other projects through Slamdunk, and about 20% of the genes have TC counts > 0. TC counts in these control samples can be as high as 2,700, whereas I would have expected counts very close to 0 for all genes.

Some basic stats for an example sample are below:

table(TC_gene_count == 0)
 TRUE FALSE
17364  3828

summary(TC_gene_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   6.382   6.000 244.000
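As a quick sanity check on the "about 20%" figure, the fraction of genes with nonzero TC counts can be computed directly from the table(TC_gene_count == 0) output. This is a minimal sketch; the variable names are illustrative and the only inputs are the two counts shown in that output.

```python
# Counts copied from the table(TC_gene_count == 0) output.
genes_zero = 17364     # TRUE: genes with no T>C reads
genes_nonzero = 3828   # FALSE: genes with at least one T>C read

total = genes_zero + genes_nonzero
fraction_nonzero = genes_nonzero / total

# Fraction of genes showing T>C signal in a sample that was never labeled.
print(f"{genes_nonzero} of {total} genes ({fraction_nonzero:.1%}) have TC counts > 0")
```

For this example sample that works out to roughly 18% of genes, in line with the ~20% seen across the other unlabeled samples.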


Thank you for your help,

Brian

t-neumann commented 4 years ago

Hi Brian,

So what we have mostly been doing lately is labelling following some perturbation (protein degradation or drug treatment). For those analyses, the downstream step is mostly differential-expression (DE) analysis, into which we feed the T>C read counts.

In Slamdunk, you have the option to count a read as a T>C read only if it carries >= 2 T>C conversions. In the Science paper we showed that this very effectively reduces noise from the assay.

Is that an option for you?
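As a rough illustration of why a >= 2 threshold suppresses background, here is a minimal sketch assuming that background T>C mismatches (e.g. sequencing or alignment errors) hit each T position in a read independently with a fixed rate p, so the number of chance conversions per read is binomial. The values p = 0.001 and n_t = 30 T positions per read are hypothetical, not figures from Slamdunk or the Science paper.

```python
from math import comb


def prob_at_least(k: int, n_t: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n_t, p).

    Assumption: each of the n_t T positions in a read independently shows a
    background T>C mismatch with probability p (no real labeling).
    """
    # Subtract the probability of seeing fewer than k mismatches.
    return 1.0 - sum(comb(n_t, i) * p**i * (1 - p) ** (n_t - i) for i in range(k))


# Hypothetical read: 30 T positions, 0.1% per-base background rate.
n_t, p = 30, 0.001
for k in (1, 2):
    print(f">= {k} T>C by chance: {prob_at_least(k, n_t, p):.2e}")
```

Under these assumptions, raising the threshold from 1 to 2 shrinks the chance-only rate by roughly a factor of (n_t - 1)·p/2 for small p (about 70-fold here), because a single spurious mismatch is no longer enough to call a read.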

BrianLohman commented 4 years ago

Hi Tobias,

Thank you for your quick response. I will try your suggestion by adding:

-mts 2

to my slamdunk all command. I will let you know how this goes.

The sample I showed above is unlabeled (just a regular RNA-seq library from a non-SlamSeq project), so I was concerned about background T>C conversions.

Cheers,

Brian

EDIT: adding -mts 2 returns:

slamdunk all: error: argument -t/--threads: invalid int value: 's'

and using the full parameter name --multiTCStringency returns:

slamdunk: error: unrecognized arguments: --multiTCStringency 2

The full command is:

slamdunk all \
  ./17699X5_R1_R2.fq \
  -n 100 \
  -t 12 \
  -m \
  -5 12 \
  -rl 150 \
  -mts 2 \
  -o ./17699X5 \
  -b ../../Mus_musculus.GRCm38.98.3primeUTR.bed \
  -r ../../Mus_musculus.GRCm38.dna.primary_assembly.fa
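Both error messages have the format produced by Python's argparse. Assuming slamdunk parses its options with argparse and defines -m as a flag that takes no value (as its use in the command above suggests), the first error follows from argparse's support for bundling short options behind a single dash: -mts is read as -m followed by -t with the value s. The parser below is a hypothetical stand-in containing only those two options, not slamdunk's real parser.

```python
import argparse

# Hypothetical stand-in: only -m (value-less flag) and -t/--threads (int).
# exit_on_error=False (Python 3.9+) makes argparse raise ArgumentError
# instead of printing the message and exiting.
parser = argparse.ArgumentParser(prog="slamdunk all", exit_on_error=False)
parser.add_argument("-m", action="store_true", help="hypothetical flag")
parser.add_argument("-t", "--threads", type=int, default=1)

# "-mts" is split as bundled short options: "-m" consumes nothing,
# so the remaining "ts" becomes option "-t" with the value "s".
try:
    parser.parse_args(["-mts", "2"])
    err_msg = None
except argparse.ArgumentError as err:
    err_msg = str(err)
print(err_msg)

# An option the parser does not define is left over as unrecognized.
_, extras = parser.parse_known_args(["--multiTCStringency", "2"])
print(extras)
```

In other words, neither error means the value 2 was wrong; the old option name simply no longer exists in the parser.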
t-neumann commented 4 years ago

Hi Brian,

Sorry, this is my fault for being lazy about updating the documentation. The -mts parameter was replaced by -c, with which you specify the number of T>C conversions a read needs in order to be counted as a T>C read. So the equivalent of -mts 2 is now -c 2.
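To make the counting rule concrete, here is a minimal sketch of what a conversion threshold like -c does, assuming per-read T>C conversion counts are already available. The (gene_id, n_tc) input format, the example reads, and the count_tc_reads function are hypothetical and do not reflect slamdunk's internal implementation.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_tc_reads(reads: Iterable[Tuple[str, int]], min_conversions: int = 1) -> Counter:
    """Count, per gene, the reads carrying at least `min_conversions` T>C conversions.

    `reads` is a hypothetical stream of (gene_id, n_tc) pairs, one per aligned read.
    """
    counts: Counter = Counter()
    for gene_id, n_tc in reads:
        if n_tc >= min_conversions:  # threshold analogous to -c
            counts[gene_id] += 1
    return counts


# Hypothetical reads: (gene, number of T>C conversions in the read).
reads = [("GeneA", 0), ("GeneA", 1), ("GeneA", 2),
         ("GeneB", 1), ("GeneB", 0), ("GeneC", 3)]

print(dict(count_tc_reads(reads, min_conversions=1)))
print(dict(count_tc_reads(reads, min_conversions=2)))
```

With the threshold raised from 1 to 2, GeneB's only T>C read (a single conversion) no longer counts, which mirrors how single-mismatch background reads drop out of unlabeled samples.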

BrianLohman commented 4 years ago

Hi Tobias,

As always, thanks for the fast reply. I added -c 2 to my slamdunk all call, and the "false positive" rate (genes with TC counts > 0 in unlabeled samples) dropped to about 1%. I think this will do for now.

Thank you!

Cheers,

Brian

t-neumann commented 4 years ago

Sure thing!