statOmics / tradeSeq

TRAjectory-based Differential Expression analysis for SEQuencing data

Question about counts, and a problem with evaluateK #62

Closed rpolicastro closed 4 years ago

rpolicastro commented 4 years ago

Hi all,

Thanks for your work on this software. I was hoping you could clear up a question with counts, and help me troubleshoot a problem with evaluateK.

When following the Seurat3 SCTransform plus data integration workflow, you are left with a few count types. The Seurat team does not recommend using the counts from the integration assay for most downstream applications. That leaves the RNA assay, which holds the raw counts, and the SCT assay, which holds the corrected counts, the log-transformed corrected counts, and the Pearson residuals. Which of these counts would you recommend using?

The second question is in regards to the evaluateK function. For testing out the tradeSeq software, I downsampled to ~2k cells and ran my data through slingshot without a hitch (I ended up with 10 lineages). I then ran the evaluateK function but kept getting the following error (along with really slow run times on the order of days).

Warning message:
In .findKnots(nknots, pseudotime, wSamp) :
  Impossible to place a knot at all endpoints.Increase the number of knots to avoid this issue.

This was the command I ran using tradeSeq v1.3.13 from the 'conditions' branch, and I used the normalized counts from Seurat3. I only went up to 30 knots because everything below 10 was giving me that warning, but that warning was still present even up to 30.

knots <- evaluateK(
  counts = as.matrix(seurat_obj[[1]][["SCT"]]@counts),
  sds = trajectory[[1]], nGenes = 100,
  conditions = factor(seurat_obj[[1]]$orig.ident),
  plot = TRUE, k = seq(5, 30, 5)
)

I was wondering if you had any insights on how to avoid this warning, and ways to possibly increase the speed of the function.

Cheers!

HectorRDB commented 4 years ago

Hi @rpolicastro, thanks for the question, it's a very interesting one. The recommendations from the Seurat team on which assay to use downstream of integration depend on your question of interest. For differential expression, though, they recommend using the "RNA" assay, which, as far as I understand it, is the count data, scaled by library size (see https://satijalab.org/seurat/faq, Q4). tradeSeq does its own adjustment for library size when fitting the smoothers. As such, providing tradeSeq with the count matrix fits their reasoning exactly. Note that even though you do not use the results of the integration here, tradeSeq relies on the slingshot output, which in turn is based on the integrated data, so your integration still plays a part in estimating pseudotimes and lineage assignments.
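To illustrate why raw counts are the natural input when the model adjusts for library size itself, here is a minimal base R sketch. It is not tradeSeq's code: it swaps in a plain Poisson GLM with a log library-size offset for the smoothers, and every value in it (cell number, library sizes, slope) is simulated and hypothetical.

```r
# Illustration only: a Poisson GLM with a library-size offset,
# standing in for a count model that normalizes internally.
set.seed(1)

n_cells    <- 500
lib_size   <- round(runif(n_cells, 1000, 20000))  # hypothetical sequencing depths
pseudotime <- runif(n_cells)                      # hypothetical pseudotime in [0, 1]
true_slope <- 1.5                                 # assumed log-fold change per unit pseudotime

# Raw counts depend on both the biology and the sequencing depth.
mu     <- lib_size * exp(-7 + true_slope * pseudotime)
counts <- rpois(n_cells, mu)

# The offset absorbs the depth effect, so the raw counts can be modeled directly.
fit <- glm(counts ~ pseudotime + offset(log(lib_size)), family = poisson())
coef(fit)
```

The fitted `pseudotime` coefficient should land close to `true_slope`, even though the counts were never normalized beforehand.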

For the second question, let me explain how tradeSeq positions the knots. It first places nknots knots at the quantiles of the overall pseudotime distribution. Then it tries to adjust the knots so that one sits at the end of each lineage. Here, what is most likely happening is that two lineages have nearly (but not exactly) the same length, so you would need to increase nknots to a ridiculous value for the two endpoints to fall in different quantiles. I would therefore not worry about the warning.
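The situation described above can be sketched in base R. This is a simplified illustration of quantile-based knots, not the actual `.findKnots` implementation; the two lineages and their lengths (10 and 9.98) are hypothetical.

```r
# Two hypothetical lineages with nearly, but not exactly, the same length.
pt_lineage1 <- seq(0, 10.00, length.out = 1000)
pt_lineage2 <- seq(0,  9.98, length.out = 1000)

pooled    <- c(pt_lineage1, pt_lineage2)
endpoints <- c(max(pt_lineage1), max(pt_lineage2))

# Place nknots knots at evenly spaced quantiles of the pooled pseudotime
# and report which inter-knot interval each lineage endpoint falls into.
bins_for_endpoints <- function(nknots) {
  knots <- quantile(pooled, probs = seq(0, 1, length.out = nknots))
  findInterval(endpoints, knots, rightmost.closed = TRUE)
}

# Number of distinct intervals occupied by the two endpoints for each k.
k_values      <- seq(5, 30, 5)
distinct_bins <- sapply(k_values, function(k) length(unique(bins_for_endpoints(k))))
data.frame(nknots = k_values, distinct_bins = distinct_bins)
```

For every value of `nknots` in this range, both endpoints share the last interval, so there is no way to give each endpoint its own knot. Only a handful of cells lie between the two endpoints, which is why separating them would take a very large number of knots.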

As for actually running evaluateK, the run time scales with the number of parameters to estimate, so with 10 lineages it scales with 10 * nknots, which explains why it is so slow. In practice, we have found that values of nknots between 4 and 12 fit all situations. Restricting k to that range should already deliver quite a boost in computing time without needing to downsample.
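As a rough back-of-the-envelope check, the 10 * nknots figure can be tabulated for the values of k tried in the original call. This only reproduces the arithmetic from the reply; it is not a timing model, and `params_per_gene` is a hypothetical helper.

```r
n_lineages <- 10

# Approximate number of smoother parameters per gene, using the 10 * nknots rule above.
params_per_gene <- function(nknots) n_lineages * nknots

k_tried <- seq(5, 30, 5)
cost    <- data.frame(nknots = k_tried, params = params_per_gene(k_tried))
cost

# Largest model in the original grid versus the largest in the suggested 4 to 12 range.
params_per_gene(30) / params_per_gene(12)
```

By this count, the largest model in the original grid (k = 30) has 2.5 times as many parameters per gene as the largest model in the suggested range (k = 12).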

Let me know if anything is unclear.

rpolicastro commented 4 years ago

Thanks for the quick response and thorough explanation!

I don't believe the "RNA" assay stores normalized or scaled counts if you use SCT normalization. I believe one of the slots holds normalized and scaled data if you use the older normalization method. Regardless, it seems as though using the raw counts is perfectly fine from what you are saying, since tradeSeq does its own thing.