pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0

'aggregation column' not working #164

Open sklasfeld opened 6 years ago

sklasfeld commented 6 years ago

When I run the following:

    so <- sleuth_prep(s2c, ~condition, target_mapping = t2g, aggregation_column = 'ens_gene', max_bootstrap = max_boot, num_cores = 4)

I do not get any errors. I believe this should set the sleuth object to 'gene_mode'. However, when I run

    plot_pca(so, color_by = 'condition', units = "scaled_reads_per_base")

I get the following error:

    your sleuth object is not in gene mode, but you selected 'scaled_reads_per_base'. Selecting 'est_counts'.

The head of my t2g dataframe looks like:

        target_id  ens_gene
    1 AT3G11415.1 AT3G11415
    2 AT1G31258.1 AT1G31258
    3 AT5G24735.1 AT5G24735
    4 AT2G45780.1 AT2G45780

I recently updated sleuth, and this error is new. There is no error if I use "est_counts" as the units, but I would like to run the PCA in a gene-centric way.
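For readers reproducing this setup, a mapping like the t2g shown above could be built roughly as follows. This is a minimal sketch in base R. It assumes that each gene ID is the transcript ID with its trailing ".<number>" suffix removed, which matches the four rows shown but is not stated anywhere in the thread.

```r
# Transcript IDs copied from the head of t2g shown above.
target_id <- c("AT3G11415.1", "AT1G31258.1", "AT5G24735.1", "AT2G45780.1")

# Assumption: gene ID = transcript ID minus the trailing ".<number>" suffix.
t2g <- data.frame(
  target_id = target_id,
  ens_gene = sub("\\.[0-9]+$", "", target_id),
  stringsAsFactors = FALSE
)

# Sanity checks before passing the table to sleuth_prep() as target_mapping:
# every transcript appears once and every transcript has a gene.
stopifnot(!anyDuplicated(t2g$target_id), !anyNA(t2g$ens_gene))

print(head(t2g))
```

The column name ens_gene is the one passed as aggregation_column in the sleuth_prep call above.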

warrenmcg commented 6 years ago

If you updated to the latest devel branch, gene-level aggregation of counts is currently turned off while other features are being implemented, and that is the cause of the error. @pimentel and the team have to discuss how best to handle this. Thanks for pointing this out and letting us know that you still want to be able to do gene-centric analyses!

sklasfeld commented 6 years ago

You're welcome. Also, when I continue down the pipeline, the b value for the Wald test is no longer output. Is this because I set aggregation_column?

warrenmcg commented 6 years ago

Yes, that's correct. In the new method, aggregation is performed on the p-values of the transcript-level results, so it doesn't make sense to display a gene-level b when the counts were never aggregated. The p-value aggregation approach is described in the group's recent preprint.
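To illustrate why a gene-level b has no natural definition here, the sketch below combines transcript-level p-values into one p-value per gene. The thread does not say which combination rule sleuth uses, so Fisher's method is used purely as a stand-in. The data frame tx, its IDs and p-values, and the helper fisher_combine are all hypothetical.

```r
# Hypothetical transcript-level results: one p-value per transcript.
tx <- data.frame(
  target_id = c("geneA.1", "geneA.2", "geneB.1"),
  ens_gene  = c("geneA", "geneA", "geneB"),
  pval      = c(0.01, 0.20, 0.50),
  stringsAsFactors = FALSE
)

# Fisher's method (a stand-in, not necessarily what sleuth does):
# under the null, -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom, where k is the number of p-values combined.
fisher_combine <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

# One combined p-value per gene; no effect size is produced at any point.
gene_pval <- tapply(tx$pval, tx$ens_gene, fisher_combine)
print(gene_pval)
```

Only p-values enter the combination, so the result carries evidence of a change but no single gene-level effect estimate.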

dpcook commented 6 years ago

Hey all. I'm wondering if there are any plans on restoring gene-level aggregation of counts? As it worked before, it was very convenient for running the simple "which genes are differentially expressed and by how much?" analysis. If it makes more sense to hold off aggregation until you have p-values, perhaps it would be valuable to have convenience functions/options so that plotting, data exports, etc can be collapsed to gene-level counts.

warrenmcg commented 6 years ago

Hi @dpcook, the old mode of gene-level count aggregation is still fully operational. As discussed in the new documentation, you do this by running sleuth_prep with gene_mode = TRUE. Note that this is mutually exclusive with the new p-value aggregation mode (the default). Within the sleuth object, you will see a new object called pval_aggregate; this will tell you if you're in p-value aggregation mode or not.
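As a rough sketch of the call described above, assuming the s2c and t2g tables from the original report: the wrapper name prep_gene_mode and its agg_col argument are hypothetical, and reading pval_aggregate with `$` is an assumption about how the sleuth object is laid out. The function only runs once sleuth is installed and kallisto results are available.

```r
# Sketch only: needs the sleuth package plus your own s2c and t2g tables.
prep_gene_mode <- function(s2c, t2g, agg_col = "ens_gene") {
  so <- sleuth::sleuth_prep(
    s2c, ~condition,
    target_mapping = t2g,
    aggregation_column = agg_col,
    gene_mode = TRUE  # count aggregation instead of p-value aggregation
  )
  # Assumption: pval_aggregate is stored on the object and readable via `$`.
  message("pval_aggregate: ", so$pval_aggregate)
  so
}

# Usage (not run here):
# so <- prep_gene_mode(s2c, t2g)
# plot_pca(so, color_by = "condition", units = "scaled_reads_per_base")
```

If pval_aggregate reports that p-value aggregation is still on, the object is not in gene mode and plot_pca would be expected to fall back to 'est_counts' as in the original report.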

dpcook commented 6 years ago

Ahh okay. My apologies! I missed the section on gene_mode at the bottom of the documentation. Happy to see it's still functional! :P