pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0
305 stars 95 forks source link

strange error from sleuth_results in gene mode #202

Closed kmattioli closed 6 years ago

kmattioli commented 6 years ago

Hello,

I have been doing differential expression analysis using sleuth today, and I just got this weird error when running DE on my second conditions:

reduced_full <- sleuth_results(so, 'reduced:full', 'lrt', pval_aggregate = TRUE, show_all = FALSE)

Error in sleuth_results(so, "reduced:full", "lrt", pval_aggregate = TRUE,  : 
  This shouldn't happen. Please report this issue.

Here is the rest of the code I am running:

so <- sleuth_prep(s2c_endo_BAFB, ~ condition, read_bootstrap_tpm = TRUE,
                  target_mapping = t2g_gene, aggregation_column = 'gene_id',
                  extra_bootstrap_summary = TRUE, gene_mode = TRUE)
so <- sleuth_fit(so)
so <- sleuth_fit(so, ~1, 'reduced')
so <- sleuth_lrt(so, 'reduced', 'full')

Weirdly, I ran the same code but with a different sample_to_covariates dataframe on very similar data and it worked at first. Then I tried it on the second condition, and it gave me that error above. Then I went BACK to try the first condition that I knew should work, and it gave me that error above.

I am super confused and not an R ninja so any advice would be great. FWIW I have quit R and re-installed sleuth and I still can't get the first condition to work (the one that worked earlier).

Here's what t2g_gene looks like by the way. I made it myself (didn't use biomart) as kallisto output is weird when using Gencode. AFAICT, it has all of the IDs that are outputted in the kallisto abundance.tsv files.

Thanks!

head(t2g_gene)
                                                                                                                target_id
1       ENST00000000233.9|ENSG00000004059.10|OTTHUMG00000023246.6|OTTHUMT00000059567.2|ARF5-001|ARF5|1103|protein_coding|
2        ENST00000000412.7|ENSG00000003056.7|OTTHUMG00000168276.3|OTTHUMT00000399130.1|M6PR-001|M6PR|2756|protein_coding|
3    ENST00000000442.10|ENSG00000173153.13|OTTHUMG00000150641.7|OTTHUMT00000319303.1|ESRRA-002|ESRRA|2215|protein_coding|
4      ENST00000001008.5|ENSG00000004478.7|OTTHUMG00000090429.3|OTTHUMT00000206861.2|FKBP4-001|FKBP4|3732|protein_coding|
5  ENST00000001146.6|ENSG00000003137.8|OTTHUMG00000129756.5|OTTHUMT00000251969.1|CYP26B1-001|CYP26B1|4732|protein_coding|
6 ENST00000002125.8|ENSG00000003509.15|OTTHUMG00000128468.6|OTTHUMT00000250267.1|NDUFAF7-001|NDUFAF7|2176|protein_coding|
             gene_id gene_name        biotype
1 ENSG00000004059.10      ARF5 protein_coding
2  ENSG00000003056.7      M6PR protein_coding
3 ENSG00000173153.13     ESRRA protein_coding
4  ENSG00000004478.7     FKBP4 protein_coding
5  ENSG00000003137.8   CYP26B1 protein_coding
6 ENSG00000003509.15   NDUFAF7 protein_coding
warrenmcg commented 6 years ago

Hi @kmattioli,

Thanks for bringing this to our attention. The reason for the error is that gene_mode is TRUE for your sleuth object, meaning you are doing count aggregation, and then you set pval_aggregate to TRUE when you run sleuth_results. These cannot be both TRUE at the same time, because p-value aggregation and count aggregation are mutually exclusive (how can you further aggregate p-values when you are reporting statistics on gene-level counts?). The failure here was that we forgot to add in a sanity check for when the user supplies their own argument to pval_aggregate.

gene_mode is intended to signal count aggregation, and pval_aggregate is intended to signal p-value aggregation. If you intend to use count aggregation (the old way of doing things), omit the pval_aggregate argument from sleuth_results. If you want to use p-value aggregation, omit the gene_mode argument from sleuth_prep.

Hope that helps!

kmattioli commented 6 years ago

ah - got it! thank you for the explanation!