mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0
206 stars 29 forks

Problem with R processing #55

Closed alp78 closed 4 years ago

alp78 commented 4 years ago

Hello,

I get an error at the differential analysis step:

Error in checkForExperimentalReplicates(object, modelMatrix) :

The design matrix has the same number of samples and coefficients to fit, so estimation of dispersion is not possible. Treating samples as replicates was deprecated in v1.20 and no longer supported since v1.22.

Calls: DESeq ... estimateDispersions -> .local -> checkForExperimentalReplicates
Execution halted

Are you going to maintain the code so that it stays compatible with the latest versions of R (3.6 and later) and Python (3.6 and later)?

Here is the command:

TEtranscripts \
--format BAM \
--stranded reverse \
-t [TsampleSortedByName.bam] \
-c [CsampleSortedByName.bam] \
--GTF Homo_sapiens.GRCh38.98.gtf \
--TE line_alu.gtf \
--mode multi \
--project DIFFTE/

STAR command was:

STAR \
--runThreadN 70 \
--genomeDir STAR_INDEX \
--readFilesIn [paired-end_R1] [paired_end_R2] \
--outSAMtype BAM Unsorted \
--winAnchorMultimapNmax 200 \
--outFilterMultimapNmax 100 \
--outFileNamePrefix STAR/ \
--readFilesCommand zcat 

The BAM files were then sorted by name with samtools=1.9 (a later version than the one in the TEtranscripts processing environment, so that the -@ option could be used to sort with multiple cores).

Versions:

alp78 commented 4 years ago

I could make it work with R=3.4.4 and DESeq2=1.18.1; however, I still get the following warning message:

In checkForExperimentalReplicates(object, modelMatrix) : same number of samples and coefficients to fit, estimating dispersion by treating samples as replicates. please read the ?DESeq section on 'Experiments without replicates'. in summary: this analysis only potentially useful for data exploration, accurate differential expression analysis requires replication

and the sigdiff_gene_TE.txt is empty.

Any suggestion?

olivertam commented 4 years ago

Hi,

Unfortunately, the DESeq2 warning/error comes from the experimental design, not from the TEtranscripts software. DESeq2 versions since 1.22.0 require at least 2 replicates for the comparison and will not run otherwise (as outlined in the error messages about replicates). Other than forcing users to have at least two replicates in their differential analysis, there is no good fix for it. However, we feel that the quantification results (the .cntTable file) are still useful if you wish to feed them into other differential analysis pipelines that still handle no-replicate experiments, and thus we do not want to impose that restriction.
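For anyone who wants to catch this before the R step runs, a rough pre-flight check could look like the sketch below. It only mirrors the condition quoted in the error message (the same number of samples and coefficients to fit). The function name and the default of two coefficients (intercept plus condition, for a simple treatment-versus-control design) are assumptions for illustration and are not part of TEtranscripts or DESeq2.

```python
def can_estimate_dispersion(n_treatment, n_control, n_coefficients=2):
    """Return True if the design has more samples than coefficients to fit.

    n_coefficients=2 is an assumption: a two-group design with an
    intercept and one condition term.
    """
    if n_treatment < 1 or n_control < 1:
        raise ValueError("each group needs at least one sample")
    return (n_treatment + n_control) > n_coefficients


if __name__ == "__main__":
    # Hypothetical sample counts, given as (treatment, control) pairs.
    for n_t, n_c in [(1, 1), (2, 2), (3, 3)]:
        if can_estimate_dispersion(n_t, n_c):
            status = "ok"
        else:
            status = "no replicates, DESeq2 >= 1.22 is expected to stop"
        print(f"{n_t} treatment vs {n_c} control: {status}")
```

The command in the original report passes one file to -t and one to -c, which is the (1, 1) case. Having at least two replicates per condition satisfies the check.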

Regarding Python 3, we are currently exploring the best approach to provide support there, while not disrupting users who are still on Python 2. TEtranscripts should be compatible with the latest R.

Thanks for your interest.
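As a starting point for reusing the .cntTable file elsewhere, here is a minimal sketch of loading it in Python. It assumes the table is tab-separated, has one header row, and carries the gene/TE identifier in the first column with one count column per sample. Please check this against your own output file. The function name, the sample names, and the counts in the example are all hypothetical.

```python
import csv
import io


def read_count_table(handle):
    """Parse a tab-separated count table into (sample_names, counts).

    Assumes a header row and the feature identifier in the first column.
    counts maps each identifier to a list of per-sample values.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # float() tolerates both "12" and "12.0" style values.
        counts[row[0]] = [float(value) for value in row[1:]]
    return samples, counts


# Hypothetical two-sample table, for illustration only.
example = (
    "gene/TE\ttreated.bam.T\tcontrol.bam.C\n"
    "GENE_A\t10\t12\n"
    "TE_B\t0\t7\n"
)
samples, counts = read_count_table(io.StringIO(example))
```

For a real run, open the .cntTable file and pass the file handle instead of the in-memory example.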