mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0
229 stars 30 forks source link

TEcount: Issue - There are not enough reads to estimate fragment length. #189

Closed MichMourra closed 5 months ago

MichMourra commented 5 months ago

Hi olivertam, I just wanted to ask what could be the reason for this error. It only happens in certain files from the dataset, I have checked the size of these files and they are similar to others that have been processed successfully. I am also looking at the contents of the BAM file in IGV and the reads are there.

Thanks for your help.

INFO  @ Tue, 04 Jun 2024 10:13:08: 
# ARGUMENTS LIST:
# name = TEcount_multi-els/24L006858.
# BAM file = ./BAM_els/24L006858.Aligned.sortedByCoord.out.bam
# GTF file = genome/combined.gtf 
# TE file = annotation/Modified_mm10_rmsk_subfamilies.gtf 
# multi-mapper mode = multi 
# stranded = no 
# number of iteration = 100
# Alignments grouped by read ID = False

INFO  @ Tue, 04 Jun 2024 10:13:08: Processing GTF files ... 

INFO  @ Tue, 04 Jun 2024 10:13:08: Building gene index ....... 

100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
500000 GTF lines processed.
600000 GTF lines processed.
700000 GTF lines processed.
800000 GTF lines processed.
INFO  @ Tue, 04 Jun 2024 10:18:27: Done building gene index ...... 

INFO  @ Tue, 04 Jun 2024 10:18:59: Building TE index ....... 

INFO  @ Tue, 04 Jun 2024 10:22:49: Done building TE index ...... 

INFO  @ Tue, 04 Jun 2024 10:22:49: 
Reading sample file ... 

[E::idx_find_and_load] Could not retrieve index file for '.1717489369.5749154.bam'
uniq te counts = 2954 
.......start iterative optimization .......... 
There are not enough reads to estimate fragment length. 
Error in optimization
Error occurred when assigning read gene/TE annotations in file ./BAM_els/24L006858.Aligned.sortedByCoord.out.bam. 
Error: No active exception to reraise 
[Exception type: RuntimeError, raised in TEcount:409] 

The command I run is the following:

TEcount --sortByPos -b ./BAM_els/24L006858.Aligned.sortedByCoord.out.bam --mode multi --TE annotation/Modified_mm10_rmsk_subfamilies.gtf --GTF genome/combined.gtf --stranded no --project TEcount_multi-els/24L006858.

olivertam commented 5 months ago

Hi,

Thank you for your interest in the software. It appears to have trouble estimating fragment size. Are you using a paired-end library? Could you provide the BAM headers (particularly the alignment command that you used)?

Thanks.

MichMourra commented 5 months ago

Hi Olivertam,

Thanks a lot for your quick answer. Sure I send you the header of the bam file:

@PG ID:STAR PN:STAR VN:2.7.11a  CL:/ictstr01/home/ies/carlos.mourra/micromamba/envs/RNA-Seq-TE/bin/STAR-avx2   --runMode alignReads      --runThreadN 12   --genomeDir genome   --readFilesIn ./tailed/24L006858_els_clean_1P.fastq.gz   ./tailed/24L006858_els_clean_1P.fastq.gz      --readFilesCommand gunzip   -c      --limitBAMsortRAM 40000000000   --outFileNamePrefix BAM_els/24L006858.   --outSAMtype BAM   SortedByCoordinate      --outFilterMultimapNmax 100   --winAnchorMultimapNmax 100   --sjdbGTFfile genome/combined.gtf
@CO user command line: /ictstr01/home/ies/carlos.mourra/micromamba/envs/RNA-Seq-TE/bin/STAR-avx2 --readFilesIn ./tailed/24L006858_els_clean_1P.fastq.gz ./tailed/24L006858_els_clean_1P.fastq.gz --readFilesCommand gunzip -c --genomeDir genome --sjdbGTFfile genome/combined.gtf --runThreadN 12 --outSAMtype BAM SortedByCoordinate --runMode alignReads --outFilterMultimapNmax 100 --winAnchorMultimapNmax 100 --limitBAMsortRAM 40000000000 --outFileNamePrefix BAM_els/24L006858.
olivertam commented 5 months ago

Hi.

Thanks for providing the BAM headers. I noticed that you had the same input file for your STAR alignment (./tailed/24L006858_els_clean_1P.fastq.gz). Is that intentional? I would have suspected that the two paired ends would be different files.

Thanks.

MichMourra commented 5 months ago

That's absolutely right, the files which are not working were wrongly generated. Sorry, it was my mistake. I think is going to work after I redo the alignment with the rigth paired file.

Thank you so much for your help.