Closed ZeyanZhang closed 3 years ago
Hi Zeyan,
Looking at the log output, it appears that there were no TE reads in your library. One thing that I would check/confirm is that the chromosome names across all three files (BAM, Gene GTF and TE GTF) match up. I noticed that you're using Gencode hg38, but I just wanted to make sure. Also, I noticed that there were not enough reads to estimate fragment length. That could indicate that there were no alignments in the BAM file. It might be worth checking that too. If that doesn't explain the error, please feel free to share an excerpt of your BAM and GTF files, and I can take a closer look.
Thanks.
@olivertam Hi Oliver, Thank you so much! The chromosome names are matched. Below are excerpts of the files. I can not figure out what's wrong, can you please have a look? Thanks again!
samtools view -h BAM-STAR/G20_CTL_rep1.bam |head
@HD VN:1.4 SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
samtools view -h BAM-STAR/G20_CTL_rep1.bam | head -300 | tail
K00291:228:H3735BBXY:1:1201:29559:40332 355 chr1 14451 3 50M1S = 14738 338 CTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGAAGT AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ NH:i:2 HI:i:2 AS:i:97 nM:i:1
K00291:228:H3735BBXY:1:1201:29559:40332 355 chr1 14451 3 50M1S = 14738 338 CTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGAAGT AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ NH:i:2 HI:i:2 AS:i:97 nM:i:1
K00291:228:H3735BBXY:2:1128:14702:48386 355 chr1 14452 1 49M2S = 14551 144 TCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGAAGTA AAFFFJFFFFJJJJJFFJJJJ<JJFJJJFFJFFJJJ<JJFFJJFFJ<AJFJ NH:i:4 HI:i:3 AS:i:90 nM:i:1
K00291:228:H3735BBXY:2:1128:14702:48386 355 chr1 14452 1 49M2S = 14551 144 TCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGAAGTA AAFFFJFFFFJJJJJFFJJJJ<JJFJJJFFJFFJJJ<JJFFJJFFJ<AJFJ NH:i:4 HI:i:3 AS:i:90 nM:i:1
K00291:228:H3735BBXY:2:2120:2595:20340 355 chr1 14458 1 51M = 14615 208 GAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCA AAFFFJFJJJJJJJJFJJJJJJFJJFJJJJJJJJJJJJJJJJJJJFJFAJJ NH:i:4 HI:i:2 AS:i:94 nM:i:3
K00291:228:H3735BBXY:2:2120:2595:20340 355 chr1 14458 1 51M = 14615 208 GAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCA AAFFFJFJJJJJJJJFJJJJJJFJJFJJJJJJJJJJJJJJJJJJJFJFAJJ NH:i:4 HI:i:2 AS:i:94 nM:i:3
K00291:228:H3735BBXY:2:1116:19360:7679 355 chr1 14460 0 51M = 14553 144 ACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGA AAFFFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF NH:i:6 HI:i:2 AS:i:96 nM:i:2
K00291:228:H3735BBXY:2:1116:19360:7679 355 chr1 14460 0 51M = 14553 144 ACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGA AAFFFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF NH:i:6 HI:i:2 AS:i:96 nM:i:2
K00291:228:H3735BBXY:2:1124:30371:28024 403 chr1 14461 0 51M = 14400 -112 CACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGAC 7FAJFFAJF<FFAJJJJJJJ7<FFJJJJFJJFAJFJJJFA-F<AA7-<7-< NH:i:6 HI:i:4 AS:i:88 nM:i:1
K00291:228:H3735BBXY:2:1124:30371:28024 403 chr1 14461 0 51M = 14400 -112 CACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGAC 7FAJFFAJF<FFAJJJJJJJ7<FFJJJJFJJFAJFJJJFA-F<AA7-<7-< NH:i:6 HI:i:4 AS:i:88 nM:i:1
samtools view -H BAM-STAR/G20_CTL_rep1.bam | tail
@SQ SN:chrUn_KI270755v1 LN:36723
@SQ SN:chrUn_KI270756v1 LN:79590
@SQ SN:chrUn_KI270757v1 LN:71251
@SQ SN:chrUn_GL000214v1 LN:137718
@SQ SN:chrUn_KI270742v1 LN:186739
@SQ SN:chrUn_GL000216v2 LN:176608
@SQ SN:chrUn_GL000218v1 LN:161147
@SQ SN:chrEBV LN:171823
@PG ID:STAR PN:STAR VN:2.7.3a CL:STAR --runThreadN 8 --genomeDir ref/hg38/STAR.gencode.v34 --genomeLoad NoSharedMemory --readFilesIn FASTQ-TRIMMED/G20_CTL_rep1_R1.trim.fastq.gz FASTQ-TRIMMED/G20_CTL_rep1_R2.trim.fastq.gz --readFilesCommand zcat --outFileNamePrefix logs-align-star/G20_CTL_rep1_ --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outFilterMultimapNmax 100 --winAnchorMultimapNmax 100
@CO user command line: STAR --runThreadN 8 --genomeDir ref/hg38/STAR.gencode.v34 --genomeLoad NoSharedMemory --winAnchorMultimapNmax 100 --outFilterMultimapNmax 100 --readFilesCommand zcat --readFilesIn FASTQ-TRIMMED/G20_CTL_rep1_R1.trim.fastq.gz FASTQ-TRIMMED/G20_CTL_rep1_R2.trim.fastq.gz --outFileNamePrefix logs-align-star/G20_CTL_rep1_ --outSAMtype BAM Unsorted --outStd BAM_Unsorted
head hg38.refGene.gtf
chr1 refGene transcript 11874 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018"; gene_name "DDX11L1";
chr1 refGene exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018"; exon_number "1"; exon_id "NR_046018.1"; gene_name "DDX11L1";
chr1 refGene exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018"; exon_number "2"; exon_id "NR_046018.2"; gene_name "DDX11L1";
chr1 refGene exon 13221 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018"; exon_number "3"; exon_id "NR_046018.3"; gene_name "DDX11L1";
chr1 refGene transcript 14362 29370 . - . gene_id "WASH7P"; transcript_id "NR_024540"; gene_name "WASH7P";
chr1 refGene exon 14362 14829 . - . gene_id "WASH7P"; transcript_id "NR_024540"; exon_number "1"; exon_id "NR_024540.1"; gene_name "WASH7P";
chr1 refGene exon 14970 15038 . - . gene_id "WASH7P"; transcript_id "NR_024540"; exon_number "2"; exon_id "NR_024540.2"; gene_name "WASH7P";
chr1 refGene exon 15796 15947 . - . gene_id "WASH7P"; transcript_id "NR_024540"; exon_number "3"; exon_id "NR_024540.3"; gene_name "WASH7P";
chr1 refGene exon 16607 16765 . - . gene_id "WASH7P"; transcript_id "NR_024540"; exon_number "4"; exon_id "NR_024540.4"; gene_name "WASH7P";
chr1 refGene exon 16858 17055 . - . gene_id "WASH7P"; transcript_id "NR_024540"; exon_number "5"; exon_id "NR_024540.5"; gene_name "WASH7P";
head hg38_rmsk_TE.gtf
chr1 hg38_rmsk exon 100000001 100000637 11325 - . gene_id "L1M2"; transcript_id "L1M2_dup229"; family_id "L1"; class_id "LINE";
chr1 hg38_rmsk exon 10000002 10000239 1760 + . gene_id "AluSx3"; transcript_id "AluSx3_dup157"; family_id "Alu"; class_id "SINE";
chr1 hg38_rmsk exon 100000744 100002612 11325 - . gene_id "L1M2"; transcript_id "L1M2_dup230"; family_id "L1"; class_id "LINE";
chr1 hg38_rmsk exon 10000251 10000566 2225 + . gene_id "AluSx"; transcript_id "AluSx_dup700"; family_id "Alu"; class_id "SINE";
chr1 hg38_rmsk exon 100002613 100002913 1799 - . gene_id "AluJr"; transcript_id "AluJr_dup3513"; family_id "Alu"; class_id "SINE";
chr1 hg38_rmsk exon 100002914 100003133 11325 - . gene_id "L1M2"; transcript_id "L1M2_dup231"; family_id "L1"; class_id "LINE";
chr1 hg38_rmsk exon 100003134 100003444 2452 + . gene_id "AluSq2"; transcript_id "AluSq2_dup2641"; family_id "Alu"; class_id "SINE";
chr1 hg38_rmsk exon 100003445 100004305 6882 - . gene_id "L1M2"; transcript_id "L1M2_dup232"; family_id "L1"; class_id "LINE";
chr1 hg38_rmsk exon 100004306 100004578 1643 - . gene_id "AluJb"; transcript_id "AluJb_dup6336"; family_id "Alu"; class_id "SINE";
chr1 hg38_rmsk exon 100004579 100004715 6794 - . gene_id "L1M2"; transcript_id "L1M2_dup233"; family_id "L1"; class_id "LINE";
Best, Zeyan
Hi Zayan,
Would you mind sharing the BAM file with me (or at least all alignments from a chromosome)? I can't tell from the excerpt what might be wrong.
Thanks.
Thanks, Oliver. Please see the link below for the BAM file. Please let me know if you can not open it.
https://www.dropbox.com/s/it17x5rgvo3lh7m/G20_CTL_rep1.bam?dl=0
Thanks again for your kind help, Zeyan
On Wed, Aug 25, 2021 at 11:44 AM Oliver Tam @.***> wrote:
Hi Zayan,
Would you mind sharing the BAM file with me (or at least all alignments from a chromosome)? I can't tell from the excerpt what might be wrong.
Thanks.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/mhammell-laboratory/TEtranscripts/issues/98#issuecomment-905630387, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALJ6EK45U5LDZJCHIFGDQ7TT6UFULANCNFSM5CZEU64Q . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .
-- ωǒ京ㄤ是め娲ネト天剩下DEζ疯狂de石头∞→◎
I am able to reproduce your error. I'll start digging into the BAM file, and let you know what I found.
Thanks.
Hi Zeyan,
There is something very weird with your BAM files. I extracted a bunch of uniquely aligned reads from chromosome 7, and sorted them by sequence name. I normally would expect them to be orientated in the F/R direction (e.g. SAM flag pairs of 82 and 163, or 99 and 147). However, what I see is that the both alignments with the same read name have the same orientation:
K00291:228:H3735BBXY:1:1101:1519:16471 83 chr7 128748792 255 13M5457N37M1S = 128748731 -5568 AGGAAAGGCTTGGAAAGATTGTAAGTAAAATAGATGGCGACAAGGACGGGC JJJJJJJJJJFJJJJJJJFJJJJJJJJAJ<AJJAJJJFA7FJJJJJFFFAA NH:i:1 HI:i:1 AS:i:99 nM:i:0
K00291:228:H3735BBXY:1:1101:1519:16471 83 chr7 128748792 255 13M5457N37M1S = 128748731 -5568 AGGAAAGGCTTGGAAAGATTGTAAGTAAAATAGATGGCGACAAGGACGGGC JJJJJJJJJJFJJJJJJJFJJJJJJJJAJ<AJJAJJJFA7FJJJJJFFFAA NH:i:1 HI:i:1 AS:i:99 nM:i:0
K00291:228:H3735BBXY:1:1101:1519:16471 163 chr7 128748731 255 50M = 128748792 5568 GATGCCTTCTTGGGTGCTGAAGAAGCAAAGACCTTTGATCAGCTGACACC AA<<AJJJFFJJJJJFJAAJJJFJJJJJ<FJJJJJJFJJJJJF7AJJFJF NH:i:1 HI:i:1 AS:i:99 nM:i:0
K00291:228:H3735BBXY:1:1101:1519:16471 163 chr7 128748731 255 50M = 128748792 5568 GATGCCTTCTTGGGTGCTGAAGAAGCAAAGACCTTTGATCAGCTGACACC AA<<AJJJFFJJJJJFJAAJJJFJJJJJ<FJJJJJJFJJJJJF7AJJFJF NH:i:1 HI:i:1 AS:i:99 nM:i:0
K00291:228:H3735BBXY:1:1101:1519:21676 99 chr7 5528495 255 51M = 5529164 714 GCGCTCGGTGAGGATCTTCATGAGGTAGTCAGTCAGGTCCCGGCCAGCCAG AAAFAFJJJFJJJFAJFJJJJJAFAJFJFJJJJJJJJJJFJJJJFJJJJAJ NH:i:1 HI:i:1 AS:i:94 nM:i:0
K00291:228:H3735BBXY:1:1101:1519:21676 99 chr7 5528495 255 51M = 5529164 714 GCGCTCGGTGAGGATCTTCATGAGGTAGTCAGTCAGGTCCCGGCCAGCCAG AAAFAFJJJFJJJFAJFJJJJJAFAJFJFJJJJJJJJJJFJJJJFJJJJAJ NH:i:1 HI:i:1 AS:i:94 nM:i:0
This has me wondering if the two FASTQ inputs are the same. You could check the sequences in the trimmed FASTQ file to ensure that they are not the same (and they shouldn't be, since you would only get this situation if your fragment is an inverted repeat). This observation would explain why TEcount failed, as it was trying to find "properly paired" reads (F/R orientation), and since it could not find any, failed to calculate an insert size and exited with error.
Let me know if this does not make sense.
Thanks.
@olivertam Hi Oliver, Thank you so much!!! The R1 and R2 FASTQ are not the same as shown below. Yes, it's so wired. I will re-do alignment to see what's wrong. I will let you know how it goes.
zcat FASTQ-TRIMMED/G20_CTL_rep1_R1.trim.fastq.gz |head
@K00291:228:H3735BBXY:1:1101:3599:1156 1:N:0:ATTCAGAA+TAAGATTA
TGTAAAAGCAGCACCTTTGTGACGTTTTAACTTTAGTATTCCTCTCCTTCT
+
AAFFFJJJFJJJJJJFJJJJJJJJFFJJJJJJJJJJAJJFJ7AJJJJJJAF
@K00291:228:H3735BBXY:1:1101:9019:1156 1:N:0:ATTCAGAN+TAAGATTA
GGTACCTCCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTGCCNAA
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ#JJ
@K00291:228:H3735BBXY:1:1101:9080:1156 1:N:0:ATTCAGAN+TAAGATTA
GCTTGGTACTTATGCTGGGAGATATGCAAAATGTGTTTTTCAATGTTTNCT
zcat FASTQ-TRIMMED/G20_CTL_rep1_R2.trim.fastq.gz |head
@K00291:228:H3735BBXY:1:1101:3599:1156 2:N:0:ATTCAGAA+TAAGATTA
GCCAAACGTGCCATTGTTCACTAGTACGATGGGGAGGAGAAGGAGGAAG
+
AA<AFF7JAFJF-77F-7FAF7-<-7A7<-7A<FJFA-7<-<FF<FJF<
@K00291:228:H3735BBXY:1:1101:9019:1156 2:N:0:ATTCAGAN+TAAGATTA
GCGCAGGGTCGCGATGCTGCCCGGTTTNGCACTNCTCCTGCTGGCCGNCT
+
AAFFFJJFJJJJJJJJJFAFJJJJFJJ#JJJJJ#J<FJFJJFFFAFF#JA
@K00291:228:H3735BBXY:1:1101:9080:1156 2:N:0:ATTCAGAN+TAAGATTA
GGCATTGATCCTAACAGCTTGCATTATNTGGTGNATCTGCTCAATCANGT
Thanks again. Best, Zeyan
@olivertam
Hi Oliver, Thank you so much for the kind help! Yes, the problem is the BAM files. The problem was solved after doing re-alignment from my raw FASTQ files. Thanks again! Best, Zeyan
Hi Oliver,
Thanks so much for your kind help to solve the previous problem! I am running TEtranscripts v2.2.1 (Python v3.7.2) for analysis of TE expression from RNA-seq. The parameter for bam files generating and the logs of TEtranscripts were as below. Can you please advise what's wrong? Thank you so much for your time!
align using STAR
The logs INFO of TEtranscripts
Best, Zeyan