Closed by tonilogbo 4 years ago
Hi, That is very unusual. Would you be able to show me the header of your BAM file, and maybe the first 5 lines of your gene GTF file? Thanks.
The BAM header looks like this:
@HD VN:1.4 SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chrM LN:16569
@SQ SN:GL877870.2 LN:66021
@SQ SN:GL877872.1 LN:297485
@SQ SN:GL383535.1 LN:429806
@SQ SN:JH159133.1 LN:266316
@SQ SN:KB663606.1 LN:305900
@SQ SN:KB021647.1 LN:1058686
@SQ SN:KE332497.1 LN:543325
@SQ SN:JH159131.1 LN:393769
@SQ SN:GL949745.1 LN:372609
@SQ SN:JH720447.1 LN:454385
@SQ SN:GL582968.1 LN:356330
@SQ SN:JH720446.1 LN:97345
@SQ SN:JH591181.2 LN:2281126
@SQ SN:JH720443.2 LN:408430
@SQ SN:JH159134.2 LN:3821770
@SQ SN:JH636052.4 LN:7283150
@SQ SN:JH636053.3 LN:1676126
@SQ SN:JH636054.1 LN:758378
@SQ SN:JH636056.1 LN:262912
@SQ SN:JH636058.1 LN:716227
@SQ SN:JH636057.1 LN:200195
@SQ SN:KE332505.1 LN:579598
@SQ SN:JH720451.1 LN:898979
@SQ SN:JH720452.1 LN:522319
@SQ SN:JH720453.1 LN:1461188
@SQ SN:JH720454.3 LN:752267
@SQ SN:JH159136.1 LN:200998
@SQ SN:JH806587.1 LN:4110759
@SQ SN:JH806588.1 LN:862483
@SQ SN:JH806589.1 LN:270630
@SQ SN:JH806590.2 LN:2418393
@SQ SN:JH806591.1 LN:882083
@SQ SN:JH806592.1 LN:835911
@SQ SN:JH806593.1 LN:389631
@SQ SN:JH806594.1 LN:390496
@SQ SN:JH806595.1 LN:444074
@SQ SN:JH806596.1 LN:413927
@SQ SN:JH806597.1 LN:1045622
@SQ SN:JH720448.1 LN:70483
@SQ SN:JH806598.1 LN:899320
@SQ SN:JH806599.1 LN:1214327
@SQ SN:JH806600.2 LN:6530008
@SQ SN:JH806601.1 LN:1389764
@SQ SN:JH806602.1 LN:713266
@SQ SN:JH806573.1 LN:24680
@SQ SN:JH806574.2 LN:22982
@SQ SN:JH806575.1 LN:47409
@SQ SN:JH806580.1 LN:93149
@SQ SN:JH806583.1 LN:167183
@SQ SN:JH806584.1 LN:70876
@SQ SN:JH806585.1 LN:73505
@SQ SN:JH806603.1 LN:182949
@SQ SN:JH159150.3 LN:3110903
@SQ SN:GL582969.1 LN:251823
@SQ SN:JH806577.1 LN:22394
@SQ SN:JH806578.1 LN:169437
@SQ SN:JH806579.1 LN:211307
@SQ SN:JH159137.1 LN:191409
@SQ SN:KE332502.1 LN:341712
@SQ SN:KB021645.1 LN:1523386
@SQ SN:KB663607.2 LN:334922
@SQ SN:KE332500.1 LN:228602
@SQ SN:KE332496.1 LN:503215
@SQ SN:GL383558.1 LN:457041
@SQ SN:GL582976.1 LN:412535
@SQ SN:GL383523.1 LN:171362
@SQ SN:KE332498.1 LN:149443
@SQ SN:GL949743.1 LN:608579
@SQ SN:KE332506.1 LN:307252
@SQ SN:GL383536.1 LN:203777
@SQ SN:JH591184.1 LN:462282
@SQ SN:JH636061.1 LN:186059
@SQ SN:JH806576.1 LN:273386
@SQ SN:GL383524.1 LN:78793
@SQ SN:GL582973.1 LN:321004
@SQ SN:JH159138.1 LN:108875
@SQ SN:KB021648.1 LN:469972
@SQ SN:JH159139.1 LN:120441
@SQ SN:JH159140.1 LN:546435
@SQ SN:JH591182.1 LN:196262
@SQ SN:JH159132.1 LN:100694
@SQ SN:JH720449.1 LN:212298
@SQ SN:JH591183.1 LN:177920
@SQ SN:JH720444.2 LN:273128
@SQ SN:JH159141.2 LN:240775
@SQ SN:KB663604.1 LN:478993
@SQ SN:JH720455.1 LN:65034
@SQ SN:KB021646.2 LN:211416
@SQ SN:JH159142.2 LN:326647
@SQ SN:JH159143.1 LN:191402
@SQ SN:JH806582.2 LN:342635
@SQ SN:JH159135.2 LN:102251
@SQ SN:KE332499.1 LN:274521
@SQ SN:GL877877.2 LN:284527
@SQ SN:JH806586.1 LN:43543
@SQ SN:GL582979.2 LN:179899
@SQ SN:KB663605.1 LN:155926
@SQ SN:GL582975.1 LN:34662
@SQ SN:GL949744.1 LN:276448
@SQ SN:GL383543.1 LN:392792
@SQ SN:GL877871.1 LN:389939
@SQ SN:GL582967.1 LN:248177
@SQ SN:JH159149.1 LN:245473
@SQ SN:GL582977.2 LN:580393
@SQ SN:GL582970.1 LN:354970
@SQ SN:GL383559.2 LN:338640
@SQ SN:JH159144.1 LN:388340
@SQ SN:JH591186.1 LN:376223
@SQ SN:GL383560.1 LN:534288
@SQ SN:GL339450.1 LN:330164
@SQ SN:GL582971.1 LN:1284284
@SQ SN:GL582974.1 LN:163298
@SQ SN:JH806581.1 LN:872115
@SQ SN:JH636060.1 LN:437946
@SQ SN:JH591185.1 LN:167437
@SQ SN:JH159145.1 LN:194862
@SQ SN:GL877873.1 LN:168465
@SQ SN:KB663608.1 LN:283551
@SQ SN:GL582972.1 LN:327774
@SQ SN:KB663603.1 LN:599580
@SQ SN:KE332495.1 LN:263861
@SQ SN:JH636059.1 LN:295379
@SQ SN:JH720445.1 LN:170033
@SQ SN:KE332501.1 LN:1020827
@SQ SN:GL383561.2 LN:644425
@SQ SN:GL949741.1 LN:151551
@SQ SN:GL383562.1 LN:45551
@SQ SN:GL383525.1 LN:65063
@SQ SN:GL383544.1 LN:128378
@SQ SN:GL383548.1 LN:165247
@SQ SN:GL383537.1 LN:62435
@SQ SN:GL383538.1 LN:49281
@SQ SN:GL383516.1 LN:49316
@SQ SN:GL383517.1 LN:49352
@SQ SN:GL383545.1 LN:179254
@SQ SN:GL383546.1 LN:309802
@SQ SN:GL383547.1 LN:154407
@SQ SN:GL877875.1 LN:167313
@SQ SN:GL383549.1 LN:120804
@SQ SN:GL383550.1 LN:169178
@SQ SN:GL383551.1 LN:184319
@SQ SN:GL877876.1 LN:408271
@SQ SN:GL383552.1 LN:138655
@SQ SN:GL383553.2 LN:152874
@SQ SN:GL383554.1 LN:296527
@SQ SN:GL383555.1 LN:388773
@SQ SN:GL383556.1 LN:192462
@SQ SN:GL383557.1 LN:89672
@SQ SN:GL000258.1 LN:1680828
@SQ SN:GL383563.2 LN:270261
@SQ SN:GL383564.1 LN:133151
@SQ SN:GL383565.1 LN:223995
@SQ SN:GL383566.1 LN:90219
@SQ SN:JH159146.1 LN:278131
@SQ SN:JH159147.1 LN:70345
@SQ SN:JH159148.1 LN:88070
@SQ SN:GL383567.1 LN:289831
@SQ SN:GL383568.1 LN:104552
@SQ SN:GL383569.1 LN:167950
@SQ SN:GL383570.1 LN:164789
@SQ SN:GL383571.1 LN:198278
@SQ SN:GL383572.1 LN:159547
@SQ SN:GL949746.1 LN:987716
@SQ SN:GL949747.1 LN:729519
@SQ SN:GL949748.1 LN:1064303
@SQ SN:GL949749.1 LN:1091840
@SQ SN:GL949750.1 LN:1066389
@SQ SN:GL949751.1 LN:1002682
@SQ SN:GL949752.1 LN:987100
@SQ SN:GL949753.1 LN:796478
@SQ SN:GL383573.1 LN:385657
@SQ SN:GL383574.1 LN:155864
@SQ SN:GL383575.2 LN:170222
@SQ SN:GL383576.1 LN:188024
@SQ SN:GL383518.1 LN:182439
@SQ SN:GL383519.1 LN:110268
@SQ SN:GL383520.1 LN:366579
@SQ SN:GL383577.1 LN:128385
@SQ SN:GL383578.1 LN:63917
@SQ SN:GL383579.1 LN:201198
@SQ SN:GL383580.1 LN:74652
@SQ SN:GL383581.1 LN:116690
@SQ SN:GL383582.2 LN:162811
@SQ SN:GL383583.1 LN:96924
@SQ SN:KB663609.1 LN:74013
@SQ SN:GL383521.1 LN:143390
@SQ SN:GL383522.1 LN:123821
@SQ SN:GL582966.2 LN:96131
@SQ SN:JH636055.1 LN:173151
@SQ SN:GL383526.1 LN:180671
@SQ SN:GL000257.1 LN:590426
@SQ SN:GL383527.1 LN:164536
@SQ SN:GL383528.1 LN:376187
@SQ SN:GL383529.1 LN:121345
@SQ SN:GL339449.2 LN:1612928
@SQ SN:GL383530.1 LN:101241
@SQ SN:GL383531.1 LN:173459
@SQ SN:GL383532.1 LN:82728
@SQ SN:GL949742.1 LN:226852
@SQ SN:GL383533.1 LN:124736
@SQ SN:KB021644.1 LN:187824
@SQ SN:GL000250.1 LN:4622290
@SQ SN:GL000251.1 LN:4795371
@SQ SN:GL000252.1 LN:4610396
@SQ SN:GL000253.1 LN:4683263
@SQ SN:GL000254.1 LN:4833398
@SQ SN:GL000255.1 LN:4611984
@SQ SN:GL000256.1 LN:4928567
@SQ SN:GL383534.2 LN:119183
@SQ SN:GL383539.1 LN:162988
@SQ SN:GL383540.1 LN:71551
@SQ SN:GL383541.1 LN:171286
@SQ SN:GL383542.1 LN:60032
@SQ SN:GL000191.1 LN:106433
@SQ SN:GL000192.1 LN:547496
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000195.1 LN:182896
@SQ SN:GL000196.1 LN:38914
@SQ SN:GL000197.1 LN:37175
@SQ SN:GL000198.1 LN:90085
@SQ SN:GL000199.1 LN:169874
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000201.1 LN:36148
@SQ SN:GL000202.1 LN:40103
@SQ SN:GL000203.1 LN:37498
@SQ SN:GL000204.1 LN:81310
@SQ SN:GL000205.1 LN:174588
@SQ SN:GL000206.1 LN:41001
@SQ SN:GL000207.1 LN:4262
@SQ SN:GL000208.1 LN:92689
@SQ SN:GL000209.1 LN:159169
@SQ SN:GL000210.1 LN:27682
@SQ SN:GL000211.1 LN:166566
@SQ SN:GL000212.1 LN:186858
@SQ SN:GL000213.1 LN:164239
@SQ SN:GL000214.1 LN:137718
@SQ SN:GL000215.1 LN:172545
@SQ SN:GL000216.1 LN:172294
@SQ SN:GL000217.1 LN:172149
@SQ SN:GL000218.1 LN:161147
@SQ SN:GL000219.1 LN:179198
@SQ SN:GL000220.1 LN:161802
@SQ SN:GL000221.1 LN:155397
@SQ SN:GL000222.1 LN:186861
@SQ SN:GL000223.1 LN:180455
@SQ SN:GL000224.1 LN:179693
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000226.1 LN:15008
@SQ SN:GL000227.1 LN:128374
@SQ SN:GL000228.1 LN:129120
@SQ SN:GL000229.1 LN:19913
@SQ SN:GL000230.1 LN:43691
@SQ SN:GL000231.1 LN:27386
@SQ SN:GL000232.1 LN:40652
@SQ SN:GL000233.1 LN:45941
@SQ SN:GL000234.1 LN:40531
@SQ SN:GL000235.1 LN:34474
@SQ SN:GL000236.1 LN:41934
@SQ SN:GL000237.1 LN:45867
@SQ SN:GL000238.1 LN:39939
@SQ SN:GL000239.1 LN:33824
@SQ SN:GL000240.1 LN:41933
@SQ SN:GL000241.1 LN:42152
@SQ SN:GL000242.1 LN:43523
@SQ SN:GL000243.1 LN:43341
@SQ SN:GL000244.1 LN:39929
@SQ SN:GL000245.1 LN:36651
@SQ SN:GL000246.1 LN:38154
@SQ SN:GL000247.1 LN:36422
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000249.1 LN:38502
@PG ID:STAR PN:STAR VN:STAR_2.4.2a CL:/lustre/beagle/djf604/software/STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --runThreadN 24 --genomeDir /lustre/beagle/djf604/reference/human/gencode19/STAR-ref-gencode19 --readFilesIn /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read1.trimmed.fastq.gz /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read2.trimmed.fastq.gz --readFilesCommand zcat --outFileNamePrefix /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG. --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI AS NM MD --outSAMunmapped Within --outFilterType BySJout --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --quantMode TranscriptomeSAM
@CO user command line: /lustre/beagle/djf604/software/STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --outFileNamePrefix /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG. --genomeDir /lustre/beagle/djf604/reference/human/gencode19/STAR-ref-gencode19 --readFilesIn /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read1.trimmed.fastq.gz /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read2.trimmed.fastq.gz --readFilesCommand zcat --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --outSAMunmapped Within --outSAMattributes NH HI AS NM MD --outFilterMismatchNoverReadLmax 0.04 --sjdbScore 1 --runThreadN 24 --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSA<
The GTF header looks like this:
#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
Hi, Thank you for that. Where did you obtain the gene GTF file? Could you check that the chromosome names in your gene GTF file follow the "chr[N]" nomenclature? Cheers.
Ah, is that the problem? It's from Ensembl's FTP site. What's an easy way to add the "chr", if you don't mind? Edit: to clarify, it only has the number/letter(s).
Unfortunately, that does cause problems, as TEcount tries to match the chromosome names in your alignment file to the chromosome names in the GTF. Might I also recommend checking the TE GTF file? I was under the impression that the GRCh37 TE GTF file should have the same nomenclature as your gene GTF file, but I might be wrong. If you wish to send me the gene GTF file (tam at cshl dot edu), I could try to convert it for you. Thanks.
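For anyone who wants to check this mismatch themselves, here is a minimal sketch (not part of TEcount) that compares the reference names in a BAM header with the sequence names in a GTF. The function names are made up for illustration, and it assumes you have the header as text (for example from `samtools view -H`) and the GTF as an iterable of lines.

```python
def sq_names(sam_header_text):
    """Collect reference names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in sam_header_text.splitlines():
        fields = line.split()  # handles tab- or space-separated headers
        if fields and fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(sam_header_text, gtf_lines):
    """Return (names found in both, GTF names missing from the BAM header)."""
    bam = sq_names(sam_header_text)
    gtf = gtf_seqnames(gtf_lines)
    return sorted(gtf & bam), sorted(gtf - bam)


# Tiny made-up example mirroring this thread: chr-prefixed BAM, Ensembl-style GTF.
example_header = (
    "@HD VN:1.4 SO:coordinate\n"
    "@SQ SN:chr1 LN:249250621\n"
    "@SQ SN:GL000191.1 LN:106433\n"
)
example_gtf = [
    "#!genome-build GRCh37.p13",
    '1\tensembl\tgene\t1\t10\t.\t+\t.\tgene_id "g1";',
    'GL000191.1\tensembl\tgene\t1\t10\t.\t+\t.\tgene_id "g2";',
]
shared, missing = compare_names(example_header, example_gtf)
print("shared:", shared)
print("missing from BAM:", missing)
```

If the "missing" list contains the main chromosomes (1, 2, ..., X, Y), the two files use different nomenclatures.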
The TE GTF file does in fact have the same nomenclature. I'll send them both through now.
Hi, You might want to send through just the gene GTF, as the TE one is huge (and we have it). I looked more closely at your BAM header, and it appears that you are using GENCODE's chromosome nomenclature, which combines UCSC nomenclature (chr[N]) for the chromosomes with Ensembl nomenclature for the scaffolds. That is why the TEs were annotated (they matched the scaffolds), while the genes were not. If you don't really care about alignments on the assembled scaffolds, you can use the hg19_rmsk_TE.gtf file. I will process the gene GTF as soon as possible.
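As a rough answer to the "easy way to add the chr" question, here is a minimal sketch of renaming Ensembl-style sequence names to the GENCODE-style names seen in the BAM header above. It is an illustration, not the conversion the maintainers used. It assumes the primary chromosomes are named 1-22, X and Y, that the mitochondrial genome is named "MT" in the Ensembl file and "chrM" in the BAM, and that scaffold names (GL000191.1 and so on) should be left untouched. The function names and the file names in the usage note are hypothetical.

```python
import re

# Assumption: primary chromosomes in the Ensembl GTF are named 1-22, X, Y.
_PRIMARY = re.compile(r"^(?:[1-9]|1[0-9]|2[0-2]|X|Y)$")


def to_gencode_name(name):
    """Map an Ensembl-style seqname to GENCODE style; scaffolds stay as they are."""
    if name == "MT":  # assumption: Ensembl "MT" corresponds to "chrM" in the BAM
        return "chrM"
    if _PRIMARY.match(name):
        return "chr" + name
    return name  # scaffolds and already-prefixed names are left unchanged


def convert_gtf_line(line):
    """Rewrite the seqname column of one GTF line; comment lines pass through."""
    if line.startswith("#") or "\t" not in line:
        return line
    seqname, rest = line.split("\t", 1)
    return to_gencode_name(seqname) + "\t" + rest


# Small demonstration on made-up lines.
demo = [
    "#!genome-build GRCh37.p13\n",
    '1\tensembl\tgene\t1\t10\t.\t+\t.\tgene_id "g1";\n',
    'MT\tensembl\tgene\t1\t10\t.\t+\t.\tgene_id "g2";\n',
    'GL000191.1\tensembl\tgene\t1\t10\t.\t+\t.\tgene_id "g3";\n',
]
converted = [convert_gtf_line(line) for line in demo]
print("".join(converted), end="")
```

To convert a whole file, you could read it line by line, pass each line through `convert_gtf_line`, and write the result to a new file (for example from a hypothetical `genes.ensembl.gtf` to `genes.gencode_style.gtf`). It is worth spot-checking the output against the BAM header before rerunning TEcount.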
It appears I have another GTF file from GENCODE which does in fact have the "chr[N]" notation, so I'll run TEcount again and let you know how it goes.
Hi, I have now built a TE GTF file that should work properly with GENCODE chromosome nomenclature. It is located here. Please let me know if you have any issues. Thanks.
Using the GENCODE file has worked, thanks for your help!
I've tried running TEcount on several paired-end samples and am getting effectively no counts for any genes, but some counts for TEs. I know this should not be happening, as I get counts for these genes using featureCounts. How should I resolve this problem?
Clearly I should have more than 79 gene counts.