mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Not counting genes #62

Closed: tonilogbo closed this issue 4 years ago

tonilogbo commented 4 years ago

I've tried running TEcount on several paired-end samples and am getting effectively no counts for any genes, but some counts for TEs. I know this shouldn't be happening, as featureCounts does give counts for these genes. How should I resolve this problem?

TEcount --sortByPos --format BAM --stranded reverse --mode multi -b H-395-OLIG.Aligned.sortedByCoord.out.bam --GTF Homo_sapiens.GRCh37.87.gtf --TE GRCh37_rmsk_TE.gtf --project sample_sorted_test
INFO  @ Wed, 19 Feb 2020 10:26:18: 
# ARGUMENTS LIST:
# name = sample_sorted_test
# BAM file = H-395-OLIG.Aligned.sortedByCoord.out.bam
# GTF file = Homo_sapiens.GRCh37.87.gtf 
# TE file = GRCh37_rmsk_TE.gtf 
# multi-mapper mode = multi 
# stranded = reverse 
# number of iteration = 100
# Alignments grouped by read ID = False

INFO  @ Wed, 19 Feb 2020 10:26:18: Processing GTF files ... 

INFO  @ Wed, 19 Feb 2020 10:26:18: Building gene index ....... 

100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
500000 GTF lines processed.
600000 GTF lines processed.
700000 GTF lines processed.
800000 GTF lines processed.
900000 GTF lines processed.
1000000 GTF lines processed.
1100000 GTF lines processed.
INFO  @ Wed, 19 Feb 2020 10:32:41: Done building gene index ...... 

INFO  @ Wed, 19 Feb 2020 10:32:53: Building TE index ....... 

INFO  @ Wed, 19 Feb 2020 10:37:38: Done building TE index ...... 

INFO  @ Wed, 19 Feb 2020 10:37:38: 
Reading sample file ... 

uniq te counts = 14158.0 
.......start iterative optimization .......... 
multi-reads = 2592 total means = 44.7017806836
after normalization total means0 = 1.0
SQUAREM iteraton [1]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 47.6675267147
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.0125141651
after normalization total means = 1.0
alpha = 1.0, SQUAREM iteraton [2]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.1316135947
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.1814151017
after normalization total means = 1.0
alpha = 1.61124232668.
 Performing a stabilization step.
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2247316346
after normalization total means = 1.0
alpha = 1.61124232668, SQUAREM iteraton [3]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2284421452
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2306690048
after normalization total means = 1.0
alpha = 3.88215504293.
 Performing a stabilization step.
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2341622013
after normalization total means = 1.0
alpha = 3.88215504293, SQUAREM iteraton [4]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2351881467
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2356658829
after normalization total means = 1.0
alpha = 4.0.
 Performing a stabilization step.
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2365781197
after normalization total means = 1.0
alpha = 4.0, SQUAREM iteraton [5]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.237396658
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2377991454
after normalization total means = 1.0
alpha = 4.5058537256.
 Performing a stabilization step.
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2387637798
after normalization total means = 1.0
alpha = 4.5058537256, SQUAREM iteraton [6]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2395791763
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2399246586
after normalization total means = 1.0
alpha = 4.55067693244.
 Performing a stabilization step.
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2402469027
after normalization total means = 1.0
alpha = 4.55067693244, SQUAREM iteraton [7]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2411215562
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2414284078
after normalization total means = 1.0
alpha = 4.5409696762.
 Performing a stabilization step.
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2410991396
after normalization total means = 1.0
alpha = 4.5409696762, SQUAREM iteraton [8]
1/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2421064859
after normalization total means = 1.0
2/3
num of multi reads = 2592
total multi counts = 2179.0
total multi counts = 2179.0
total means = 48.2424053534
after normalization total means = 1.0
r2Nome = OPT_TOL 
converge at iteration 8
num of multi reads = 2592
total multi counts = 2179.0
TE counts total 16337.0 
Gene counts total 79.5 

In library H-395-OLIG.Aligned.sortedByCoord.out.bam: 
Total annotated reads = 16416.5 
Total non-uniquely mapped reads = 1869370 
Total unannotated reads = 26169892

Clearly I should have far more than 79 gene counts.
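The summary at the end of the log can be sanity-checked with a little arithmetic: "Total annotated reads" should be the TE total plus the gene total, and comparing it with the unannotated total shows how few reads landed on any feature. A minimal Python sketch using only the numbers printed above (treating "annotated + unannotated" as the denominator is an assumption made for illustration):

```python
# Values copied verbatim from the end of the TEcount log above.
te_counts = 16337.0
gene_counts = 79.5
unannotated = 26169892

# The log's "Total annotated reads" should equal TE counts plus gene counts.
annotated = te_counts + gene_counts

# Share of (annotated + unannotated) reads that hit any gene or TE feature.
annotated_share = annotated / (annotated + unannotated)

print(f"annotated reads: {annotated}")
print(f"genes as a share of annotated reads: {gene_counts / annotated:.2%}")
print(f"annotated share of annotated + unannotated reads: {annotated_share:.3%}")
```

Well under 0.1% of reads landing on any feature points to a systematic problem rather than genuinely low expression.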

olivertam commented 4 years ago

Hi, That is very unusual. Would you be able to show me the header of your BAM file, and maybe the first 5 lines of your gene GTF file? Thanks.

tonilogbo commented 4 years ago

The BAM header looks like this:

@HD VN:1.4  SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chrM LN:16569
@SQ SN:GL877870.2   LN:66021
@SQ SN:GL877872.1   LN:297485
@SQ SN:GL383535.1   LN:429806
@SQ SN:JH159133.1   LN:266316
@SQ SN:KB663606.1   LN:305900
@SQ SN:KB021647.1   LN:1058686
@SQ SN:KE332497.1   LN:543325
@SQ SN:JH159131.1   LN:393769
@SQ SN:GL949745.1   LN:372609
@SQ SN:JH720447.1   LN:454385
@SQ SN:GL582968.1   LN:356330
@SQ SN:JH720446.1   LN:97345
@SQ SN:JH591181.2   LN:2281126
@SQ SN:JH720443.2   LN:408430
@SQ SN:JH159134.2   LN:3821770
@SQ SN:JH636052.4   LN:7283150
@SQ SN:JH636053.3   LN:1676126
@SQ SN:JH636054.1   LN:758378
@SQ SN:JH636056.1   LN:262912
@SQ SN:JH636058.1   LN:716227
@SQ SN:JH636057.1   LN:200195
@SQ SN:KE332505.1   LN:579598
@SQ SN:JH720451.1   LN:898979
@SQ SN:JH720452.1   LN:522319
@SQ SN:JH720453.1   LN:1461188
@SQ SN:JH720454.3   LN:752267
@SQ SN:JH159136.1   LN:200998
@SQ SN:JH806587.1   LN:4110759
@SQ SN:JH806588.1   LN:862483
@SQ SN:JH806589.1   LN:270630
@SQ SN:JH806590.2   LN:2418393
@SQ SN:JH806591.1   LN:882083
@SQ SN:JH806592.1   LN:835911
@SQ SN:JH806593.1   LN:389631
@SQ SN:JH806594.1   LN:390496
@SQ SN:JH806595.1   LN:444074
@SQ SN:JH806596.1   LN:413927
@SQ SN:JH806597.1   LN:1045622
@SQ SN:JH720448.1   LN:70483
@SQ SN:JH806598.1   LN:899320
@SQ SN:JH806599.1   LN:1214327
@SQ SN:JH806600.2   LN:6530008
@SQ SN:JH806601.1   LN:1389764
@SQ SN:JH806602.1   LN:713266
@SQ SN:JH806573.1   LN:24680
@SQ SN:JH806574.2   LN:22982
@SQ SN:JH806575.1   LN:47409
@SQ SN:JH806580.1   LN:93149
@SQ SN:JH806583.1   LN:167183
@SQ SN:JH806584.1   LN:70876
@SQ SN:JH806585.1   LN:73505
@SQ SN:JH806603.1   LN:182949
@SQ SN:JH159150.3   LN:3110903
@SQ SN:GL582969.1   LN:251823
@SQ SN:JH806577.1   LN:22394
@SQ SN:JH806578.1   LN:169437
@SQ SN:JH806579.1   LN:211307
@SQ SN:JH159137.1   LN:191409
@SQ SN:KE332502.1   LN:341712
@SQ SN:KB021645.1   LN:1523386
@SQ SN:KB663607.2   LN:334922
@SQ SN:KE332500.1   LN:228602
@SQ SN:KE332496.1   LN:503215
@SQ SN:GL383558.1   LN:457041
@SQ SN:GL582976.1   LN:412535
@SQ SN:GL383523.1   LN:171362
@SQ SN:KE332498.1   LN:149443
@SQ SN:GL949743.1   LN:608579
@SQ SN:KE332506.1   LN:307252
@SQ SN:GL383536.1   LN:203777
@SQ SN:JH591184.1   LN:462282
@SQ SN:JH636061.1   LN:186059
@SQ SN:JH806576.1   LN:273386
@SQ SN:GL383524.1   LN:78793
@SQ SN:GL582973.1   LN:321004
@SQ SN:JH159138.1   LN:108875
@SQ SN:KB021648.1   LN:469972
@SQ SN:JH159139.1   LN:120441
@SQ SN:JH159140.1   LN:546435
@SQ SN:JH591182.1   LN:196262
@SQ SN:JH159132.1   LN:100694
@SQ SN:JH720449.1   LN:212298
@SQ SN:JH591183.1   LN:177920
@SQ SN:JH720444.2   LN:273128
@SQ SN:JH159141.2   LN:240775
@SQ SN:KB663604.1   LN:478993
@SQ SN:JH720455.1   LN:65034
@SQ SN:KB021646.2   LN:211416
@SQ SN:JH159142.2   LN:326647
@SQ SN:JH159143.1   LN:191402
@SQ SN:JH806582.2   LN:342635
@SQ SN:JH159135.2   LN:102251
@SQ SN:KE332499.1   LN:274521
@SQ SN:GL877877.2   LN:284527
@SQ SN:JH806586.1   LN:43543
@SQ SN:GL582979.2   LN:179899
@SQ SN:KB663605.1   LN:155926
@SQ SN:GL582975.1   LN:34662
@SQ SN:GL949744.1   LN:276448
@SQ SN:GL383543.1   LN:392792
@SQ SN:GL877871.1   LN:389939
@SQ SN:GL582967.1   LN:248177
@SQ SN:JH159149.1   LN:245473
@SQ SN:GL582977.2   LN:580393
@SQ SN:GL582970.1   LN:354970
@SQ SN:GL383559.2   LN:338640
@SQ SN:JH159144.1   LN:388340
@SQ SN:JH591186.1   LN:376223
@SQ SN:GL383560.1   LN:534288
@SQ SN:GL339450.1   LN:330164
@SQ SN:GL582971.1   LN:1284284
@SQ SN:GL582974.1   LN:163298
@SQ SN:JH806581.1   LN:872115
@SQ SN:JH636060.1   LN:437946
@SQ SN:JH591185.1   LN:167437
@SQ SN:JH159145.1   LN:194862
@SQ SN:GL877873.1   LN:168465
@SQ SN:KB663608.1   LN:283551
@SQ SN:GL582972.1   LN:327774
@SQ SN:KB663603.1   LN:599580
@SQ SN:KE332495.1   LN:263861
@SQ SN:JH636059.1   LN:295379
@SQ SN:JH720445.1   LN:170033
@SQ SN:KE332501.1   LN:1020827
@SQ SN:GL383561.2   LN:644425
@SQ SN:GL949741.1   LN:151551
@SQ SN:GL383562.1   LN:45551
@SQ SN:GL383525.1   LN:65063
@SQ SN:GL383544.1   LN:128378
@SQ SN:GL383548.1   LN:165247
@SQ SN:GL383537.1   LN:62435
@SQ SN:GL383538.1   LN:49281
@SQ SN:GL383516.1   LN:49316
@SQ SN:GL383517.1   LN:49352
@SQ SN:GL383545.1   LN:179254
@SQ SN:GL383546.1   LN:309802
@SQ SN:GL383547.1   LN:154407
@SQ SN:GL877875.1   LN:167313
@SQ SN:GL383549.1   LN:120804
@SQ SN:GL383550.1   LN:169178
@SQ SN:GL383551.1   LN:184319
@SQ SN:GL877876.1   LN:408271
@SQ SN:GL383552.1   LN:138655
@SQ SN:GL383553.2   LN:152874
@SQ SN:GL383554.1   LN:296527
@SQ SN:GL383555.1   LN:388773
@SQ SN:GL383556.1   LN:192462
@SQ SN:GL383557.1   LN:89672
@SQ SN:GL000258.1   LN:1680828
@SQ SN:GL383563.2   LN:270261
@SQ SN:GL383564.1   LN:133151
@SQ SN:GL383565.1   LN:223995
@SQ SN:GL383566.1   LN:90219
@SQ SN:JH159146.1   LN:278131
@SQ SN:JH159147.1   LN:70345
@SQ SN:JH159148.1   LN:88070
@SQ SN:GL383567.1   LN:289831
@SQ SN:GL383568.1   LN:104552
@SQ SN:GL383569.1   LN:167950
@SQ SN:GL383570.1   LN:164789
@SQ SN:GL383571.1   LN:198278
@SQ SN:GL383572.1   LN:159547
@SQ SN:GL949746.1   LN:987716
@SQ SN:GL949747.1   LN:729519
@SQ SN:GL949748.1   LN:1064303
@SQ SN:GL949749.1   LN:1091840
@SQ SN:GL949750.1   LN:1066389
@SQ SN:GL949751.1   LN:1002682
@SQ SN:GL949752.1   LN:987100
@SQ SN:GL949753.1   LN:796478
@SQ SN:GL383573.1   LN:385657
@SQ SN:GL383574.1   LN:155864
@SQ SN:GL383575.2   LN:170222
@SQ SN:GL383576.1   LN:188024
@SQ SN:GL383518.1   LN:182439
@SQ SN:GL383519.1   LN:110268
@SQ SN:GL383520.1   LN:366579
@SQ SN:GL383577.1   LN:128385
@SQ SN:GL383578.1   LN:63917
@SQ SN:GL383579.1   LN:201198
@SQ SN:GL383580.1   LN:74652
@SQ SN:GL383581.1   LN:116690
@SQ SN:GL383582.2   LN:162811
@SQ SN:GL383583.1   LN:96924
@SQ SN:KB663609.1   LN:74013
@SQ SN:GL383521.1   LN:143390
@SQ SN:GL383522.1   LN:123821
@SQ SN:GL582966.2   LN:96131
@SQ SN:JH636055.1   LN:173151
@SQ SN:GL383526.1   LN:180671
@SQ SN:GL000257.1   LN:590426
@SQ SN:GL383527.1   LN:164536
@SQ SN:GL383528.1   LN:376187
@SQ SN:GL383529.1   LN:121345
@SQ SN:GL339449.2   LN:1612928
@SQ SN:GL383530.1   LN:101241
@SQ SN:GL383531.1   LN:173459
@SQ SN:GL383532.1   LN:82728
@SQ SN:GL949742.1   LN:226852
@SQ SN:GL383533.1   LN:124736
@SQ SN:KB021644.1   LN:187824
@SQ SN:GL000250.1   LN:4622290
@SQ SN:GL000251.1   LN:4795371
@SQ SN:GL000252.1   LN:4610396
@SQ SN:GL000253.1   LN:4683263
@SQ SN:GL000254.1   LN:4833398
@SQ SN:GL000255.1   LN:4611984
@SQ SN:GL000256.1   LN:4928567
@SQ SN:GL383534.2   LN:119183
@SQ SN:GL383539.1   LN:162988
@SQ SN:GL383540.1   LN:71551
@SQ SN:GL383541.1   LN:171286
@SQ SN:GL383542.1   LN:60032
@SQ SN:GL000191.1   LN:106433
@SQ SN:GL000192.1   LN:547496
@SQ SN:GL000193.1   LN:189789
@SQ SN:GL000194.1   LN:191469
@SQ SN:GL000195.1   LN:182896
@SQ SN:GL000196.1   LN:38914
@SQ SN:GL000197.1   LN:37175
@SQ SN:GL000198.1   LN:90085
@SQ SN:GL000199.1   LN:169874
@SQ SN:GL000200.1   LN:187035
@SQ SN:GL000201.1   LN:36148
@SQ SN:GL000202.1   LN:40103
@SQ SN:GL000203.1   LN:37498
@SQ SN:GL000204.1   LN:81310
@SQ SN:GL000205.1   LN:174588
@SQ SN:GL000206.1   LN:41001
@SQ SN:GL000207.1   LN:4262
@SQ SN:GL000208.1   LN:92689
@SQ SN:GL000209.1   LN:159169
@SQ SN:GL000210.1   LN:27682
@SQ SN:GL000211.1   LN:166566
@SQ SN:GL000212.1   LN:186858
@SQ SN:GL000213.1   LN:164239
@SQ SN:GL000214.1   LN:137718
@SQ SN:GL000215.1   LN:172545
@SQ SN:GL000216.1   LN:172294
@SQ SN:GL000217.1   LN:172149
@SQ SN:GL000218.1   LN:161147
@SQ SN:GL000219.1   LN:179198
@SQ SN:GL000220.1   LN:161802
@SQ SN:GL000221.1   LN:155397
@SQ SN:GL000222.1   LN:186861
@SQ SN:GL000223.1   LN:180455
@SQ SN:GL000224.1   LN:179693
@SQ SN:GL000225.1   LN:211173
@SQ SN:GL000226.1   LN:15008
@SQ SN:GL000227.1   LN:128374
@SQ SN:GL000228.1   LN:129120
@SQ SN:GL000229.1   LN:19913
@SQ SN:GL000230.1   LN:43691
@SQ SN:GL000231.1   LN:27386
@SQ SN:GL000232.1   LN:40652
@SQ SN:GL000233.1   LN:45941
@SQ SN:GL000234.1   LN:40531
@SQ SN:GL000235.1   LN:34474
@SQ SN:GL000236.1   LN:41934
@SQ SN:GL000237.1   LN:45867
@SQ SN:GL000238.1   LN:39939
@SQ SN:GL000239.1   LN:33824
@SQ SN:GL000240.1   LN:41933
@SQ SN:GL000241.1   LN:42152
@SQ SN:GL000242.1   LN:43523
@SQ SN:GL000243.1   LN:43341
@SQ SN:GL000244.1   LN:39929
@SQ SN:GL000245.1   LN:36651
@SQ SN:GL000246.1   LN:38154
@SQ SN:GL000247.1   LN:36422
@SQ SN:GL000248.1   LN:39786
@SQ SN:GL000249.1   LN:38502
@PG ID:STAR PN:STAR VN:STAR_2.4.2a  CL:/lustre/beagle/djf604/software/STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR   --runThreadN 24   --genomeDir /lustre/beagle/djf604/reference/human/gencode19/STAR-ref-gencode19   --readFilesIn /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read1.trimmed.fastq.gz   /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read2.trimmed.fastq.gz      --readFilesCommand zcat      --outFileNamePrefix /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG.   --outSAMtype BAM   SortedByCoordinate      --outSAMattributes NH   HI   AS   NM   MD      --outSAMunmapped Within   --outFilterType BySJout   --outFilterMultimapNmax 20   --outFilterMismatchNmax 999   --outFilterMismatchNoverReadLmax 0.04   --alignIntronMin 20   --alignIntronMax 1000000   --alignMatesGapMax 1000000   --alignSJoverhangMin 8   --alignSJDBoverhangMin 1   --sjdbScore 1   --quantMode TranscriptomeSAM   
@CO user command line: /lustre/beagle/djf604/software/STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --outFileNamePrefix /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG. --genomeDir /lustre/beagle/djf604/reference/human/gencode19/STAR-ref-gencode19 --readFilesIn /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read1.trimmed.fastq.gz /lustre/beagle/djf604/projects/PEC/analysis/Freeze2/EpiGABA/H-395-OLIG/H-395-OLIG_read2.trimmed.fastq.gz --readFilesCommand zcat --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --outSAMunmapped Within --outSAMattributes NH HI AS NM MD --outFilterMismatchNoverReadLmax 0.04 --sjdbScore 1 --runThreadN 24 --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSA<

The GTF header looks like this:

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09

olivertam commented 4 years ago

Hi, thank you for that. Where did you obtain the gene GTF file? Could you check that the chromosome names in your gene GTF file follow the "chr[N]" nomenclature? Cheers.
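One way to run that check is to compare the reference names in the BAM header with the first column of the GTF. The Python sketch below works on plain text lines, so the header could come from `samtools view -H` and the GTF lines from the file itself. The helper names and the two GTF feature lines are hypothetical, made up for illustration; the two @SQ lines are taken from the header posted above.

```python
def bam_contigs(header_lines):
    """Return reference names from the @SQ lines of a SAM/BAM text header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            # SAM header fields are tab-separated; split() also copes with
            # headers whose tabs became spaces when copied and pasted.
            for field in line.split()[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_contigs(gtf_lines):
    """Return the seqnames (column 1) used in a GTF, skipping '#' lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_contigs(header_lines, gtf_lines):
    """Return (names shared by BAM and GTF, GTF names absent from the BAM)."""
    bam = bam_contigs(header_lines)
    gtf = gtf_contigs(gtf_lines)
    return bam & gtf, gtf - bam


# Two @SQ lines from the BAM header above, plus hypothetical Ensembl-style GTF lines.
header = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:249250621",
    "@SQ\tSN:GL000191.1\tLN:106433",
]
gtf = [
    "#!genome-build GRCh37.p13",
    '1\texample\tgene\t100\t200\t.\t+\t.\tgene_id "g1";',
    'GL000191.1\texample\tgene\t10\t50\t.\t+\t.\tgene_id "g2";',
]

shared, missing = compare_contigs(header, gtf)
print("shared:", sorted(shared))              # shared: ['GL000191.1']
print("in GTF but not BAM:", sorted(missing))  # in GTF but not BAM: ['1']
```

An empty or nearly empty shared set means most annotation features can never be matched to the alignments.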

tonilogbo commented 4 years ago

Ah, is that the problem? It's from Ensembl's FTP site. If you don't mind, what's an easy way to add the "chr" prefix? Edit: to clarify, the chromosome names are only the number/letter(s).

olivertam commented 4 years ago

Unfortunately, that does cause problems, as TEcount matches the chromosome names in your alignment file against the chromosome names in the GTF. Might I also recommend checking the TE GTF file? I was under the impression that the GRCh37 TE GTF file should have the same nomenclature as your gene GTF file, but I might be wrong. If you wish to send me the gene GTF file (tam at cshl dot edu), I could try to convert it for you. Thanks.

tonilogbo commented 4 years ago

The TE GTF file does in fact have the same nomenclature. I'll send them both through now.

olivertam commented 4 years ago

Hi, you might want to send just the gene GTF, as the TE one is huge (and we already have it). Looking more closely at your BAM header, it appears that you are using GENCODE's chromosome nomenclature, which combines UCSC names (chr[N]) for the chromosomes with Ensembl names for the scaffolds. That is why TEs were annotated (they matched the scaffolds) while the genes were not. If you don't really care about alignments on the assembled scaffolds, you can use the hg19_rmsk_TE.gtf file. I will process the gene GTF as soon as possible.
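For the earlier question about adding the "chr" prefix, here is a minimal Python sketch. It assumes an Ensembl GRCh37 GTF whose primary chromosomes are named 1 to 22, X, Y and MT, and a target matching the GENCODE-style BAM header above (chr1 to chr22, chrX, chrY, chrM, with scaffold accessions such as GL000191.1 left unchanged). The function names and example lines are hypothetical. It only rewrites column 1 and does not check that gene models agree between annotation releases.

```python
import re

# Assumed Ensembl GRCh37 names for the primary chromosomes: 1-22, X and Y.
PRIMARY = re.compile(r"(?:[1-9]|1[0-9]|2[0-2]|X|Y)")


def to_gencode_name(seqname):
    """Map an Ensembl-style seqname to a GENCODE-style one (assumed rules)."""
    if PRIMARY.fullmatch(seqname):
        return "chr" + seqname
    if seqname == "MT":
        return "chrM"
    # Scaffold accessions such as GL000191.1 are kept as-is, since the
    # GENCODE-based BAM header above uses them unchanged.
    return seqname


def rename_gtf_lines(lines):
    """Yield GTF lines with column 1 renamed; '#' header lines pass through."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        seqname, sep, rest = line.partition("\t")
        yield to_gencode_name(seqname) + sep + rest


def rename_gtf_file(in_path, out_path):
    """Write a renamed copy of the GTF at in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rename_gtf_lines(src))


# Hypothetical Ensembl-style example lines.
example = [
    "#!genome-build GRCh37.p13\n",
    '1\texample\tgene\t100\t200\t.\t+\t.\tgene_id "g1";\n',
    'MT\texample\tgene\t1\t50\t.\t+\t.\tgene_id "g2";\n',
    'GL000191.1\texample\tgene\t10\t40\t.\t+\t.\tgene_id "g3";\n',
]
renamed = list(rename_gtf_lines(example))
for line in renamed:
    print(line, end="")
```

For a whole file, a call such as `rename_gtf_file("Homo_sapiens.GRCh37.87.gtf", "renamed.gtf")` would write a converted copy (the output name is hypothetical). Using a GENCODE GTF that already carries matching names avoids the conversion altogether.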

tonilogbo commented 4 years ago

It appears I have another GTF file, from GENCODE, which does in fact use the "chr[N]" notation, so I'll run TEcount again and let you know how it goes.

olivertam commented 4 years ago

Hi, I have now built a TE GTF file that should work properly with GENCODE chromosome nomenclature. It is located here. Please let me know if you have any issues. Thanks.

tonilogbo commented 4 years ago

Using the GENCODE file has worked, thanks for your help!