mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Error in building TE index #112

Closed: zhangqc723 closed this issue 2 years ago

zhangqc723 commented 2 years ago

Hi, I made a GTF file for downstream analysis (excerpt below), but when I run TEtranscripts I get the error "Error in building TE index". How can I solve it?

chr1 mm10_wgbs exon 3000001 3000097 . - . gene_id "L1MdFanc_I"; transcript_id "L1MdFanc_I_dup1"; family_id "L1"; class_id "LINE";
chr1 mm10_wgbs exon 3000098 3000123 . + . gene_id "(T)n"; transcript_id "(T)n_dup1"; family_id "Simple_repeat"; class_id "Simple_repeat";
chr1 mm10_wgbs exon 3000124 3002128 . - . gene_id "L1MdFanc_I"; transcript_id "L1MdFanc_I_dup2"; family_id "L1"; class_id "LINE";
chr1 mm10_wgbs exon 3003148 3004054 . - . gene_id "L1MdFanc_I"; transcript_id "L1MdFanc_I_dup3"; family_id "L1"; class_id "LINE";
chr1 mm10_wgbs exon 3004041 3004206 . + . gene_id "L1_Rod"; transcript_id "L1_Rod_dup1"; family_id "L1"; class_id "LINE";
chr1 mm10_wgbs exon 3004207 3004270 . + . gene_id "(ACAA)n"; transcript_id "(ACAA)n_dup1"; family_id "Simple_repeat"; class_id "Simple_repeat";
chr1 mm10_wgbs exon 3004271 3005001 . + . gene_id "L1_Rod"; transcript_id "L1_Rod_dup2"; family_id "L1"; class_id "LINE";
chr1 mm10_wgbs exon 3005002 3005441 . + . gene_id "L1_Rod"; transcript_id "L1_Rod_dup3"; family_id "L1"; class_id "LINE";
chr1 mm10_wgbs exon 3005461 3006764 . + . gene_id "Lx9"; transcript_id "Lx9_dup1"; family_id "L1"; class_id "LINE";
chr1 mm10_wgbs exon 3006791 3006841 . + . gene_id "A-rich"; transcript_id "A-rich_dup1"; family_id "Low_complexity"; class_id "Low_complexity";

zhangqc723 commented 2 years ago

I also regenerated the GTF file with your "makeTEgtf.pl" script, but I get the same error.

olivertam commented 2 years ago

Hi,

Could you provide either the GTF or the input file you used? I am unable to immediately identify the error from your excerpt, but one thing to check is that the values in columns 4 and 5 (the start and end) are plain numbers and not in scientific notation (e.g. 3e+06 instead of 3000000). That is known to cause this error.

Thanks.
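For reference, here is a minimal sketch (not part of TEtranscripts) of one way to repair an existing GTF whose start/end columns were written in scientific notation. It assumes a standard nine-column, tab-separated GTF; the script name fix_gtf_coords.py and its function names are hypothetical. It uses Python's decimal module rather than float so the conversion cannot introduce rounding errors.

```python
import sys
from decimal import Decimal, InvalidOperation


def fix_coordinate(value):
    """Convert a GTF start/end field such as '3e+06' to a plain integer string."""
    if value.isdigit():
        return value  # already a plain integer, leave unchanged
    try:
        number = Decimal(value)
    except InvalidOperation:
        raise ValueError(f"non-numeric coordinate: {value!r}")
    if not number.is_finite() or number != number.to_integral_value():
        raise ValueError(f"non-integer coordinate: {value!r}")
    return str(int(number))


def fix_gtf_line(line):
    """Rewrite columns 4 and 5 (start, end) of one tab-separated GTF line."""
    if line.startswith("#") or not line.strip():
        return line  # leave comments and blank lines untouched
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    fields[3] = fix_coordinate(fields[3])
    fields[4] = fix_coordinate(fields[4])
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Usage (hypothetical script name): python fix_gtf_coords.py input.gtf > fixed.gtf
    for path in sys.argv[1:]:
        with open(path) as gtf:
            for line in gtf:
                sys.stdout.write(fix_gtf_line(line))
```

Regenerating the GTF so that coordinates are written as plain integers in the first place is the cleaner fix; this sketch only patches a file that has already been written.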

zhangqc723 commented 2 years ago

Thank you for your prompt reply. As you said, the values in columns 4 and 5 (the start and end) of my input GTF were in scientific notation. The error is now solved with your help. Thank you again. @olivertam

zhangqc723 commented 2 years ago

But now I get a new error: [E::idx_find_and_load] Could not retrieve index file for './01_star/Ni-Cko_M_B3-1/Ni-Cko_M_B3-1_Aligned.nsort.bam'. I guess this error is caused by the missing .bai file, but samtools index cannot create an index for a BAM file sorted by name.

olivertam commented 2 years ago

Hi,

This is a warning message from samtools/pysam that appears to have no impact on TEtranscripts. See #82.

Thanks.
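As an aside, samtools index only works on coordinate-sorted BAM files, so a name-sorted BAM such as the one above will not have a .bai index. If it is unclear how a BAM was sorted, the @HD header line usually records it in the SO tag (coordinate, queryname, or unsorted); `samtools view -H file.bam` shows that line. Below is a minimal stdlib-only sketch that reads the same tag, assuming the file follows the SAM/BAM specification (BGZF-compressed, which Python's gzip module can read) and that the aligner wrote an SO tag. The function name bam_sort_order is hypothetical and not part of TEtranscripts or samtools.

```python
import gzip
import struct
import sys


def bam_sort_order(path):
    """Return the SO value from a BAM file's @HD header line, or None if absent.

    Per the SAM/BAM spec, the decompressed stream starts with the magic bytes
    b"BAM\\x01", then a little-endian int32 header length, then the plain-text
    SAM header.
    """
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        (l_text,) = struct.unpack("<i", fh.read(4))
        # The header text may be NUL-padded, so strip trailing NULs
        header = fh.read(l_text).decode("ascii", errors="replace").rstrip("\x00")
    for line in header.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


if __name__ == "__main__":
    for bam in sys.argv[1:]:
        order = bam_sort_order(bam)
        # samtools index expects "coordinate"; name-sorted files report "queryname"
        print(f"{bam}\t{order or 'not declared'}")
```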

zhangqc723 commented 2 years ago

Thanks, I now have my results. Thank you again.