mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

AttributeError: 'Namespace' object has no attribute 'format' #135

Closed Dexter5577 closed 1 year ago

Dexter5577 commented 1 year ago

Hi: Thank you for the wonderful tool. After setting up a Python 2.7.11 environment on the server and downloading the software, I ran the command below and encountered the following error. I would be extremely grateful if you could help.

/share/pub1/python/Python-2.7.11/bin/python /share/pub1/TEtranscripts/TEtranscripts-master/bin/TEtranscripts \
 --stranded no \
 --GTF /share/pub1/HERV/package-entities-erv.gff3 \
 --mode multi \
 --outdir /share/pub1/TE \
 --TE /share/pub1/HERV/package-entities-erv.fa \
 --treatment /share/pub1/STAR/human2/R17004740LR01Aligned.out.bam \
 --control /share/pub1/STAR/human2/R17001923LR01Aligned.out.bam

Traceback (most recent call last):
  File "/share/pub1/lijq/lijq/TEtranscripts/TEtranscripts-master/bin/TEtranscripts", line 866, in <module>
    main()
  File "/share/pub1/lijq/lijq/TEtranscripts/TEtranscripts-master/bin/TEtranscripts", line 748, in main
    args = read_opts2(prepare_parser())
  File "/share/pub1/lijq/lijq/python/Python-2.7.11/lib/python2.7/site-packages/TEToolkit/IO/ReadInputs.py", line 262, in read_opts2
    if args.format == "BAM" :
AttributeError: 'Namespace' object has no attribute 'format'

olivertam commented 1 year ago

Hi,

Thank you for your interest in the software. Based on your command line, it appears that you're trying to run TEtranscripts directly from source. We would recommend installing the software:

$ /share/pub1/python/Python-2.7.11/bin/python setup.py install --prefix [a location on your PYTHONPATH]
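One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the `bin/TEtranscripts` script run from the source tree and the `TEToolkit` package found in `site-packages` may come from different versions, so the library reads an option that the script's parser never defined. The sketch below is written in Python 3 and uses a hypothetical `build_parser` and `read_format`, not the TEtranscripts code. It only reproduces the general `argparse` behavior behind this kind of `AttributeError`.

```python
import argparse


def build_parser(with_format):
    # Hypothetical stand-in for a script's parser; not the real TEtranscripts parser.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--mode", default="multi")
    if with_format:
        # A parser that defines --format populates args.format.
        parser.add_argument("--format", default="BAM")
    return parser


def read_format(args):
    # Same access pattern as the failing line in the traceback: args.format
    return args.format


args_without = build_parser(with_format=False).parse_args(["--mode", "multi"])
args_with = build_parser(with_format=True).parse_args(["--mode", "multi"])

try:
    read_format(args_without)
    error_message = None
except AttributeError as exc:
    # argparse.Namespace raises AttributeError for options the parser never defined.
    error_message = str(exc)

print(error_message)
print(read_format(args_with))
```

If this reading applies, installing the package and running the installed `TEtranscripts` script keeps the parser and the library in step.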

Also, you need to provide a GTF for both the gene annotation (--GTF) and TE annotation (--TE). You cannot use FASTA sequences, as the tool operates on genomic alignments.

$ TEtranscripts --stranded no \
 --GTF [Gene GTF] \
 --mode multi \
 --outdir /share/pub1/TE \
 --TE [TE GTF] \
 --treatment /share/pub1/STAR/human2/R17004740LR01Aligned.out.bam \
 --control /share/pub1/STAR/human2/R17001923LR01Aligned.out.bam

If you are using a custom annotation (with genomic coordinates corresponding to the genome that your BAM files were aligned to), we can try to help you get that into a format that works for TEtranscripts. Please let us know if that does not resolve your issue.
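As a quick sanity check before running the tool, a few lines of Python can tell whether a file looks like FASTA sequence or like a nine-column, tab-separated annotation. This is only a rough sketch: `classify_annotation_line` is a hypothetical helper and does not reproduce the parsing that TEtranscripts performs.

```python
def classify_annotation_line(line):
    """Return 'fasta', 'comment', 'gtf_like', or 'unknown' for one line of text."""
    stripped = line.rstrip("\n")
    if stripped.startswith(">"):
        # FASTA records start with a '>' header line.
        return "fasta"
    if not stripped.strip() or stripped.startswith("#"):
        return "comment"
    fields = stripped.split("\t")
    # GTF-style lines have 9 tab-separated columns; columns 4 and 5 are coordinates.
    if len(fields) == 9 and fields[3].isdigit() and fields[4].isdigit():
        return "gtf_like"
    return "unknown"


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = {}
    for line in lines:
        kind = classify_annotation_line(line)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


example = [
    ">some_sequence_header",
    "ACGTACGT",
    'chr1\tsource\texon\t21975\t22093\t.\t+\t.\tgene_id "a"; transcript_id "a";',
]
print(summarize(example))
```

A file reporting mostly `fasta` or `unknown` lines is unlikely to be usable as an annotation, and `unknown` on a file that should be a GTF often points to extra or missing tabs.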

Thanks.

Dexter5577 commented 1 year ago

Hi! I'm sorry to bother you again. When I ran the code after fixing the previous error, another error occurred.

1600000 GTF lines processed.
INFO @ Mon, 10 Apr 2023 10:17:14: Done building gene index ......
INFO @ Mon, 10 Apr 2023 10:17:31: Building TE index .......
chr1 /share/pub1/lijq/lijq/HERV/HERV_elements.genePred CDS 21975 22093 . + 0 gene_id "168165:ERV:MLT1K"; transcript_id "168165:ERV:MLT1K"; exon_number "1"; exon_id "168165:ERV:MLT1K.1"; family_id "ERV3"; class_id "MaLR";
TE GTF format error! There is no annotation at line 1.
Error in building TE index GTF:

chr1 /share/pub1/lijq/lijq/HERV/HERV_elements.genePred transcript 34597 34659 . - . gene_id "12:ERV:MLT1L"; transcript_id "12:ERV:MLT1L"; family_id "ERV3"; class_id "MaLR";
chr1 /share/pub1/lijq/lijq/HERV/HERV_elements.genePred transcript 21975 22093 . + . gene_id "168165:ERV:MLT1K"; transcript_id "168165:ERV:MLT1K"; family_id "ERV3"; class_id "MaLR";
chr1 /share/pub1/lijq/lijq/HERV/HERV_elements.genePred exon 21975 22093 . + . gene_id "168165:ERV:MLT1K"; transcript_id "168165:ERV:MLT1K"; exon_number "1"; exon_id "168165:ERV:MLT1K.1"; family_id "ERV3"; class_id "MaLR";
chr1 /share/pub1/lijq/lijq/HERV/HERV_elements.genePred CDS 21975 22093 . + 0 gene_id "168165:ERV:MLT1K"; transcript_id "168165:ERV:MLT1K"; exon_number "1"; exon_id "168165:ERV:MLT1K.1"; family_id "ERV3"; class_id "MaLR";
chr1 /share/pub1/lijq/lijq/HERV/HERV_elements.genePred start_codon 21975 21977 . + 0 gene_id "168165:ERV:MLT1K"; transcript_id "168165:ERV:MLT1K"; exon_number "1"; exon_id "168165:ERV:MLT1K.1"; family_id "ERV3"; class_id "MaLR";

This is my own GTF file: HERV_elements.zip

Could you please advise me on how to solve it? Thank you once again. Looking forward to your reply.

olivertam commented 1 year ago

Hi,

There were a couple of issues with your GTF file:

1) The attributes field (column 9 of the GTF format) was tab-separated instead of space-separated. In some cases, multiple tabs separated the values, which prevented correct parsing.
2) On some lines, the attributes had quotation marks around them. E.g.:

chr16   /share/pub1/lijq/lijq/HERV/HERV_elements.genePred       exon    41014   41505   .       +       .       "gene_id ""1319313:ERV:HERV15,LTR15"";" "transcript_id ""1319313:ERV:HERV15,LTR15"";"   exon_number "1";        "exon_id ""1319313:ERV:HERV15,LTR15.1"";"       family_id "NA"; class_id "NA";

This also caused problems when parsing. I also noted that the family and class IDs were not provided correctly, but I did not fix them as I wasn't sure if that was intentional.

Once these were fixed, the TE GTF could be parsed by TEtranscripts.

I have attached the minimally fixed GTF here, though you might want to look through it to find lines where some of the attribute information was absent.
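For anyone hitting the same two problems, the repair can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used to produce the attached file. It assumes the first eight columns are intact, that every tab-separated piece after them belongs to the attributes field, and that the extra quoting is CSV-style (outer quotes with doubled inner quotes). `unquote_csv_piece` and `repair_gtf_line` are hypothetical names.

```python
def unquote_csv_piece(piece):
    # Undo CSV-style quoting, e.g. "gene_id ""X"";"  ->  gene_id "X";
    if len(piece) >= 2 and piece.startswith('"') and piece.endswith('"') and '""' in piece:
        return piece[1:-1].replace('""', '"')
    return piece


def repair_gtf_line(line):
    """Rejoin a tab-split attributes field into one space-separated column 9."""
    stripped = line.rstrip("\n")
    if not stripped or stripped.startswith("#"):
        return stripped
    fields = stripped.split("\t")
    if len(fields) < 9:
        # Too few columns to repair safely; leave the line untouched.
        return stripped
    fixed_columns, pieces = fields[:8], fields[8:]
    # Drop empty pieces left behind by runs of multiple tabs.
    attributes = " ".join(unquote_csv_piece(p.strip()) for p in pieces if p.strip())
    return "\t".join(fixed_columns + [attributes])


broken = (
    'chr16\tsrc\texon\t41014\t41505\t.\t+\t.\t'
    '"gene_id ""1319313:ERV:HERV15,LTR15"";"\t'
    '"transcript_id ""1319313:ERV:HERV15,LTR15"";"\t\t'
    'exon_number "1";\t\tfamily_id "NA"; class_id "NA";'
)
print(repair_gtf_line(broken))
```

The sketch does not fill in missing attributes or correct the `family_id` and `class_id` values, so the output still needs a manual review.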

Please let us know if you encounter further issues.

Thanks.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days