zhangrengang / TEsorter

TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes
https://doi.org/10.1093/hr/uhac017
GNU General Public License v3.0

Error when thread # is more than input sequence # #18

Closed oushujun closed 3 years ago

oushujun commented 3 years ago

Hi Rengang,

I get errors when I specify more threads than there are input sequences. Could you take a look?

2020-10-19 04:38:27,617 -WARNING- exit code 1 for CMD 'hmmscan --notextw -E 0.01 --domE 0.01 --noali --domtblout ./tmp/chunk_aaseq.14.fasta.domtbl /opt/conda/lib/python3.7/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm ./tmp/chunk_aaseq.14.fasta'
2020-10-19 04:38:27,617 -WARNING- STDOUT: b'' STDERR: b'\nError: Sequence file ./tmp/chunk_aaseq.14.fasta is empty or misformatted\n\n'

2020-10-19 04:38:27,617 -WARNING- exit code 1 for CMD 'hmmscan --notextw -E 0.01 --domE 0.01 --noali --domtblout ./tmp/chunk_aaseq.15.fasta.domtbl /opt/conda/lib/python3.7/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm ./tmp/chunk_aaseq.15.fasta'
2020-10-19 04:38:27,617 -WARNING- STDOUT: b'' STDERR: b'\nError: Sequence file ./tmp/chunk_aaseq.15.fasta is empty or misformatted\n\n'

2020-10-19 04:38:27,618 -WARNING- exit code 1 for CMD 'hmmscan --notextw -E 0.01 --domE 0.01 --noali --domtblout ./tmp/chunk_aaseq.16.fasta.domtbl /opt/conda/lib/python3.7/site-packages/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm ./tmp/chunk_aaseq.16.fasta'
2020-10-19 04:38:27,618 -WARNING- STDOUT: b'' STDERR: b'\nError: Sequence file ./tmp/chunk_aaseq.16.fasta is empty or misformatted\n\n'

Best, Shujun

zhangrengang commented 3 years ago

Yes, when there are more threads than input sequences, some of the chunk FASTA files will be empty. I will fix it.

zhangrengang commented 3 years ago

I now filter out empty files by adding the following line to the hmmscan_pp function in TEsorter.py:

chunk_files = [chunk_file for chunk_file in chunk_files if os.path.getsize(chunk_file)>0]
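For readers who want to see the effect in isolation, here is a minimal, self-contained sketch of the same size-based filter. It is not the actual TEsorter code: `split_round_robin`, `drop_empty`, and `demo` are hypothetical helpers written only for illustration, and the round-robin splitting is an assumption about how chunks might end up empty when there are more chunks than sequences.

```python
import os
import tempfile


def split_round_robin(records, n_chunks, out_dir):
    """Hypothetical splitter: distribute (name, seq) records over n_chunks FASTA files."""
    paths = [os.path.join(out_dir, f"chunk_aaseq.{i}.fasta") for i in range(n_chunks)]
    handles = [open(path, "w") for path in paths]
    try:
        for i, (name, seq) in enumerate(records):
            handles[i % n_chunks].write(f">{name}\n{seq}\n")
    finally:
        for handle in handles:
            handle.close()
    return paths


def drop_empty(chunk_files):
    """Keep only chunk files that contain data (same test as the one-line fix)."""
    return [chunk_file for chunk_file in chunk_files if os.path.getsize(chunk_file) > 0]


def demo():
    """Split 2 sequences into 4 chunks and report (files written, files kept)."""
    records = [("seq1", "MKV"), ("seq2", "MLA")]
    with tempfile.TemporaryDirectory() as tmp_dir:
        chunk_files = split_round_robin(records, 4, tmp_dir)
        kept = drop_empty(chunk_files)
        return len(chunk_files), len(kept)
```

With two sequences and four chunks, two of the files are zero bytes, so only the two non-empty files would be passed on to hmmscan and the "empty or misformatted" error is avoided.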