sandialabs / TIGER

Target / Integrative Genetic Element Retriever: precisely maps IGEs (a defined type of genomic island) in bacterial and archaeal genomes; package also includes orthogonal program Islander

error at typing step #5

Closed by alexweisberg 3 years ago

alexweisberg commented 3 years ago

Hi, I am excited to try your software pipeline as Islander has been very useful for my research, and this looks like a great improvement. I am trying the pipeline with the E. coli test dataset, and am getting an error on the final typing command:

Illegal division by zero at ../bin/typing.pl line 92.

The resolved command also has this message in the log: Use of uninitialized value $id in concatenation (.) or string at ../resolve.pl line 84, <IN> line 66.

and in the tiger.log file there were several lines similar to:

Island test for Y-Int.1

Running: perl /locationoffolder/Software/TIGER/testdata2/../bin/tigercore.pl genome.fa NC_000913.2 295582 ../blastdb/ecolidb.fna 2000 200000 250 15000 3000 island

blastdbcmd -db genome -dbtype nucl -entry NC_000913.2 -outfmt %l

Error: [blastdbcmd] Skipped BA000007.3

Warning: [blastn] Query is Empty!

Warning: [blastn] Query is Empty!

The main difference between the recommended pipeline and software versions and what I ran is that the NCBI refseq_genomic database is no longer available as a version 5 BLAST database, so I used a custom local database built from several NCBI E. coli genomes. I've attached a compressed file containing the entire testdata folder (testdata.tar.gz). The script runtiger.sh contains the commands I used to run TIGER.

As a possibly related question, is there a specific format needed for the headers in the BLAST database? I have a large dataset of genomes that I have sequenced that are not yet in NCBI, and they would be a better reference set than what's currently available there. Do I need to format the headers in a certain way to convey the taxonomic information?

Finally, a feature request: I typically annotate my assemblies with prokka already. Would it be possible to use those annotations as input to TIGER, both to save time and to get consistent locus_tags/regions where the islands are integrated? Thank you!

alexweisberg commented 3 years ago

I was able to resolve the BLAST errors by remaking my BLAST database with the "-parse_seqids" flag. The pipeline now runs without those errors.
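For anyone hitting the same "Skipped" errors, here is a rough sketch of how the database rebuild could be scripted. It is not part of TIGER. The file name genomes.fna and the output prefix ecolidb are placeholders, and the 50-character limit on local sequence IDs is my understanding of what makeblastdb enforces with -parse_seqids, so treat it as an assumption. The script checks the FASTA headers first, since duplicate or overly long IDs are a common reason the rebuild fails.

```python
import subprocess

# Assumption: makeblastdb rejects local sequence IDs longer than this
# when -parse_seqids is used.
MAX_LOCAL_ID_LEN = 50


def fasta_ids(path):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.append(fields[0] if fields else "")
    return ids


def check_ids(ids):
    """Return a list of problems found in the IDs; an empty list means none."""
    problems = []
    seen = set()
    for seq_id in ids:
        if not seq_id:
            problems.append("empty sequence ID")
        elif seq_id in seen:
            problems.append(f"duplicate ID: {seq_id}")
        elif len(seq_id) > MAX_LOCAL_ID_LEN:
            problems.append(
                f"ID longer than {MAX_LOCAL_ID_LEN} characters: {seq_id}"
            )
        seen.add(seq_id)
    return problems


def makeblastdb_command(fasta, out_prefix):
    """Build the makeblastdb call; -parse_seqids indexes the header IDs."""
    return [
        "makeblastdb",
        "-in", str(fasta),
        "-dbtype", "nucl",
        "-parse_seqids",
        "-out", str(out_prefix),
    ]


def build_db(fasta, out_prefix):
    """Validate the headers, then run makeblastdb (must be on PATH)."""
    problems = check_ids(fasta_ids(fasta))
    if problems:
        raise ValueError("; ".join(problems))
    subprocess.run(makeblastdb_command(fasta, out_prefix), check=True)
```

Calling build_db("genomes.fna", "ecolidb") would then rebuild the database. Afterwards, a lookup such as blastdbcmd -db ecolidb -entry NC_000913.2 -outfmt %l should return the sequence length instead of a "Skipped" error, if the IDs were indexed as expected.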

I still see the occasional "Warning: [blastn] Query is Empty!" for some of the searches but many do run correctly.

The resolved.log file also has "Use of uninitialized value $id in concatenation (.) or string at ../resolve.pl line 84, <IN> line 66." at the top but seems to run correctly otherwise.