oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

LTR_retriever aborts, Illegal character error. #1

Closed: mcsimenc closed this issue 7 years ago

mcsimenc commented 7 years ago

Hi! I'm excited to use LTR_retriever. I ran LTR_retriever using the following call:

LTR_retriever \
    -genome Salvinia_cucullata_v1.1.fa \
    -inharvest LTRHarvest.out \
    -linelib LINEs.viridiplantae.fa \
    -dnalib DNA_TEs.viridiplantae.fa \
    -TEhmm Dfam.hmm \
    -threads 40 \
    1>LTR_retriever.out \
    2>LTR_retriever.err

It ran for a little while, then aborted and reported the errors below. There is no character E in the input FASTA for -genome, and the file *.scn.extend.fa.aa seems to have been removed by LTR_retriever. What could the problem be?

Parse failed (sequence file Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa):
Line 2: illegal character E

Attempt to free unreferenced scalar: SV 0x6e2310, Perl interpreter: 0x6de7e0.
Use of uninitialized value $list[0] in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 76.
Use of uninitialized value in split at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value $chr_pre in hash element at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
Use of uninitialized value within %genome in length at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
Use of uninitialized value $list[0] in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 76.
Use of uninitialized value in split at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value $chr_pre in hash element at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
Use of uninitialized value within %genome in length at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.
LOC list is empty.
Warning: [blastx] Query is Empty!
LOC list is empty.
No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.

#usage: $ perl output_by_list.pl DB_index_pos database LS_index_pos LIST [Exclusive]* [MSU_format] [FASTA_format] [version]> outfile
        * [] parameters are optional. 
                [Exclusive] -ex means exclude the entries in list, default is output the entries in list. 
                [MSU_format] -MSU0 means MSU_LOC occurs in the list file, while -MSU1 means MSU_LOC occurs in the database file
                eg. perl output_by_list.pl 1 Chr1.ltrTE.RMlist 1 Chr1.ltrTE.true.list -MSU0 -FA > Chr1.ltrTE.true.RMlist
rm: cannot remove `Salvinia_cucullata_v1.1.fa.cat.gz': No such file or directory
rm: cannot remove `Salvinia_cucullata_v1.1.fa.LTRlib.fa.n*': No such file or directory
rm: cannot remove `Salvinia_cucullata_v1.1.fa.nmtf': No such file or directory
perl annotate_gff.pl lib.fa gff > anno.gff

mcsimenc commented 7 years ago

Some additional info:

Some nonempty files were generated.

oushujun commented 7 years ago

Hi, thanks for using LTR_retriever and providing the messages above. The parse failure is reported by hmmsearch. You can use -v to retain the intermediate files. Please send me Salvinia_cucullata_v1.1.fa.retriever.scn.extend.fa.aa if possible. If you see a lot of empty files, something is wrong; for example, not many LTRs were found in the genome.

Best, Shujun
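
A quick way to act on the advice above is to list zero-byte files among the intermediates kept by -v. The sketch below is not part of LTR_retriever; `list_empty_outputs` is a hypothetical helper, and it assumes the intermediate files sit in one directory and share the genome file name as a prefix, as the file names in this thread suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LTR_retriever): print every zero-byte
# regular file in directory $1 whose name starts with prefix $2.
list_empty_outputs() {
    local dir="$1" prefix="$2"
    # -size 0 matches empty files; -maxdepth 1 keeps the search to one directory.
    find "$dir" -maxdepth 1 -type f -name "${prefix}*" -size 0 | sort
}
```

For example, `list_empty_outputs . Salvinia_cucullata_v1.1.fa` would print any empty intermediates in the current directory.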

mcsimenc commented 7 years ago

Hey Shujun, I think I found the problem. For -TEhmm I was using the Dfam profile HMM database of repeats, which has a lot of nice profiles, but they are nucleotide profile HMMs. I am re-running now with a few Pfam profiles.
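
One way to catch this kind of mismatch before a run is to look at the alphabet each profile declares. HMMER3 text-format profile files carry an `ALPH` header line per model (`amino`, `DNA`, or `RNA`). The sketch below assumes the file passed to -TEhmm is in that text format; `hmm_alphabets` and `is_protein_hmm` are hypothetical helper names, not part of HMMER or LTR_retriever.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct alphabets declared in a HMMER3
# text-format profile file, lowercased, one per line.
hmm_alphabets() {
    # Header lines look like "ALPH  amino" or "ALPH  DNA".
    awk '$1 == "ALPH" { print tolower($2) }' "$1" | sort -u
}

# Succeeds only if every profile in the file declares the amino alphabet.
is_protein_hmm() {
    [ "$(hmm_alphabets "$1")" = "amino" ]
}
```

A call such as `is_protein_hmm Dfam.hmm || echo "not a protein HMM file"` would then flag a nucleotide database before it reaches hmmsearch.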

mcsimenc commented 7 years ago

OK, I think it still aborted without completing. This time I used the call:

LTR_retriever \
    -genome Salvinia_cucullata_v1.1.fa \
    -inharvest input/LTRHarvest.out \
    -linelib input/LINEs.viridiplantae.fa \
    -dnalib input/DNA_TEs.viridiplantae.fa \
    -TEhmm /home/derstudent/data/other/pfam-hmms/retrotransposon/combined.hmm \
    -threads 40 \
    -v \
    1>LTR_retriever.out \
    2>LTR_retriever.err

and got the error:

Attempt to free unreferenced scalar: SV 0x1a6a310, Perl interpreter: 0x1a667e0.
Use of uninitialized value $list[0] in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 76.
Use of uninitialized value in split at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value $chr_pre in hash element at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
Use of uninitialized value within %genome in length at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
Use of uninitialized value $list[0] in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 76.
Use of uninitialized value in split at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value in pattern match (m//) at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 79.
Use of uninitialized value $chr_pre in hash element at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
Use of uninitialized value within %genome in length at /home/joshd/software/LTR_retriever/bin/call_seq_by_list.pl line 81.
No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.
LOC list is empty.
Warning: [blastx] Query is Empty!
LOC list is empty.
No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.
perl annotate_gff.pl lib.fa gff > anno.gff

oushujun commented 7 years ago

Hi,

Replacing the TE.hmm file is not recommended because its entries are hand-categorized. If you want to use Dfam, you need to know which Dfam entries belong to LTR elements and which do not, and change the annotate_TE.pl script accordingly. I am updating the program, and hopefully it will provide more status messages so users can follow the process.

Shujun

mcsimenc commented 7 years ago

OK, thank you! The source of the illegal character error seems to be Dfam.hmm. I don't think I can use any Dfam profile HMMs because they don't have emission probabilities for amino acids, unless there is a different version of the Dfam database that has protein profile HMMs.

Edit: I didn't realize LTR_retriever came with a profile HMM database (LTR_retriever/database/TEfam.hmm).

oushujun commented 7 years ago

Hi, yes, the package comes with a carefully selected pHMM file. You can provide a different one, but that requires further modification of the script, so I suggest you use TEfam.hmm unless you have a good reason to use another. I have pushed a new release to GitHub; please use this version of LTR_retriever. It comes with more user-friendly status messages that help you keep track of the program. Just save your paths file and replace everything else. Let me know if you still have difficulty running it.

Shujun

mcsimenc commented 7 years ago

Hi Shujun,

The new release does not seem to recognize the location of the external dependencies. I think the cause is this line, 193, but I am not fluent in Perl and don't know what the unless -X "string" test does. I added some code to confirm that makeblastdb is in the path held by $blastplus, and it is; makeblastdb can also be executed from within the code. Do you know what the problem is? (And could you explain unless -X if you have a minute? :-)

die "makeblastdb is not exist in the BLAST+ path $blastplus!\n" unless -X "${blastplus}makeblastdb";
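
On the unless -X question: in Perl, -X is a file test that is true when the file is executable by the real user/group ID, and `STATEMENT unless CONDITION` runs the statement only when the condition is false. The line therefore dies when `${blastplus}makeblastdb`, the plain concatenation of the configured directory and the program name, is not an executable file. A rough shell analogue is sketched below; `check_tool` is a hypothetical function, and `[ -x ]` checks the effective rather than the real ID, so it is an approximation of the Perl test, not a copy of it.

```shell
#!/usr/bin/env bash
# Hypothetical shell analogue of:
#   die "..." unless -X "${blastplus}makeblastdb";
check_tool() {
    local dir="$1" tool="$2"
    # Plain concatenation, as in the Perl line: no separator is inserted.
    local candidate="${dir}${tool}"
    if [ -x "$candidate" ]; then
        printf 'found: %s\n' "$candidate"
    else
        printf '%s was not found in the path %s\n' "$tool" "$dir" >&2
        return 1
    fi
}
```

With a directory written without a trailing slash, for example `/some/dir/bin`, the candidate becomes `/some/dir/binmakeblastdb`, so a check of this shape can fail even when the tool is installed.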

oushujun commented 7 years ago

Hi,

This usually occurs when the path does not end with a slash. You can actually hardcode the path at the beginning of the program. If possible, please send me your paths file and I can test it for you.

Shujun

mcsimenc commented 7 years ago

Hmm, I tried all three of these:

BLAST+=/share/apps/genomics/ncbi-blast-2.2.31+/bin/
BLAST+=/share/apps/genomics/ncbi-blast-2.2.31+/bin
BLAST+= # with makeblastdb in PATH

and each time it gives this error, with the trailing forward slash:

makeblastdb is not exist in the BLAST+ path /share/apps/genomics/ncbi-blast-2.2.31+/bin/!

oushujun commented 7 years ago

Hi,

I fixed the bug and made the path-reading module more robust. It should work with all three forms now.

Shujun
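
For readers wondering what handling all three forms can look like, here is a minimal shell sketch. It is not the actual fix made in LTR_retriever (which is written in Perl and is not shown in this thread); `resolve_tool` is a hypothetical function that accepts a directory with a trailing slash, a directory without one, or an empty value meaning "look in PATH".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the full path of tool $2 given directory
# setting $1, which may end in "/", lack the "/", or be empty.
resolve_tool() {
    local dir="$1" tool="$2" candidate
    if [ -z "$dir" ]; then
        # Empty setting: fall back to a PATH lookup.
        candidate=$(command -v "$tool") || return 1
    else
        # Strip trailing slashes, then join with exactly one separator.
        while [ "$dir" != "/" ] && [ "${dir%/}" != "$dir" ]; do
            dir="${dir%/}"
        done
        candidate="${dir%/}/${tool}"
    fi
    [ -x "$candidate" ] || return 1
    printf '%s\n' "$candidate"
}
```

Normalizing the directory once, before any tool name is appended, keeps later checks independent of how the user wrote the path.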

mcsimenc commented 7 years ago

OK, the program runs for a little while and now gives this error:

ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.
ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/cleanup.pl line 50.

Fatal Error:
Failed to open the database file
Program halted !!

Can't open Salvinia_cucullata_v1.1.fa.LTRlib.clust: No such file or directory.
ERROR: This script is written to convert fasta files into a prettier format. 
Usage: fasta-reformat.pl input-fasta-file number-of-positions-per-line
ERROR: No such file or directory at /home/joshd/software/LTR_retriever/bin/annotate_gff.pl line 12.

Salvinia_cucullata_v1.1.fa.LTRlib.clust doesn't exist, even though I ran LTR_retriever with -v.

oushujun commented 7 years ago

Hi, I moved your last comment to a new thread because this is a different issue.

Shujun