oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

awk: cannot open N02.fa.retriever.scn.extend.fa.rexdb.cls.tsv #155

Closed renmiaozhen closed 6 months ago

renmiaozhen commented 11 months ago

When I run LTR_retriever with the command /software/LTR_retriever-2.9.5/LTR_retriever -genome N02.fa -inharvest N02.fa.rawLTR.scn -threads 10 -u 7.3e-9, I get the following warnings and errors:

  1. awk: cannot open N02.fa.retriever.scn.extend.fa.rexdb.cls.tsv (No such file or directory);
  2. Command line argument error: Argument "query". File is not accessible: 'N02.fa.ltrTE.stg3.cln'
     cat: N02.fa.ltrTE.stg3.line.out: No such file or directory
     cat: N02.fa.ltrTE.stg3.dna.out: No such file or directory
     cp: cannot stat 'N02.fa.ltrTE.stg3.cln': No such file or directory
     Command line argument error: Argument "query". File is not accessible: `N02.fa.ltrTE.stg3.cln.clean'
     ERROR: Please specify the BLAST result!
  3. ERROR: No such file or directory at /software/LTR_retriever-2.9.5/bin/output_by_list.pl line 37.
     ERROR: No such file or directory at /software/LTR_retriever-2.9.5/bin/output_by_list.pl line 37.
     Wed 12 Jul 2023 11:19:12 AM CST Retained clean sequence: 0
  4. ERROR: 6493 intact LTR-RTs have found, but the pre-library file N02.fa.ltrTE is empty;
  5. awk: cannot open N02.fa.retriever.scn.extend.fa.rexdb.cls.tsv (No such file or directory)

Although I received so many error messages, the final result file "N02.fa.pass.list" was not empty. Can I use this result? Do you have any suggestions?

oushujun commented 11 months ago

Hello,

Something is wrong. Can you share the original status output? It will be helpful to identify the error.

Thanks, Shujun
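
One way to collect the complete status output requested above is to send both standard output and standard error of the run to a single file. The following is a minimal sketch, assuming a POSIX shell; the helper name run_logged and the log file name retriever.log are hypothetical and are not part of LTR_retriever.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Runs COMMAND with stdout and stderr both written to LOGFILE
# and returns COMMAND's exit status.
# (run_logged is a hypothetical helper name used for illustration.)
run_logged() {
    rl_log=$1
    shift
    "$@" > "$rl_log" 2>&1
}

# Example call, using the command line from the report above
# (retriever.log is an arbitrary file name):
# run_logged retriever.log LTR_retriever -genome N02.fa \
#     -inharvest N02.fa.rawLTR.scn -threads 10 -u 7.3e-9
```

Keeping both streams in one file preserves the order in which status lines and error messages were printed, which makes it easier to see which step an error belongs to.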

renmiaozhen commented 11 months ago

I'm sorry for not getting back to you sooner. I ran several tasks; the output below is from one of the runs that produced errors. I look forward to your reply, thank you.

My complete workflow is:

1. gt suffixerator -db genome.fa -indexname genome.fa -dna -suf -lcp
2. gt ltrharvest -index genome.fa -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -seqids yes > genome.fa.harvest.scn
3. LTR_FINDER_parallel -seq genome.fa -threads 10 -harvest_out
4. cat genome.fa.harvest.scn genome.fa.finder.combine.scn > genome.fa.rawLTR.scn
5. LTR_retriever -genome genome.fa -inharvest genome.fa.rawLTR.scn -threads 10 -u 7.3e-9
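
Step 4 of this workflow can be guarded so that a missing or empty candidate file is noticed before LTR_retriever is started. The following is a minimal sketch, assuming a POSIX shell; merge_scn is a hypothetical helper name, and the check only tests that each input exists and is non-empty, not that its content is valid.

```shell
# merge_scn OUT IN1 [IN2...]
# Concatenates the candidate files into OUT, but refuses to do so
# if any input file is missing or empty.
# (merge_scn is a hypothetical helper name used for illustration.)
merge_scn() {
    ms_out=$1
    shift
    for ms_in in "$@"; do
        if [ ! -s "$ms_in" ]; then
            echo "merge_scn: missing or empty input: $ms_in" >&2
            return 1
        fi
    done
    cat "$@" > "$ms_out"
}

# Example call with the file names from the workflow above:
# merge_scn genome.fa.rawLTR.scn genome.fa.harvest.scn genome.fa.finder.combine.scn
```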

##########################

LTR_retriever v2.9.5

##########################

Contributors: Shujun Ou, Ning Jiang

For LTR_retriever, please cite:

    Ou S and Jiang N (2018). LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176(2): 1410-1422.

For LAI, please cite:

    Ou S, Chen J, Jiang N (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46(21):e126.

Parameters: -genome N02.fa -inharvest N02.fa.rawLTR.scn -threads 10 -u 7.3e-9

Wed 12 Jul 2023 10:31:55 AM CST Dependency checking: All passed!

                            Previous LTR_retriever results found, backed up to LTRretriever-pre07-12-23_1032

Wed 12 Jul 2023 10:32:05 AM CST LTR_retriever is starting from the Init step.
Wed 12 Jul 2023 10:32:06 AM CST Start to convert inputs...
Total candidates: 21340
Total uniq candidates: 20217

Wed 12 Jul 2023 10:32:09 AM CST Module 1: Start to clean up candidates... Sequences with 10 missing bp or 0.8 missing data rate will be discarded. Sequences containing tandem repeats will be discarded.

Wed 12 Jul 2023 10:32:10 AM CST 20114 clean candidates remained

Wed 12 Jul 2023 10:32:10 AM CST Modules 2-5: Start to analyze the structure of candidates... The terminal motif, TSD, boundary, orientation, age, and superfamily will be identified in this step.

awk: cannot open N02.fa.retriever.scn.extend.fa.rexdb.cls.tsv (No such file or directory)
Wed 12 Jul 2023 10:46:01 AM CST Intact LTR-RT found: 6493

Wed 12 Jul 2023 11:10:30 AM CST Module 6: Start to analyze truncated LTR-RTs...
Truncated LTR-RTs without the intact version will be retained in the LTR-RT library.
Use -notrunc if you don't want to keep them.

Wed 12 Jul 2023 11:10:30 AM CST 860 truncated LTR-RTs found
Wed 12 Jul 2023 11:14:02 AM CST 112 truncated LTR sequences have added to the library

Wed 12 Jul 2023 11:14:02 AM CST Module 5: Start to remove DNA TE and LINE transposases, and remove plant protein sequences...
Total library sequences: 6805

##########################

LTR_retriever v2.9.5

##########################

Contributors: Shujun Ou, Ning Jiang

For LTR_retriever, please cite:

    Ou S and Jiang N (2018). LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176(2): 1410-1422.

For LAI, please cite:

    Ou S, Chen J, Jiang N (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46(21):e126.

Parameters: -genome N02.fa -inharvest N02.fa.rawLTR.scn -threads 10 -u 7.3e-9

Wed 12 Jul 2023 11:14:31 AM CST Dependency checking: All passed!

                            Previous LTR_retriever results found, backed up to LTRretriever-pre07-12-23_1114

Wed 12 Jul 2023 11:14:41 AM CST LTR_retriever is starting from the Init step.
Wed 12 Jul 2023 11:14:43 AM CST Start to convert inputs...
Total candidates: 21340
Total uniq candidates: 20217

Wed 12 Jul 2023 11:14:46 AM CST Module 1: Start to clean up candidates... Sequences with 10 missing bp or 0.8 missing data rate will be discarded. Sequences containing tandem repeats will be discarded.

Wed 12 Jul 2023 11:14:46 AM CST 20114 clean candidates remained

Wed 12 Jul 2023 11:14:46 AM CST Modules 2-5: Start to analyze the structure of candidates... The terminal motif, TSD, boundary, orientation, age, and superfamily will be identified in this step.

Command line argument error: Argument "query". File is not accessible: `N02.fa.ltrTE.stg3.cln'
cat: N02.fa.ltrTE.stg3.line.out: No such file or directory
cat: N02.fa.ltrTE.stg3.dna.out: No such file or directory
cp: cannot stat 'N02.fa.ltrTE.stg3.cln': No such file or directory
Command line argument error: Argument "query". File is not accessible: `N02.fa.ltrTE.stg3.cln.clean'
ERROR: Please specify the BLAST result!

Clean up sequence using blast resuls

perl purger.pl -blast blast_outfmt6 -seq seq.fa [options]

Options:
    -eval [0-1] e-value cutoff; discard the hit if >= this number; default 0.001
    -len [int] length cutoff; discard the hit if < this number; default 90 (bp)
    -iden [0-100] identity cutoff; discard the hit if <= tis number; default 35 (%)
    -cov [0-1] coverage cutoff; discard the entire sequence if >= this number; default 1
    -purge [0|1] purge switch; switch on=1(default)/off=0 to clean up aligned region and joint unaligned sequences

Dependency: combine_overlap.pl, call_seq_by_list.pl

BLAST example: blastn -subject seq.fa -query removal.lib.fa -outfmt=6 > blast_outfmt6

Shujun Ou (oushujun@msu.edu) 04/17/2017

ERROR: No such file or directory at /home/renmiaozhen/software/LTR_retriever-2.9.5/bin/output_by_list.pl line 37.
ERROR: No such file or directory at /home/renmiaozhen/software/LTR_retriever-2.9.5/bin/output_by_list.pl line 37.
Wed 12 Jul 2023 11:19:12 AM CST Retained clean sequence: 0

ERROR: 6493 intact LTR-RTs have found, but the pre-library file N02.fa.ltrTE is empty. Something is wrong at this point. Please report the bug to https://github.com/oushujun/LTR_retriever/issues
Program halt!
awk: cannot open N02.fa.retriever.scn.extend.fa.rexdb.cls.tsv (No such file or directory)
Wed 12 Jul 2023 11:28:49 AM CST Intact LTR-RT found: 6496

Wed 12 Jul 2023 11:53:30 AM CST Module 6: Start to analyze truncated LTR-RTs...
Truncated LTR-RTs without the intact version will be retained in the LTR-RT library.
Use -notrunc if you don't want to keep them.

Wed 12 Jul 2023 11:53:30 AM CST 861 truncated LTR-RTs found
Wed 12 Jul 2023 02:47:06 PM CST 113 truncated LTR sequences have added to the library

Wed 12 Jul 2023 02:47:06 PM CST Module 5: Start to remove DNA TE and LINE transposases, and remove plant protein sequences...
Total library sequences: 6805
Wed 12 Jul 2023 03:44:10 PM CST Retained clean sequence: 6802

Wed 12 Jul 2023 03:44:10 PM CST Sequence clustering for N02.fa.ltrTE ...
Wed 12 Jul 2023 03:44:10 PM CST Unique lib sequence: 6785

Wed 12 Jul 2023 03:55:23 PM CST Module 6: Start to remove nested insertions in internal regions...
Wed 12 Jul 2023 04:32:52 PM CST Raw internal region size (bit): 27554393
Clean internal region size (bit): 22913514

Wed 12 Jul 2023 04:32:53 PM CST Sequence number of the redundant LTR-RT library: 19594
The redundant LTR-RT library size (bit): 54683474

Wed 12 Jul 2023 04:32:53 PM CST Module 8: Start to make non-redundant library...

Wed 12 Jul 2023 04:41:04 PM CST Final LTR-RT library entries: 5663
Final LTR-RT library size (bit): 23399535

Wed 12 Jul 2023 04:41:04 PM CST Total intact LTR-RTs found: 6495
Total intact non-TGCA LTR-RTs found: 36

Wed 12 Jul 2023 04:41:07 PM CST Start to annotate whole-genome LTR-RTs... Use -noanno if you don't want whole-genome LTR-RT annotation.

######################################

LTR Assembly Index (LAI) beta3.2

######################################

Developer: Shujun Ou

Please cite:

Ou S., Chen J. and Jiang N. (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. gky730: https://doi.org/10.1093/nar/gky730

Parameters: -genome N02.fa -intact N02.fa.pass.list -all N02.fa.out -t 10 -q -blast /home/renmiaozhen/miniconda3/bin/

Wed 12 Jul 2023 11:24:17 PM CST Dependency checking: Passed!
Wed 12 Jul 2023 11:24:17 PM CST Calculation of LAI will be based on the whole genome. Please use the -mono parameter if your genome is a recent ployploid, otherwise high identity between LTR homeologues will overcorrect raw LAI scores and result in low LAI.
Wed 12 Jul 2023 11:24:17 PM CST Estimate the identity of LTR sequences in the genome: quick mode
Wed 12 Jul 2023 11:46:40 PM CST The identity of LTR sequences: 91.2381795450366%

                            【Warning】 The identity drops below the safe limit. Instead, identity of 92% will be used for LAI adjustment.

Wed 12 Jul 2023 11:46:40 PM CST Calculate LAI:

                                            Done!

Wed 12 Jul 2023 11:47:08 PM CST Result file: N02.fa.out.LAI

                            You may use either raw_LAI or LAI for intraspecific comparison
                            but please use ONLY LAI for interspecific comparison

Wed 12 Jul 2023 11:47:09 PM CST All analyses were finished!

##############################
####### Result files #########
##############################

Table output for intact LTR-RTs (detailed info)
    N02.fa.pass.list (All LTR-RTs)
    N02.fa.nmtf.pass.list (Non-TGCA LTR-RTs)
    N02.fa.pass.list.gff3 (GFF3 format for intact LTR-RTs)

LTR-RT library
    N02.fa.LTRlib.redundant.fa (All LTR-RTs with redundancy)
    N02.fa.LTRlib.fa (All non-redundant LTR-RTs)
    N02.fa.nmtf.LTRlib.fa (Non-TGCA LTR-RTs)

Whole-genome LTR-RT annotation by the non-redundant library
    N02.fa.LTR.gff3 (GFF3 format)
    N02.fa.out.fam.size.list (LTR family summary)
    N02.fa.out.superfam.size.list (LTR superfamily summary)

LTR Assembly Index (LAI)
    N02.fa.out.LAI

oushujun commented 11 months ago

Hello,

Can you update to cf58800 and try again?

I don't understand why you ran LTR_retriever twice.

Thanks, Shujun
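
The pasted output contains the start-up message "LTR_retriever is starting from the Init step" twice (at 10:32:05 and at 11:14:41), which is what the question about running LTR_retriever twice refers to. A quick way to check a saved log for this is to count those start-up lines. The following is a minimal sketch, assuming a POSIX shell; count_runs is a hypothetical helper name, and matching on this particular message is an assumption based only on the log shown in this thread.

```shell
# count_runs LOGFILE
# Prints how many lines of LOGFILE contain the start-up message
# "LTR_retriever is starting". A count above 1 suggests that more
# than one run wrote to the same log.
# (count_runs is a hypothetical helper name; the matched message is
# taken from the log in this thread and may differ in other versions.)
count_runs() {
    grep -c 'LTR_retriever is starting' "$1"
}

# Example call (retriever.log is an arbitrary file name):
# count_runs retriever.log
```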

oushujun commented 10 months ago

@renmiaozhen any luck?

oushujun commented 6 months ago

This bug (#154) should be fixed in the latest version. Please update and try again. Please reopen the issue if you find it not fixed. Thank you!

Shujun