oushujun / LTR_retriever

LTR_retriever is a highly accurate and sensitive program for identification of LTR retrotransposons; The LTR Assembly Index (LAI) is also included in this package.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5813529/
GNU General Public License v3.0

no LAI report #21

Closed xinshuaiqi closed 5 years ago

xinshuaiqi commented 5 years ago

Hi Shujun, I have a few runs with normal LTR-retriever.scn outputs but no LAI report. I noticed there is no 'out' file from RepeatMasker in the folder, and no error message was captured on stderr. Any idea what the problem is?

Below are two examples:

-rw-r--r--  1 root  root                          46 Sep 17 20:04 genomeA__ver100.applied_reference_genome.genome.fasta.des
-rw-r--r--  1 root  root                        492M Sep 17 20:04 genomeA__ver100.applied_reference_genome.genome.fasta.esq
-rw-r--r--  1 root  root                         38M Sep 19 22:56 genomeA__ver100.applied_reference_genome.genome.fasta.finder.scn
-rw-r--r--  1 root  root                        4.4M Sep 17 21:19 genomeA__ver100.applied_reference_genome.genome.fasta.harvest.scn
-rw-r--r--  1 root  root                        2.0G Sep 17 20:43 genomeA__ver100.applied_reference_genome.genome.fasta.lcp
-rw-r--r--  1 root  root                        2.3G Sep 17 20:43 genomeA__ver100.applied_reference_genome.genome.fasta.llv
-rw-r--r--  1 root  root                        450M Sep 19 01:19 genomeA__ver100.applied_reference_genome.genome.fasta.ltrTE.fa
-rw-r--r--  1 root  root                        208K Sep 19 01:36 genomeA__ver100.applied_reference_genome.genome.fasta.ltrTE.fa.cleanup
-rw-r--r--  1 root  root                        415M Sep 19 01:36 genomeA__ver100.applied_reference_genome.genome.fasta.ltrTE.stg1
-rw-r--r--  1 root  root                         363 Sep 17 20:04 genomeA__ver100.applied_reference_genome.genome.fasta.md5
-rw-r--r--  1 root  root                         503 Sep 17 20:43 genomeA__ver100.applied_reference_genome.genome.fasta.prj
-rw-r--r--  1 root  root                        4.4M Sep 17 21:20 genomeA__ver100.applied_reference_genome.genome.fasta.retriever.scn
-rw-r--r--  1 root  root                        2.1M Sep 19 01:36 genomeA__ver100.applied_reference_genome.genome.fasta.retriever.scn.extend
-rw-r--r--  1 root  root                        345M Sep 20 00:21 genomeA__ver100.applied_reference_genome.genome.fasta.retriever.scn.extend.fa
-rw-r--r--  1 root  root                        2.2M Sep 17 21:20 genomeA__ver100.applied_reference_genome.genome.fasta.retriever.scn.full
-rw-r--r--  1 root  root                        4.5M Sep 17 21:20 genomeA__ver100.applied_reference_genome.genome.fasta.retriever.scn.list
-rw-r--r--  1 root  root                          80 Sep 17 20:04 genomeA__ver100.applied_reference_genome.genome.fasta.sds
-rw-r--r--  1 root  root                          48 Sep 17 20:04 genomeA__ver100.applied_reference_genome.genome.fasta.ssp
-rw-r--r--  1 root  root                         16G Sep 17 20:43 genomeA__ver100.applied_reference_genome.genome.fasta.suf
drwxr-xr-x  2 evrpa RstudioUsersGenomeAnalytics 6.0K Sep 17 17:01 1/
-rw-r--r--  1 root  root                         44M Sep 17 21:19 alluniRefprexp082813.15942
-rw-r--r--  1 root  root                         15M Sep 17 21:20 alluniRefprexp082813.15942.phr
-rw-r--r--  1 root  root                        801K Sep 17 21:20 alluniRefprexp082813.15942.pin
-rw-r--r--  1 root  root                         36M Sep 17 21:20 alluniRefprexp082813.15942.psq
-rw-r--r--  1 root  root                           0 Sep 17 19:34 std.err.ltr_finder
-rw-r--r--  1 root  root                           0 Sep 17 20:43 stderr.ltr_harvest
-rw-r--r--  1 root  root                           0 Sep 17 21:19 std.err.ltr_retriever
-rw-r--r--  1 root  root                        1.6M Sep 17 21:19 Tpases020812DNA.15942
-rw-r--r--  1 root  root                        340K Sep 17 21:19 Tpases020812DNA.15942.phr
-rw-r--r--  1 root  root                         19K Sep 17 21:19 Tpases020812DNA.15942.pin
-rw-r--r--  1 root  root                        1.4M Sep 17 21:19 Tpases020812DNA.15942.psq
-rw-r--r--  1 root  root                        2.0M Sep 17 21:19 Tpases020812LINE.15942
-rw-r--r--  1 root  root                        306K Sep 17 21:19 Tpases020812LINE.15942.phr
-rw-r--r--  1 root  root                         19K Sep 17 21:19 Tpases020812LINE.15942.pin
-rw-r--r--  1 root  root                        1.8M Sep 17 21:19 Tpases020812LINE.15942.psq
-rw-r--r--  1 root  root                         44M Sep 18 05:28 alluniRefprexp082813.876134
-rw-r--r--  1 root  root                         15M Sep 18 05:28 alluniRefprexp082813.876134.phr
-rw-r--r--  1 root  root                        801K Sep 18 05:28 alluniRefprexp082813.876134.pin
-rw-r--r--  1 root  root                         36M Sep 18 05:28 alluniRefprexp082813.876134.psq
-rw-r--r--  1 root  root                           0 Sep 17 19:30 std.err.ltr_finder
-rw-r--r--  1 root  root                           0 Sep 18 04:58 stderr.ltr_harvest
-rw-r--r--  1 root  root                           0 Sep 18 05:28 std.err.ltr_retriever
-rw-r--r--  1 root  root                        1.6M Sep 18 05:28 Tpases020812DNA.876134
-rw-r--r--  1 root  root                        340K Sep 18 05:28 Tpases020812DNA.876134.phr
-rw-r--r--  1 root  root                         19K Sep 18 05:28 Tpases020812DNA.876134.pin
-rw-r--r--  1 root  root                        1.4M Sep 18 05:28 Tpases020812DNA.876134.psq
-rw-r--r--  1 root  root                        2.0M Sep 18 05:28 Tpases020812LINE.876134
-rw-r--r--  1 root  root                        306K Sep 18 05:28 Tpases020812LINE.876134.phr
-rw-r--r--  1 root  root                         19K Sep 18 05:28 Tpases020812LINE.876134.pin
-rw-r--r--  1 root  root                        1.8M Sep 18 05:28 Tpases020812LINE.876134.psq
lrwxrwxrwx  1 evrpa RstudioUsersGenomeAnalytics   77 Sep 11 14:59 Zea_mays.AGPv4.dna_sm.toplevel.fa 
-rw-r--r--  1 root  root                         18K Sep 18 04:20 Zea_mays.AGPv4.dna_sm.toplevel.fa.des
-rw-r--r--  1 root  root                        509M Sep 18 04:21 Zea_mays.AGPv4.dna_sm.toplevel.fa.esq
-rw-r--r--  1 root  root                         29M Sep 19 15:23 Zea_mays.AGPv4.dna_sm.toplevel.fa.finder.scn
-rw-r--r--  1 root  root                        4.6M Sep 18 05:28 Zea_mays.AGPv4.dna_sm.toplevel.fa.harvest.scn
-rw-r--r--  1 root  root                        2.0G Sep 18 04:58 Zea_mays.AGPv4.dna_sm.toplevel.fa.lcp
-rw-r--r--  1 root  root                        2.0G Sep 18 04:58 Zea_mays.AGPv4.dna_sm.toplevel.fa.llv
-rw-r--r--  1 root  root                        8.6K Sep 18 04:20 Zea_mays.AGPv4.dna_sm.toplevel.fa.md5
-rw-r--r--  1 root  root                        2.1G Sep 18 05:29 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod
-rw-r--r--  1 root  root                        477M Sep 19 11:57 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.ltrTE.fa
-rw-r--r--  1 root  root                        146K Sep 19 12:15 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.ltrTE.fa.cleanup
-rw-r--r--  1 root  root                        451M Sep 19 12:15 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.ltrTE.stg1
-rw-r--r--  1 root  root                        4.6M Sep 18 05:29 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.retriever.scn
-rw-r--r--  1 root  root                        2.2M Sep 19 12:15 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.retriever.scn.extend
-rw-r--r--  1 root  root                        154M Sep 20 00:21 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.retriever.scn.extend.fa
-rw-r--r--  1 root  root                        2.3M Sep 18 05:29 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.retriever.scn.full
-rw-r--r--  1 root  root                        4.7M Sep 18 05:29 Zea_mays.AGPv4.dna_sm.toplevel.fa.mod.retriever.scn.list
-rw-r--r--  1 root  root                         502 Sep 18 04:58 Zea_mays.AGPv4.dna_sm.toplevel.fa.prj
-rw-r--r--  1 root  root                        2.1K Sep 18 04:20 Zea_mays.AGPv4.dna_sm.toplevel.fa.sds
-rw-r--r--  1 root  root                        1.1K Sep 18 04:21 Zea_mays.AGPv4.dna_sm.toplevel.fa.ssp
-rw-r--r--  1 root  root                         16G Sep 18 04:58 Zea_mays.AGPv4.dna_sm.toplevel.fa.suf
oushujun commented 5 years ago

Hi Xinshuai,

Looks like LTR_retriever was stopped somehow, and the run was not completed. Please rerun it. You may check STDOUT for program status.

Thanks, Shujun

xinshuaiqi commented 5 years ago

Hi Shujun, I have a few LAI jobs that always crash. It's hard to track stdout because I am using AWS cloud. As you can see, my stderr files captured nothing...

I suspect some of the following reasons:

Is it common, when you apply LTR_retriever and LAI to many genomes, for many of them to crash?

In some cases, when I try to re-run only LAI after LTR_retriever fails, I don't see the RepeatMasker annotation 'genome.fa.out' in the folder. Is that weird?

Would you mind explaining a little about the Tpases and alluniRefprexp files? I suppose they are temporary files for processing each LTR case. Does that mean that in certain cases the job encountered some unexpected error and then failed?

In general, I spend 20% of the time getting 80% of the jobs done, then spend the remaining 80% of the time debugging the other 20%...

Your comments are helpful for my debugging. Thanks a lot.

oushujun commented 5 years ago

Hi Xinshuai,

This is not common. You may want to check the resource allocation limits on your AWS nodes. To capture screen output, you can use nohup, e.g. nohup LTR_retriever xxxx > nohup.genome.out, and then you should be able to see at what stage the program stopped. From the files you listed above, the program stopped in stage 2 (identify coding sequences), so you may want to check whether hmmsearch worked properly.
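A minimal sketch of the logging suggestion above, assuming a bash shell. The helper name `run_logged` and the log file names are hypothetical, and the LTR_retriever arguments stay elided as in the comment. It keeps stdout and stderr in separate files and appends the exit status, so a run that dies silently still leaves a trace.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of LTR_retriever): run any command under nohup,
# capture stdout and stderr in separate logs, and record the exit status.
run_logged() {
    local prefix="$1"; shift
    # nohup keeps the job running if the session to the node drops.
    nohup "$@" > "${prefix}.stdout.log" 2> "${prefix}.stderr.log"
    local rc=$?
    echo "exit status: ${rc}" >> "${prefix}.stdout.log"
    return "${rc}"
}

# Usage (arguments elided as in the thread):
#   run_logged genome LTR_retriever xxxx &
#   tail -n 5 genome.stdout.log   # shows the last stage reported before it stopped
```

A missing "exit status" line at the end of the log would suggest the process was killed from outside (for example by a resource limit) rather than exiting on its own.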

LTR_retriever will make a couple of attempts to fix the long file name issue, and will quit and let you know (on stdout) if it fails to do so. This does not seem to be your case.

In some cases, LTR_retriever does choke and get stuck on the structural analyses. This usually happens when you specify more CPUs than the program can get, or when you are running multiple programs that use up all CPUs, so that LTR_retriever cannot open new threads because no CPUs are available. In this case, you have to kill it and rerun it, making sure you specify the right -t. In practice, -t 5 works pretty well for the first phase (identifying intact LTR-RTs and making a library). For the second phase, whole-genome LTR annotation, you may want more CPUs for a large genome. You can do this in two steps.
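One way to avoid requesting more threads than the node has is sketched below, assuming bash and the `nproc` utility from GNU coreutils. The helper name `cap_threads` is hypothetical. It only compares against the CPU count visible to the shell and does not account for CPUs already taken by other running programs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cap a requested thread count at the number of CPUs
# available, so the value passed to -t never exceeds what the node offers.
cap_threads() {
    local requested="$1"
    # Second argument is optional; by default ask nproc for the usable CPU count.
    local available="${2:-$(nproc)}"
    if [ "${requested}" -gt "${available}" ]; then
        echo "${available}"
    else
        echo "${requested}"
    fi
}

# Usage:
#   threads=$(cap_threads 5)   # then pass: -t "${threads}"
```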

When LTR_retriever fails, of course you won't be able to run LAI because the dependent files are not generated yet. If you have .LTRlib.fa, you can use it with RepeatMasker to get the whole genome LTR annotation (.out), then run LAI separately.
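The decision described above can be sketched as a small file check, assuming bash. The helper name `lai_next_step` is hypothetical, and the file suffixes (`.pass.list`, `.LTRlib.fa`, `.out`) are taken from the names used in this thread; if your outputs carry a different prefix (for example `.mod`), adjust accordingly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which step can be run for a genome file,
# based on which non-empty output files already exist next to it.
lai_next_step() {
    local genome="$1"
    if [ -s "${genome}.pass.list" ] && [ -s "${genome}.out" ]; then
        # Intact LTR-RT list and whole-genome annotation are both present.
        echo "run LAI"
    elif [ -s "${genome}.pass.list" ] && [ -s "${genome}.LTRlib.fa" ]; then
        # The library exists but the RepeatMasker .out annotation does not.
        echo "run RepeatMasker with the LTR library"
    else
        echo "rerun LTR_retriever"
    fi
}

# Usage:
#   lai_next_step genome.fa
```

The LAI invocation shown later in this thread takes the genome, the `.pass.list` file (`-intact`), and the `.out` file (`-all`), which is why the sketch requires both files before suggesting LAI.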

Tpases and alluniRefprexp are database files for annotation of LTR-RT. They will be deleted after the program finishes. Seeing them means your run was not complete.
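When many genomes are processed in batch, leftover database files can serve as a quick flag for incomplete runs. The sketch below assumes bash and a `find` that supports `-maxdepth`; the helper name `has_leftover_databases` is hypothetical, and the file name patterns are taken from the directory listing earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the directory still contains
# Tpases* or alluniRefprexp* files, which indicates an unfinished run.
has_leftover_databases() {
    local dir="${1:-.}"
    # find prints matching paths; grep -q . succeeds if at least one was printed.
    find "${dir}" -maxdepth 1 -type f \
        \( -name 'Tpases*' -o -name 'alluniRefprexp*' \) | grep -q .
}

# Usage:
#   if has_leftover_databases /path/to/run; then echo "run incomplete"; fi
```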

Thanks, Shujun

yuzhenpeng commented 2 years ago

Hello, I ran into the same problem. Could you please give me some help? I think my genomic data is not suitable for LAI evaluation, right?

`LAI -genome guatoujing.contig.purged.fa -intact guatoujing.contig.purged.fa.pass.list -all guatoujing.contig.purged.fa.out -q -t 10

######################################

LTR Assembly Index (LAI) beta3.2

######################################

Developer: Shujun Ou

Please cite:

Ou S., Chen J. and Jiang N. (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. gky730: https://doi.org/10.1093/nar/gky730

Parameters: -genome guatoujing.contig.purged.fa -intact guatoujing.contig.purged.fa.pass.list -all guatoujing.contig.purged.fa.out -q -t 10

Mon Nov 8 10:31:08 EST 2021 Dependency checking: Passed!

Mon Nov 8 10:31:08 EST 2021 Calculation of LAI will be based on the whole genome. Please use the -mono parameter if your genome is a recent ployploid, for high identity between homeologues will overcorrect raw LAI scores.

Mon Nov 8 10:31:08 EST 2021 Estimate the identity of LTR sequences in the genome: quick mode

Mon Nov 8 11:21:33 EST 2021 The identity of LTR sequences: 93.1875177095501%

Mon Nov 8 11:21:33 EST 2021 Calculate LAI:

【Error】Intact LTR-RT content (0.09%) is too low for accurate LAI calculation (min 0.1% required)

Sorry, LAI is not applicable on the current genome assembly. `

oushujun commented 2 years ago

@yuzhenpeng Do you expect lots of LTRs in your genome? If the annotation seems correct, then LAI may not be applicable to your genome. - Shujun

yuzhenpeng commented 2 years ago

> @yuzhenpeng Do you expect lots of LTRs in your genome? If the annotation seems correct, then LAI may not be applicable to your genome. - Shujun

Some related species have low LTR content in their genomes, about 4.0-5.5%. So LAI does not seem to fit my genome.