oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

EDTA failing after LTR candidate identification #188

Closed reslp closed 3 years ago

reslp commented 3 years ago

Hi Shujun,

First thank you very much for EDTA!

I have been using EDTA to identify TEs in a set of fungal genomes. I am using a self-made Singularity container for this because I am running it on a large cluster. For most genomes the annotation works well; however, for some genomes I get a strange error that I can't figure out or solve. I hope you will be able to give some suggestions as to what might be going wrong here.

Thank you very much for your help!

all the best,

Philipp

Here is the output from one of the failed runs:

########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.9.6  ####
##### Shujun Ou (shujun.ou.1@gmail.com)             ####
########################################################

Mon Feb  1 21:41:50 CET 2021    Dependency checking:
                All passed!

Mon Feb  1 21:42:05 CET 2021    The longest sequence ID in the genome contains 37 characters, which is longer than the limit (15)
                Trying to reformat seq IDs...
                Attempt 1...
Mon Feb  1 21:42:05 CET 2021    Seq ID conversion successful!

Mon Feb  1 21:42:05 CET 2021    Obtain raw TE libraries using various structure-based programs:
Mon Feb  1 21:42:05 CET 2021    EDTA_raw: Check dependencies, prepare working directories.

Mon Feb  1 21:42:19 CET 2021    Start to find LTR candidates.

Mon Feb  1 21:42:19 CET 2021    Identify LTR retrotransposon candidates from scratch.

awk: cannot open nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.pass.list (No such file or directory)
Warning: LOC list - is empty.

Error: Error while loading sequence
    perl filter_gff3.pl file.gff3 file.list > new.gff3

cp: cannot stat 'nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.LTRlib.fa': No such file or directory
cp: cannot stat 'nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.LTRlib.fa': No such file or directory
Error: LTR results not found!

cat: nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.TIR.intact.fa: No such file or directory
cat: nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.Helitron.intact.fa: No such file or directory
cat: nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.TIR.intact.bed: No such file or directory
cat: nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.Helitron.intact.bed: No such file or directory
Mon Feb  1 21:50:51 CET 2021    Obtain raw TE libraries finished.
                All intact TEs found by EDTA:
                    nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.EDTA.intact.fa
                    nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.EDTA.intact.gff3

Mon Feb  1 21:50:51 CET 2021    Perform EDTA advcance filtering for raw TE candidates and generate the stage 1 library:

Mon Feb  1 22:03:13 CET 2021    EDTA advcance filtering finished.

Mon Feb  1 22:03:13 CET 2021    Perform EDTA final steps to generate a non-redundant comprehensive TE library:

                Skipping the RepeatModeler step (--sensitive 0).
                Run EDTA.pl --step final --sensitive 1 if you want to use RepeatModeler.

                Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

ERROR: Intact TE annotation not found in nuclear_Xylographa_carneopallida_metagenomeV1_blobtools_Ascomycota.fa.mod.EDTA.intact.gff3 at /opt/conda/share/EDTA/EDTA.pl line 566.

oushujun commented 3 years ago

Hi Philipp,

If you encounter errors in only a subset of your genomes and also see a warning like "The longest sequence ID in the genome contains 37 characters, which is longer than the limit (15)" followed by "Trying to reformat seq IDs... Attempt 1...", you probably need to simplify your sequence names beforehand. It is always good practice to simplify your sequence IDs before doing anything else. See more: #182
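
One way to shorten the IDs before running EDTA is sketched below. This is only an illustration: the function names, the `<prefix>_<n>` naming scheme, and the `.idmap.tsv` mapping file are assumptions made here, not part of EDTA.

```shell
# Hypothetical helpers, not part of EDTA.

# Print the length of the longest sequence ID (the text between ">" and
# the first whitespace) in a FASTA file.
longest_id_len() {
    awk '/^>/ { id = substr($1, 2); if (length(id) > max) max = length(id) }
         END  { print max + 0 }' "$1"
}

# Rename every sequence to <prefix>_<n>, print the renamed FASTA to stdout,
# and save the new-to-old ID mapping in <fasta>.idmap.tsv.
simplify_ids() {
    awk -v prefix="${2:-seq}" -v map="$1.idmap.tsv" '
        /^>/ { n++; print prefix "_" n "\t" substr($0, 2) > map; print ">" prefix "_" n; next }
             { print }
    ' "$1"
}
```

For example, `simplify_ids genome.fa xyl > genome.short.fa` followed by `longest_id_len genome.short.fa` shows whether the new IDs fit within the limit reported in the log (15 characters here). The mapping file lets you translate the annotation back to the original names afterwards.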

There is also a Singularity version of EDTA (https://hub.docker.com/r/oushujun/edta/tags?page=1&ordering=last_updated), although it is slightly outdated. You are welcome to share your image here, which will benefit others.

Best, Shujun

reslp commented 3 years ago

Hi Shujun,

Thank you for your quick response and your suggestions. I shortened the sequence names, but this did not fix the issue (see the EDTA output below).

Yes, I saw your Singularity container. I made my own because I wanted to use the latest version. Of course I am happy to share my image. It is based on a Dockerfile, which I convert to Singularity. The Dockerfile is here: https://github.com/reslp/dockerfiles/blob/master/edta/Dockerfile

The container can be pulled like so:

$ docker pull reslp/edta:1.9.6

or singularity:

$ singularity pull docker://reslp/edta:1.9.6

The EDTA command I am using is:

EDTA.pl --genome results/Xylographa_carneopallida/Xylographa_carneopallida.assembly.fa --overwrite 1 --anno 1 --force 1  --threads 12 &> logs/Xylographa_carneopallida_edta.log

all the best,

Philipp

########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.9.6  ####
##### Shujun Ou (shujun.ou.1@gmail.com)             ####
########################################################

Thu Apr 22 14:11:04 CEST 2021   Dependency checking:
                All passed!

Thu Apr 22 14:11:16 CEST 2021   Obtain raw TE libraries using various structure-based programs: 
Thu Apr 22 14:11:16 CEST 2021   EDTA_raw: Check dependencies, prepare working directories.

Thu Apr 22 14:11:26 CEST 2021   Start to find LTR candidates.

Thu Apr 22 14:11:26 CEST 2021   Identify LTR retrotransposon candidates from scratch.

awk: cannot open Xylographa_carneopallida.assembly.fa.mod.pass.list (No such file or directory)
Warning: LOC list - is empty.

Error: Error while loading sequence
    perl filter_gff3.pl file.gff3 file.list > new.gff3

cp: cannot stat 'Xylographa_carneopallida.assembly.fa.mod.LTRlib.fa': No such file or directory
cp: cannot stat 'Xylographa_carneopallida.assembly.fa.mod.LTRlib.fa': No such file or directory
Error: LTR results not found!

cat: Xylographa_carneopallida.assembly.fa.mod.TIR.intact.fa: No such file or directory
cat: Xylographa_carneopallida.assembly.fa.mod.Helitron.intact.fa: No such file or directory
cat: Xylographa_carneopallida.assembly.fa.mod.TIR.intact.bed: No such file or directory
cat: Xylographa_carneopallida.assembly.fa.mod.Helitron.intact.bed: No such file or directory
Thu Apr 22 14:22:22 CEST 2021   Obtain raw TE libraries finished.
                All intact TEs found by EDTA: 
                    Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.fa
                    Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3

Thu Apr 22 14:22:22 CEST 2021   Perform EDTA advcance filtering for raw TE candidates and generate the stage 1 library: 

Thu Apr 22 14:32:58 CEST 2021   EDTA advcance filtering finished.

Thu Apr 22 14:32:58 CEST 2021   Perform EDTA final steps to generate a non-redundant comprehensive TE library:

                Skipping the RepeatModeler step (--sensitive 0).
                Run EDTA.pl --step final --sensitive 1 if you want to use RepeatModeler.

                Skipping the CDS cleaning step (--cds [File]) since no CDS file is provided or it's empty.

ERROR: Intact TE annotation not found in Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3 at /opt/conda/share/EDTA/EDTA.pl line 566.

oushujun commented 3 years ago

Hi Philipp,

Thanks for sharing the docker image.

If sequence names are not an issue, please check the raw/LTR/ folder and see which step started to go wrong. You may paste your directory info here.
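
A quick way to see where the pipeline stopped producing data is to list the zero-byte files in the output tree. The helper below is a minimal sketch; its name is made up for this example and it is not part of EDTA.

```shell
# Hypothetical helper, not part of EDTA: list zero-byte files under an
# EDTA output directory, one path per line, sorted by name.
list_empty_outputs() {
    find "${1:-.}" -type f -size 0 | sort
}
```

Running `list_empty_outputs Xylographa_carneopallida.assembly.fa.mod.EDTA.raw` from the working directory would print every empty file under the raw folder, which narrows down the step whose output is missing.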

Best, Shujun

reslp commented 3 years ago

Hi Shujun,

I just reran the analysis for this species. It fails with the same output as posted above. Below is the listing from inside the output directory. Inside the raw directory, the Helitron and TIR folders seem to be empty. Maybe the problem lies somewhere there? Is there anything I can do to diagnose further?

all the best and many thanks again,

Philipp

.
├── Xylographa_carneopallida.assembly.fa -> results/Xylographa_carneopallida/Xylographa_carneopallida.assembly.fa
├── Xylographa_carneopallida.assembly.fa.mod
├── Xylographa_carneopallida.assembly.fa.mod.EDTA.combine
│   └── Xylographa_carneopallida.assembly.fa.mod.LTR.TIR.Helitron.fa.stg1
├── Xylographa_carneopallida.assembly.fa.mod.EDTA.final
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.fa
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.fa.raw
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.cln
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.iter0
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.iter1
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.iter2
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.stat
│   ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.TElib.fa
│   ├── Xylographa_carneopallida.assembly.fa.mod.LTR.TIR.Helitron.fa.stg1
│   └── Xylographa_carneopallida.assembly.fa.mod.LTR.TIR.Helitron.others.fa.stg2.clean
├── Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3
└── Xylographa_carneopallida.assembly.fa.mod.EDTA.raw
    ├── Helitron
    ├── LTR
    │   ├── Xylographa_carneopallida.assembly.fa.mod.defalse
    │   ├── Xylographa_carneopallida.assembly.fa.mod.finder.combine.gff3
    │   ├── Xylographa_carneopallida.assembly.fa.mod.finder.combine.scn
    │   ├── Xylographa_carneopallida.assembly.fa.mod.harvest.combine.gff3
    │   ├── Xylographa_carneopallida.assembly.fa.mod.harvest.combine.scn
    │   ├── Xylographa_carneopallida.assembly.fa.mod.list
    │   ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa
    │   ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori
    │   ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori.dusted
    │   ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori.dusted.cleanup
    │   ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori.rmlist
    │   ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.gff3
    │   ├── Xylographa_carneopallida.assembly.fa.mod.nmtf.pass.list
    │   ├── Xylographa_carneopallida.assembly.fa.mod.rawLTR.scn
    │   └── Xylographa_carneopallida.assembly.fa.mod.retriever.all.scn
    ├── TIR
    ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.fa
    ├── Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3
    ├── Xylographa_carneopallida.assembly.fa.mod.Helitron.raw.fa
    ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa
    ├── Xylographa_carneopallida.assembly.fa.mod.LTR.intact.gff3
    ├── Xylographa_carneopallida.assembly.fa.mod.LTR.raw.fa
    └── Xylographa_carneopallida.assembly.fa.mod.TIR.raw.fa

6 directories, 39 files

oushujun commented 3 years ago

Can you also list their sizes? Thanks! -Shujun

reslp commented 3 years ago

Sure, here it is. Many thanks, Philipp

.
├── [  69]  Xylographa_carneopallida.assembly.fa -> results/Xylographa_carneopallida/Xylographa_carneopallida.assembly.fa
├── [ 28M]  Xylographa_carneopallida.assembly.fa.mod
├── [   3]  Xylographa_carneopallida.assembly.fa.mod.EDTA.combine
│   └── [4.5M]  Xylographa_carneopallida.assembly.fa.mod.LTR.TIR.Helitron.fa.stg1
├── [  15]  Xylographa_carneopallida.assembly.fa.mod.EDTA.final
│   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.fa
│   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.fa.raw
│   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3
│   ├── [4.5M]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa
│   ├── [4.5M]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln
│   ├── [4.3M]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.cln
│   ├── [4.4M]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.iter0
│   ├── [4.3M]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.iter1
│   ├── [4.3M]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.iter2
│   ├── [4.6K]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw.fa.cln.stat
│   ├── [4.3M]  Xylographa_carneopallida.assembly.fa.mod.EDTA.TElib.fa
│   ├── [4.5M]  Xylographa_carneopallida.assembly.fa.mod.LTR.TIR.Helitron.fa.stg1
│   └── [4.5M]  Xylographa_carneopallida.assembly.fa.mod.LTR.TIR.Helitron.others.fa.stg2.clean
├── [   0]  Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3
└── [  12]  Xylographa_carneopallida.assembly.fa.mod.EDTA.raw
    ├── [   2]  Helitron
    ├── [  17]  LTR
    │   ├── [2.1K]  Xylographa_carneopallida.assembly.fa.mod.defalse
    │   ├── [217K]  Xylographa_carneopallida.assembly.fa.mod.finder.combine.gff3
    │   ├── [ 696]  Xylographa_carneopallida.assembly.fa.mod.finder.combine.scn
    │   ├── [219K]  Xylographa_carneopallida.assembly.fa.mod.harvest.combine.gff3
    │   ├── [1.1K]  Xylographa_carneopallida.assembly.fa.mod.harvest.combine.scn
    │   ├── [ 73K]  Xylographa_carneopallida.assembly.fa.mod.list
    │   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa
    │   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori
    │   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori.dusted
    │   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori.dusted.cleanup
    │   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa.ori.rmlist
    │   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.gff3
    │   ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.nmtf.pass.list
    │   ├── [1.8K]  Xylographa_carneopallida.assembly.fa.mod.rawLTR.scn
    │   └── [2.5K]  Xylographa_carneopallida.assembly.fa.mod.retriever.all.scn
    ├── [   2]  TIR
    ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.fa
    ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.EDTA.intact.gff3
    ├── [744K]  Xylographa_carneopallida.assembly.fa.mod.Helitron.raw.fa
    ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.fa
    ├── [   0]  Xylographa_carneopallida.assembly.fa.mod.LTR.intact.gff3
    ├── [2.3M]  Xylographa_carneopallida.assembly.fa.mod.LTR.raw.fa
    └── [1.7M]  Xylographa_carneopallida.assembly.fa.mod.TIR.raw.fa

6 directories, 39 files

oushujun commented 3 years ago

You may want to check the defalse and scn files in raw/LTR/. If all entries in the defalse file are false, no LTRs were found.

Shujun

reslp commented 3 years ago

Hi, I checked and there seem to be false entries in the *defalse file.

$ cat Xylographa_carneopallida.assembly.fa.mod.defalse
xyl_car_197:3655..4828  false   motif:TTAA  TSD:TTTG    3651..3654  4829..4832  IN:3780..4704   0.9590  ?   unknownNA   1620996
    Adjust: NO  lLTR: 125   rLTR: 124
    Alignment regions: 4, 125, 1051, 1173
    LTR coordinates: 3655, 3779, 4705, 4828
    TSD-LTR overlap: 0
    Boundary missing: 0

xyl_car_105:1544..3032  false   motif:GACC  TSD:GTAAC   1539..1543  3033..3037  IN:1655..2922   0.9910  ?   unknownNA   348599
    Adjust: 5' rLTR lLTR: 112   rLTR: 111
    Alignment regions: 1, 112, 1380, 1490
    LTR coordinates: 1544, 1654, 2923, 3032
    TSD-LTR overlap: 0
    Boundary missing: 0

xyl_car_204:11788..15059    false   motif:GGCC  TSD:GTTAC   11783..11787    15060..15064    IN:11899..14948 1.0000  ?   unknown NA  0
    Adjust: 5' rLTR lLTR: 112   rLTR: 112
    Alignment regions: 1, 112, 3162, 3273
    LTR coordinates: 11788, 11898, 14949, 15059
    TSD-LTR overlap: 0
    Boundary missing: 0

xyl_car_83:4836..8775   false   motif:GGCC  TSD:GTAAC   4831..4835  8776..8780  IN:4952..8659   0.9901  ?   unknownNA   383344
    Adjust: NO  lLTR: NA    rLTR: NA
    Alignment regions: 11, 112, 3841, 3941
    LTR coordinates: 4836, 4951, 8660, 8775
    TSD-LTR overlap: 0
    Boundary missing: 0

xyl_car_313:148..6896   truncated   motif:AGAC  TSD:TAGGA   143..147    6897..6901  IN:322..6726    0.9593  -   Gypsy   LTR 1609362
    Adjust: 5' rLTR lLTR: 175   rLTR: 171
    Alignment regions: 4, 175, 6578, 6750
    LTR coordinates: 148, 321, 6727, 6896
    TSD-LTR overlap: 0
    Boundary missing: 0

xyl_car_313:459..7041   false   motif:TGCA  TSD:TGGG    455..458    7042..7045  IN:612..6887    0.9583  -   Gypsy   LTR 1648793
    Adjust: NO  lLTR: NA    rLTR: NA
    Alignment regions: 6, 150, 6436, 6580
    LTR coordinates: 459, 611, 6888, 7041
    TSD-LTR overlap: 0
    Boundary missing: 0

xyl_car_88:121..5749    false   motif:GTCC  TSD:CCGAG   116..120    5750..5754  IN:246..5631    0.9831  ?   unknownNA   659370
    Adjust: NO  lLTR: NA    rLTR: NA
    Alignment regions: 10, 127, 5512, 5629
    LTR coordinates: 121, 245, 5632, 5749
    TSD-LTR overlap: 0
    Boundary missing: 0

xyl_car_170:294..11493  false   motif:GGCC  TSD:GTTAC   289..293    11494..11498    IN:405..11382   0.9821  ?   unknownNA   695124
    Adjust: 5' rLTR lLTR: 112   rLTR: 112
    Alignment regions: 1, 112, 11090, 11201
    LTR coordinates: 294, 404, 11383, 11493
    TSD-LTR overlap: 0
    Boundary missing: 0

The scn files look like so:

$ cat Xylographa_carneopallida.assembly.fa.mod.rawLTR.scn
#LTR_HARVEST_parallel -seq Xylographa_carneopallida.assembly.fa.mod -size 1000000 -time 300 -try1 1 -threads 12 -cut /opt/conda/share/EDTA/bin/LTR_HARVEST_parallel/bin/cut.pl
# LTR_HARVEST args= -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes
# LTR_HARVEST_parallel version=v1.1
# predictions are reported in the following way
# s(ret) e(ret) l(ret) s(lLTR) e(lLTR) l(lLTR) s(rLTR) e(rLTR) l(rLTR) sim(LTRs) seq-nr chr
# where:
# s = starting position
# e = ending position
# l = length
# ret = LTR-retrotransposon
# lLTR = left LTR
# rLTR = right LTR
# sim = similarity
# seq-nr = sequence order
4836 8776 3941 4836 4952 117 8660 8776 117 86.32 82 xyl_car_83
120 5749 5630 120 245 126 5631 5749 119 89.68 87 xyl_car_88
1543 3032 1490 1543 1660 118 2917 3032 116 88.98 104 xyl_car_105
294 11494 11201 294 410 117 11377 11494 118 88.98 169 xyl_car_170
3655 4828 1174 3655 3779 125 4705 4828 124 92.00 196 xyl_car_197
11788 15060 3273 11788 11904 117 14943 15060 118 90.68 203 xyl_car_204
147 6896 6750 147 328 182 6715 6896 182 87.91 312 xyl_car_313
#LTR_FINDER_parallel -seq Xylographa_carneopallida.assembly.fa.mod -size 1000000 -time 300 -try1 1 -harvest_out -threads 12 -cut /opt/conda/share/EDTA/bin/LTR_FINDER_parallel/bin/cut.pl -finder /opt/conda/bin/
# LTR_FINDER args=-w 2 -C -D 15000 -d 1000 -L 7000 -l 100 -p 20 -M 0.85
# LTR_FINDER_parallel version=v1.1
# predictions are reported in the following way
# s(ret) e(ret) l(ret) s(lLTR) e(lLTR) l(lLTR) s(rLTR) e(rLTR) l(rLTR) sim(LTRs) seq-nr chr
# where:
# s = starting position
# e = ending position
# l = length
# ret = LTR-retrotransposon
# lLTR = left LTR
# rLTR = right LTR
# sim = similarity
# seq-nr = sequence order
459 7041 6583 459 611 153 6888 7041 154 91.6 312 xyl_car_313
$ cat Xylographa_carneopallida.assembly.fa.mod.finder.combine.scn
#LTR_FINDER_parallel -seq Xylographa_carneopallida.assembly.fa.mod -size 1000000 -time 300 -try1 1 -harvest_out -threads 12 -cut /opt/conda/share/EDTA/bin/LTR_FINDER_parallel/bin/cut.pl -finder /opt/conda/bin/
# LTR_FINDER args=-w 2 -C -D 15000 -d 1000 -L 7000 -l 100 -p 20 -M 0.85
# LTR_FINDER_parallel version=v1.1
# predictions are reported in the following way
# s(ret) e(ret) l(ret) s(lLTR) e(lLTR) l(lLTR) s(rLTR) e(rLTR) l(rLTR) sim(LTRs) seq-nr chr
# where:
# s = starting position
# e = ending position
# l = length
# ret = LTR-retrotransposon
# lLTR = left LTR
# rLTR = right LTR
# sim = similarity
# seq-nr = sequence order
459 7041 6583 459 611 153 6888 7041 154 91.6 312 xyl_car_313
$ cat Xylographa_carneopallida.assembly.fa.mod.harvest.combine.scn
#LTR_HARVEST_parallel -seq Xylographa_carneopallida.assembly.fa.mod -size 1000000 -time 300 -try1 1 -threads 12 -cut /opt/conda/share/EDTA/bin/LTR_HARVEST_parallel/bin/cut.pl
# LTR_HARVEST args= -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes
# LTR_HARVEST_parallel version=v1.1
# predictions are reported in the following way
# s(ret) e(ret) l(ret) s(lLTR) e(lLTR) l(lLTR) s(rLTR) e(rLTR) l(rLTR) sim(LTRs) seq-nr chr
# where:
# s = starting position
# e = ending position
# l = length
# ret = LTR-retrotransposon
# lLTR = left LTR
# rLTR = right LTR
# sim = similarity
# seq-nr = sequence order
4836 8776 3941 4836 4952 117 8660 8776 117 86.32 82 xyl_car_83
120 5749 5630 120 245 126 5631 5749 119 89.68 87 xyl_car_88
1543 3032 1490 1543 1660 118 2917 3032 116 88.98 104 xyl_car_105
294 11494 11201 294 410 117 11377 11494 118 88.98 169 xyl_car_170
3655 4828 1174 3655 3779 125 4705 4828 124 92.00 196 xyl_car_197
11788 15060 3273 11788 11904 117 14943 15060 118 90.68 203 xyl_car_204
147 6896 6750 147 328 182 6715 6896 182 87.91 312 xyl_car_313

all the best, Philipp

oushujun commented 3 years ago

Please count the number of non-# lines in the rawLTR.scn file and the number of false lines in the defalse file. If all candidates are false, that means no intact LTRs were found in your genome and you will need --force 1 to override the missing LTRs.

Shujun
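The check described above can be sketched as two small shell functions. This is only an illustration: the function names are made up, and reading the verdict from the second whitespace-separated column is an assumption based on the defalse lines posted in this thread. Unlike `grep -cv "#"`, the candidate count here also skips blank lines.

```shell
# Count candidate lines in an .scn file, skipping comment lines and blank lines.
count_candidates() {
    awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Tally the verdict of each defalse line.
# Assumption: the verdict (e.g. false, truncated) is the second column.
tally_verdicts() {
    awk 'NF > 1 { v[$2]++ } END { for (k in v) print k, v[k] }' "$1" | sort
}

# Example calls (file names follow the pattern used in this thread):
#   count_candidates genome.fa.mod.rawLTR.scn
#   tally_verdicts   genome.fa.mod.defalse
```

If the candidate count equals the sum of the non-passing verdicts, no candidate survived filtering, which is the situation where --force 1 is needed.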

reslp commented 3 years ago

I just checked the files:

$ cat Xylographa_carneopallida.assembly.fa.mod.finder.combine.scn | grep -cv "#"
1
$ cat Xylographa_carneopallida.assembly.fa.mod.rawLTR.scn | grep -cv "#"
8
$ cat Xylographa_carneopallida.assembly.fa.mod.retriever.all.scn | grep -cv "#"
8
$ cat Xylographa_carneopallida.assembly.fa.mod.harvest.combine.scn | grep -cv "#"
7

This is the defalse file:

$ cat Xylographa_carneopallida.assembly.fa.mod.defalse | grep -c "false"
7

Not all the line counts are the same. Is this to be expected? The thing is, these results were produced using the --force 1 flag. This was the command used:

EDTA.pl --genome results/Xylographa_carneopallida/Xylographa_carneopallida.assembly.fa --overwrite 1 --anno 1 --force 1  --threads 12 &> logs/Xylographa_carneopallida_edta.log

oushujun commented 3 years ago

Sorry for the delayed reply. It seems that this genome does not have detectable LTR retrotransposons. Does this make sense based on the biology of the species? You may use --force 1 to overcome this issue. To increase the annotation of TEs in this genome, you may also try the Repbase library.

Best, Shujun

reslp commented 3 years ago

Hi Shujun,

Thank you for your reply. I already ran this genome with --force 1 (I posted the command I used in my previous post) and it did not resolve this. Or do I have to specify it in a different place? At this point I am not sure if this species has no detectable LTRs or if it is an artifact of the genome assembly. Is there a way to make EDTA fail "silently" so that it completes successfully although it could not identify LTRs?

I will look into trying the Repbase library (I think I have an older version somewhere; it now needs a paid subscription).

If you have any additional ideas what I could try I would be very happy to hear them. Many many thanks so far!

best, Philipp

JackyHess commented 3 years ago

Hi Shujun and Philipp,

I have just run into the same issue with another fungal genome, a Basidiomycete, that I am reannotating to harmonize annotations across a larger set of species for comparison. I had annotated this genome using REPET a few years ago and LTRs were the largest class of repeats, so at least in my case this is likely a technical rather than a biological artifact.

I'm trying to hunt down some elements using my old annotations now to see if I can pinpoint where and why elements are not being picked up. I'm happy to share my results if it is useful.

Best wishes, Jacky

reslp commented 3 years ago

Hi Jacky,

Thanks for sharing! I also looked into my other annotations which included several closely related species to the one which fails and they also have plenty of LTRs. I therefore also doubt that the missing LTRs are real in my case. If I find a solution I will of course post it here.

best, Philipp

oushujun commented 3 years ago

Hi Jacky and Philipp,

Thanks for sharing your experiences. TE annotations in fungal genomes are less developed at the moment and we are looking at many unknown (but interesting) areas. Even if these error messages go away somehow, the fact that LTRs or other TE components are not found in some fungal genomes is worthy of a research topic. I can take a look at your data if you generously share them with me, but I have not learned enough about fungal genomes myself, so what I can do may be limited.

Best, Shujun

JackyHess commented 3 years ago

Hi both,

So I have been trying to reproduce the steps of the pipeline by running the individual components, and I get stuck at the LTR_retriever step. I run into a RepeatMasker-related error of a kind that other users also commonly report (e.g. https://github.com/oushujun/LTR_retriever/issues/43), but the underlying issue seems to be different from the other ones I can find:

RepeatMasker -e ncbi -q -pa 1 -no_is -norna -nolow dummy060817.fa* -lib dummy060817.fa*

RepeatMasker version 4.1.0
Search Engine: NCBI/RMBLAST [ 2.10.0+ ]
Master RepeatMasker Database: /home/hessja/.conda/envs/TEs/share/RepeatMasker/Libraries/RepeatMaskerLib.embl ( Complete Database: CONS-Dfam_3.1 )
Custom Repeat Library: dummy060817.fa.246853

Building general libraries in: /home/hessja/.conda/envs/TEs/share/RepeatMasker/Libraries/CONS-Dfam_3.1/general
RepeatMasker::createLib(): Error invoking /home/hessja/.conda/envs/TEs/bin//makeblastdb on file /home/hessja/.conda/envs/TEs/share/RepeatMasker/Libraries/CONS-Dfam_3.1/general/l1.lib.

Both the makeblastdb executable and the library file are present and functional, so I am unsure what the problem may be here. I guess this issue is more for the LTR_retriever channel though.
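One further thing that may be worth ruling out for this kind of error is file permissions, since makeblastdb writes its index files next to the input file unless told otherwise. Whether that applies here is not known from the thread; the sketch below is only a generic check, and the function name `check_lib` is made up for illustration.

```shell
# Report whether a library file is readable and whether its directory is
# writable by the current user (index files would be created alongside it).
check_lib() {
    lib=$1
    dir=$(dirname "$lib")
    if [ ! -r "$lib" ]; then
        echo "unreadable: $lib"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "read-only directory: $dir"
        return 1
    fi
    echo "ok: $lib"
}

# Example calls (run these by hand; the path is the one from the error above):
#   command -v makeblastdb && makeblastdb -version
#   check_lib /home/hessja/.conda/envs/TEs/share/RepeatMasker/Libraries/CONS-Dfam_3.1/general/l1.lib
```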

Could a broken copy of LTR_retriever be what is causing the lack of validation of LTR candidates? I'm attaching my genome and EDTA output files here.

Thanks for your help!

Best wishes, Jacky

Abrun_EDTA.zip

oushujun commented 3 years ago

Hi Jacky and Philipp,

Sorry for the long delay. I modified the code so that the pipeline moves forward when any category of intact TEs (LTR/TIR/Helitron) is not found, provided the --force 1 parameter is specified. Now you should be able to proceed with --force 1.

I also checked the Abrun genome. LTR_retriever could not find any confident LTR elements in it, so I had to use --force 1 to proceed in EDTA. I also used --sensitive 1 to salvage any repetitive sequences missed in the structural step.

perl EDTA.pl --genome Abrun_assembly2.0.fasta.mod --species others -t 36 --force 1 --anno 1 --evaluate 1 --sensitive 1 &

Abrun_assembly2.0.tar.gz

This genome is not very repetitive (~18%), and a large part of the repeats are unknown repeat regions (9%). The evaluation result shows that the annotation is pretty consistent. So you may need to further check what those repeat regions are.

Best, Shujun

reslp commented 3 years ago

Hi Shujun,

thank you! This is great news. I will test this and report back.

all the best,

Philipp

oushujun commented 3 years ago

@reslp did it work? - Shujun

reslp commented 3 years ago

Hi Shujun,

sorry for my delayed reply. Yes it worked for all the genomes I have tested it with (83 fungal genomes). Many thanks for your help!

all the best,

Philipp
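A batch run over many genomes could be sketched roughly as follows. The flags are the ones from the command quoted earlier in this thread; the function name `run_edta`, the `DRY_RUN` switch, and the `logs/` layout are assumptions for illustration, not part of EDTA.

```shell
# Run EDTA on one genome with --force 1 and write a per-genome log.
# With DRY_RUN=1 the command is only printed, not executed.
run_edta() {
    genome=$1
    threads=${2:-12}
    set -- EDTA.pl --genome "$genome" --overwrite 1 --anno 1 --force 1 --threads "$threads"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
        return 0
    fi
    mkdir -p logs
    # Log name: genome file name without its last extension.
    "$@" > "logs/$(basename "${genome%.*}")_edta.log" 2>&1
}

# Example loop (paths are hypothetical):
#   for g in results/*/*.assembly.fa; do
#       run_edta "$g" 12 || echo "failed: $g" >&2
#   done
```

Reporting failures per genome instead of stopping at the first one makes it easier to see which assemblies still need attention.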

oushujun commented 3 years ago

@reslp That's great! I'll close the issue.

JackyHess commented 3 years ago

Hi Shujun,

Sorry for the delay! Just to add a quick thank you from my side for taking a look at the Amanita data as well - it's interesting that no LTRs are found.

Phylogenetic analysis of reverse transcriptases found in the genome supports the view that LTRs are a very common type of TE in this clade of fungi, so I guess they may either be too fragmented to be picked up using structural detection or have unusual features.

Anyway, thanks again and all the best!

Jacky

oushujun commented 3 years ago

@JackyHess Thank you for sharing the biological background of this genome, which is absolutely some food for thought when looking at abnormal data. Wish you good luck in figuring this out, and hopefully, it's a good one. - Shujun