oushujun / EDTA

Extensive de-novo TE Annotator
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1905-y
GNU General Public License v3.0

Helitron results file has 0bp! #399

Open Toffeeladd opened 8 months ago

Toffeeladd commented 8 months ago

Hi there, thank you for such a great software!

I am currently using EDTA to annotate TEs in two plant genomes assembled from PacBio HiFi long reads. The repeat content of the two genomes was estimated with redmask (https://github.com/nextgenusfs/redmask). Brief details of the genomes are given below:

Species 1: genome size = 3.2 Gb, contigs = 13,882, repeat content = 70.%

Species 2: genome size = 2 Gb, contigs = 299, repeat content = 68%

I have opted for the divide-and-conquer approach using EDTA_raw.pl (EDTA version 2.0.1) due to time constraints on my HPC. So far it has worked well for finding raw TIR and LTR TEs in Species 2, but it is struggling to find Helitrons in both species. I am inclined to believe that Helitrons exist in these genomes, as closely related species from the same family (Rubiaceae) have them. I am unsure whether this is due to an error in HelitronScanner or to the sensitivity of the search. (My contig headers are relatively simple, so I don't believe they are the problem.) The run fills some files in the directory but leaves others empty. The log file and the set of empty files are the same for both species, so I have provided only one example below. Any help would be great, thanks!

log_file:

Wed Oct 18 16:59:37 BST 2023 EDTA_raw: Check dependencies, prepare working directories.

Wed Oct 18 17:00:01 BST 2023 Start to find Helitron candidates.

Wed Oct 18 17:00:01 BST 2023 Identify Helitron candidates from scratch.

Error: Error while loading sequence perl make_bed_with_intact.pl EDTA.intact.fa > EDTA.intact.bed

Thu Oct 19 07:18:11 BST 2023 Warning: The Helitron result file has 0 bp!

Thu Oct 19 07:18:11 BST 2023 Execution of EDTA_raw.pl is finished!

Species2.fa.mod.EDTA.raw Directory:

4096 Oct 19 07:18 Helitron
4096 Oct 18 17:00 LTR
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.intact.bed
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.intact.fa
125 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.intact.gff3
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.raw.fa
4096 Oct 18 17:00 TIR

Helitron Directory:

62 Oct 18 17:00 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod -> ../../S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.intact.bed
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.intact.fa
125 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.intact.gff3
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.Helitron.raw.fa
20565256 Oct 18 21:42 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.draw.hel.fa
18463144 Oct 19 06:44 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.draw.rc.hel.fa
3115414 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.ext.fa
53965 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.ext.fa.cov0.9iden90.tabout
0 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.ext.fa.pass.fa
3095674 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.fa
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.fa.pass.fa
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.fa.pass.fa.dusted
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.fa.pass.fa.dusted.cleanup
0 Oct 19 07:18 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.fa.pass.fa.dusted.cln
62917 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.tabout
144237393 Oct 18 19:11 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.head
52260 Oct 18 21:42 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.pairends
36177959 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.raw.ext.fa
280115 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.raw.ext.list
144710515 Oct 18 23:29 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.rc.head
53142 Oct 19 06:44 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.rc.pairends
8971624 Oct 19 06:44 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.rc.tail
8898722 Oct 18 21:42 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.tail
20480 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.ndb
3596 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.not
16384 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.ntf
1200 Oct 19 06:45 S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.nto
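In a listing like the one above, the earliest zero-byte file is a useful hint about where the Helitron step stopped producing output. The following is a minimal sketch, not part of EDTA, for listing the empty files in a directory in order of modification time. The function name `empty_files` is made up for illustration.

```python
from pathlib import Path


def empty_files(directory):
    """Return (mtime, name) pairs for zero-byte regular files, oldest first."""
    results = []
    for path in Path(directory).iterdir():
        # Skip symlinks (such as the link back to the genome) and subdirectories.
        if path.is_symlink() or not path.is_file():
            continue
        info = path.stat()
        if info.st_size == 0:
            results.append((info.st_mtime, path.name))
    return sorted(results)


if __name__ == "__main__":
    # Scans the current directory; run it from inside the Helitron folder.
    for mtime, name in empty_files("."):
        print(f"{mtime:.0f}\t{name}")
```

The first file printed is the oldest empty output, which is usually the best place to start looking.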

oushujun commented 8 months ago

It looks like HelitronScanner works well on your genome, but somehow the final filtering results are not produced after the HelitronScanner step. Try this in the Helitron folder:

perl yourpath/EDTA/util/flanking_filter.pl -genome $genome -query $genome.HelitronScanner.filtered.ext.fa -miniden 90 -mincov 0.9 -maxct 5 -blastplus $blastplus -t $threads
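The `$genome`, `$blastplus` and `$threads` placeholders in the command need concrete values before it can be run. As a rough sketch, the command could be assembled like this. The helper name `flanking_filter_command` and the example paths are hypothetical; only the script name and flags are taken from the comment above.

```python
import shlex


def flanking_filter_command(edta_dir, genome, blastplus, threads):
    """Build the argument list for flanking_filter.pl as given in the thread."""
    # The query file name follows the pattern used in the command above.
    query = f"{genome}.HelitronScanner.filtered.ext.fa"
    return [
        "perl", f"{edta_dir}/util/flanking_filter.pl",
        "-genome", genome,
        "-query", query,
        "-miniden", "90",
        "-mincov", "0.9",
        "-maxct", "5",
        "-blastplus", blastplus,
        "-t", str(threads),
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute your own paths and thread count.
    cmd = flanking_filter_command(
        "yourpath/EDTA", "genome.fa.mod", "/path/to/blast/bin/", 4
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

Once the printed command looks right, it can be run from inside the Helitron folder, for example with `subprocess.run(cmd, check=True)`.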

Shujun

Toffeeladd commented 8 months ago

Hi Shujun, thank you for your quick response!

The script you provided has filled the following file which was previously empty: S.wilk_nuclear_4n.asm.bp.p_ctg.fa.uncont.filtered.fa.mod.HelitronScanner.filtered.ext.fa.pass.fa

How do I continue the pipeline following this step? Am I able to re-run EDTA_raw.pl with the --overwrite 0 flag?

oushujun commented 8 months ago

It seems that the execution in the Helitron component was interrupted. You may rerun EDTA_raw --type Helitron to redo this part. The Helitron folder will be rewritten even with --overwrite 0 because it's unfinished.

Shujun

Toffeeladd commented 7 months ago

Hi Shujun,

Sorry for the late response. It is still getting stuck/interrupted at the same point when I re-run it. Is there a way I can run the individual Helitron scripts separately (like the flanking_filter.pl) without running the EDTA_raw.pl script until I finish the pipeline? If this is possible what order should I run the scripts in?

Alternatively, since I am running EDTA v2.0.1, I will try updating to version 2.1.0 to see if this fixes the issue.

Thank you for your continued help!

Noah

oushujun commented 5 months ago

Hi Noah,

I am very sorry for the long delay. Since you can run the flanking_filter.pl script independently without an issue, but the full run keeps getting interrupted at the same point, this may be an issue related to your platform. If the system is shared with other users, try running with fewer threads. If the problem persists, you can manually run the block of code for the Helitron module in the EDTA_raw.pl script, starting from the flanking_filter.pl step. Testing EDTA on a small file would also be helpful.

Please let me know if your issue continues.

Shujun
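For the small-file test suggested above, one option is to cut a few contigs out of the assembly. The sketch below copies the first few FASTA records into a new file. The function name `write_fasta_subset` and the file names are made up for illustration. Such a subset may contain no Helitrons at all, so it only checks whether the pipeline runs to completion on the platform.

```python
def write_fasta_subset(src, dst, max_records):
    """Copy the first max_records FASTA records from src to dst.

    Returns the number of records copied.
    """
    copied = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                # Stop at the header of the first record beyond the limit.
                if copied == max_records:
                    break
                copied += 1
            # Lines before the first header are skipped.
            if copied:
                fout.write(line)
    return copied


if __name__ == "__main__":
    import os

    # Hypothetical file names; replace with your own assembly path.
    if os.path.exists("genome.fa"):
        n = write_fasta_subset("genome.fa", "genome.subset.fa", 5)
        print(f"wrote {n} records")
```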

Toffeeladd commented 5 months ago

Hi Shujun,

I managed to solve the issue by running it on a different cluster. Thank you for your help!

Noah