rpetit3 / dragonflye

:dragon: :fly: Assemble bacterial isolate genomes from Nanopore reads
GNU General Public License v3.0

Feature request: replicon rotation #22

Closed dfornika closed 8 months ago

dfornika commented 1 year ago

Would you be open to adding a replicon rotation feature similar to what unicycler does? There's an existing issue on the flye repo that states that it doesn't directly support rotation, and suggests using circlator for that purpose.

The unicycler paper describes its approach:

A circular sequence can be shifted to any starting position without changing the biological information. Unicycler therefore uses TBLASTN to search for dnaA or repA alleles in each completed replicon[20]. If one is found, the sequence is rotated and/or flipped so that it begins with that gene encoded on the forward strand. This provides consistently oriented assemblies and reduces the risk that a gene will be split across the start and end of the sequence.

...or do you see that as out-of-scope for dragonflye?

Thanks for making such a useful tool. It really simplifies the process of creating high-quality hybrid assemblies.
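To make the request concrete, here is a minimal Python sketch of the rotate-and/or-flip step described in the Unicycler excerpt above. It is not Unicycler's implementation: the `reorient` function, its arguments, and the assumption that the start gene's coordinates are already known (0-based, half-open, e.g. from a prior homology search) are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(complement)[::-1]


def reorient(seq: str, start: int, end: int, strand: str) -> str:
    """Rotate (and flip if needed) a circular sequence to begin at a gene.

    start/end are hypothetical 0-based, half-open coordinates of the
    start gene on `seq`; strand is "forward" or "reverse".
    """
    if strand == "forward":
        # The gene already reads left to right: rotate so it comes first.
        return seq[start:] + seq[:start]
    if strand == "reverse":
        # Flip the whole replicon so the gene lands on the forward strand,
        # then rotate. Position i maps to len(seq) - 1 - i after flipping,
        # so the gene's first base ends up at len(seq) - end.
        flipped = reverse_complement(seq)
        new_start = len(seq) - end
        return flipped[new_start:] + flipped[:new_start]
    raise ValueError(f"unknown strand: {strand!r}")


# Toy example: the "gene" ATGC sits at positions 2..6 on the forward strand.
print(reorient("TTATGCGG", 2, 6, "forward"))
# The same molecule, sequenced from the other strand.
print(reorient("CCGCATAA", 2, 6, "reverse"))
```

Both calls describe the same circular molecule, so both should give the same reoriented sequence. A real tool also has to find the gene first and handle genes that span the current start/end junction, which this sketch ignores.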

rpetit3 commented 1 year ago

Hi Dan,

Thanks for reaching out! I think this is definitely within the scope of Dragonflye, maybe as an optional step.

I saw they recommend circlator. Do any other stand-alone tools come to mind for you? I know Trycycler has a rotation step as well, but it depends on the previous steps.

I can also look into this on my end

Robert

dfornika commented 1 year ago

Oh great, I'm glad to hear you're open to the idea. I'm not aware of any other stand-alone tools. I did notice that the circlator repo says it's no longer maintained, and honestly I haven't tried it out. We've had generally good experience with unicycler's plasmid rotation, but for overall quality and completeness of hybrid assemblies we've seen that dragonflye is giving better results. So if dragonflye could do the plasmid rotation step too, it would be the best of both worlds!

rpetit3 commented 1 year ago

This is the one that I couldn't remember, but found it again: https://github.com/gbouras13/dnaapler

This one might do the trick

dfornika commented 1 year ago

Looks great! I wasn't aware of dnaapler. One other feature that unicycler includes is to put the start-gene (repA) on the positive strand. We've also seen some plasmids coming out of our hybrid assemblies that are the reverse-complement of plasmids from other assemblies. I just took a quick look at dnaapler and didn't see if it supports that but I hope that could be included.
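For anyone wanting to check the reverse-complement situation in their own assemblies, a rough Python sketch follows. It only detects exact matches up to rotation and strand, so it is an illustration of the problem rather than a replacement for alignment-based comparison; the function names are made up for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


def is_rotation(a: str, b: str) -> bool:
    """True if circular sequence b is a rotation of circular sequence a."""
    # Every rotation of a appears as a substring of a concatenated with itself.
    return len(a) == len(b) and b in a + a


def same_circular_molecule(a: str, b: str) -> bool:
    """True if a and b are identical up to rotation and strand."""
    a, b = a.upper(), b.upper()
    return is_rotation(a, b) or is_rotation(a, reverse_complement(b))


plasmid = "ATGCGGTT"
print(same_circular_molecule(plasmid, "GGTTATGC"))  # rotated copy
print(same_circular_molecule(plasmid, "CGCATAAC"))  # rotated and flipped copy
print(same_circular_molecule(plasmid, "ATGCGGTA"))  # different sequence
```

Real assemblies of the same plasmid usually differ by a few bases, so exact matching like this will miss them. Reorienting every replicon to a consistent start gene and strand, as requested here, makes such comparisons much simpler.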

incoherentian commented 1 year ago

wasn't aware of dnaapler

Nor I! Looks like a great addition though. Agree that mirroring the Unicycler-implemented functionality in flye assemblies (preferably using same start IMO) would be great.

rpetit3 commented 1 year ago

Hi @dfornika,

I added support for dnaapler in the latest commit of Dragonflye. If you want to give it a try go for it, otherwise I can push a version release.

Cheers, Robert

dfornika commented 1 year ago

Hi @rpetit3 that's great, thanks for implementing that. I'm not sure how quickly I'll be able to set up a test run so please don't let me block you from pushing out a release if you're satisfied with the results you're seeing.

incoherentian commented 1 year ago

Thought I'd have a go testing from your dev channel using the new changelog version 1.2.0 (since I gave my two cents here) but this didn't work: mamba create -n dragonflye-dev202311 -c rpetit3 -c bioconda -c conda-forge dragonflye=1.2.0

When this hits the bactopia assembler, do you think you'll disable unicycler rotation and process both outputs with dnaapler for unified no_rotate = false or no_rotate = true params?

rpetit3 commented 1 year ago

Yo! @incoherentian

Easiest would be to build a dragonflye env with dnaapler, then download the latest commit from here.

mamba create -n dragonflye-dev202311 -c rpetit3 -c bioconda -c conda-forge dragonflye dnaapler
conda activate dragonflye-dev202311

# Download latest commit
wget https://raw.githubusercontent.com/rpetit3/dragonflye/main/bin/dragonflye

# Replace conda version with latest commit version
chmod +x dragonflye
mv dragonflye $(which dragonflye)

I'll have to think about Unicycler. Most likely I'll use dnaapler by default with an option to fall back on Unicycler's method, or the opposite.

incoherentian commented 1 year ago

Thanks for the mini-tutorial! Was not aware I could just download and chmod a change like that without confusing conda.

Dragonflye actively logs dnaapler progress after polypolish:

[...]
[dragonflye] Writing final assembly file => 'contigs.fa'
[dragonflye] Reorienting contigs with dnaapler
[dragonflye] Running: dnaapler all --input contigs.fa --output flye/reorient --threads 12 --prefix contigs --force  2>&1 | sed 's/^/[dnaapler] /' | tee -a dragonflye.log
[dnaapler] 2023-11-13 15:34:26.192 | INFO     | dnaapler.utils.validation:instantiate_dirs:23 - Checking the output directory flye/reorient
[dnaapler] 2023-11-13 15:34:26.205 | INFO     | dnaapler.utils.util:begin_dnaapler:71 - You are using dnaapler version 0.4.0
[...]

And the dnaapler summary highlights the relevant info:

$ cat /scratch/c.medib/bactopia_out/d57_jing_trimgalore_merged/JING1_merged_ont_dflye-dev_out_job64418956/contigs.dnaapler.summary.tsv
Contig  Gene_Reoriented Start   Strand  Top_Hit Top_Hit_Length  Covered_Length  Coverage        Identical_AAs   Identity_Percentage
contig00001 len=4685696 cov=37.0 origname=contig_1_polypolish polish=racon:1 round(s);medaka:2 round(s);polypolish:short_reads,1 round(s); sw=dragonflye-flye/1.2.0 date=20231113 circular=Y       dnaA    821920  forward sp|Q3YWB2|DNAA_SHISS    467     467     100.0   466     99.79
contig00002 len=252722 cov=34.0 origname=contig_2_polypolish polish=racon:1 round(s);medaka:2 round(s);polypolish:short_reads,1 round(s); sw=dragonflye-flye/1.2.0 date=20231113 circular=Y        repA    221900  reverse UniRef90_A0A077W3J7     291     291     100.0   290     99.66

Appears to be working perfectly :)
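If it helps anyone checking many samples, below is a small Python sketch for pulling the key columns out of a summary file like the one above. It assumes the file is tab-separated with exactly the header shown and that every row has a numeric `Start`; the `read_dnaapler_summary` helper is hypothetical, and the contig descriptions in the example are shortened from the real output.

```python
import csv
import io


def read_dnaapler_summary(handle):
    """Parse a dnaapler-style summary TSV into a list of small dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            # Keep only the contig ID, dropping the long description.
            "contig": row["Contig"].split()[0],
            "gene": row["Gene_Reoriented"],
            "start": int(row["Start"]),
            "strand": row["Strand"],
            "identity": float(row["Identity_Percentage"]),
        })
    return rows


# Values copied from the summary above, with shortened contig descriptions.
example = (
    "Contig\tGene_Reoriented\tStart\tStrand\tTop_Hit\tTop_Hit_Length\t"
    "Covered_Length\tCoverage\tIdentical_AAs\tIdentity_Percentage\n"
    "contig00001 len=4685696 circular=Y\tdnaA\t821920\tforward\t"
    "sp|Q3YWB2|DNAA_SHISS\t467\t467\t100.0\t466\t99.79\n"
    "contig00002 len=252722 circular=Y\trepA\t221900\treverse\t"
    "UniRef90_A0A077W3J7\t291\t291\t100.0\t290\t99.66\n"
)

rows = read_dnaapler_summary(io.StringIO(example))
for r in rows:
    print(r["contig"], r["gene"], r["start"], r["strand"], r["identity"])
```

For a real run, pass an open file handle for `contigs.dnaapler.summary.tsv` instead of the `io.StringIO` example.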

rpetit3 commented 8 months ago

Hi all!

I finally got around to a proper release, v1.2.0 is now available with this feature: https://github.com/rpetit3/dragonflye/releases/tag/v1.2.0

Going to close this for now, please feel free to reopen!

MrTomRod commented 4 months ago

Dear @rpetit3

I went through your code a bit and have a question: what happens if there is a linear contig? Does dragonflye use Dnaapler to reorient that, too?