Open ilyavs opened 3 years ago
Hi Ilya, which versions have you tried so far? What data type do you have and how fragmented is the assembly? From version 1.4.x, bubble similarity check via minimizers was replaced with alignments, while versions 1.5.x have different repeat annotations to save execution time.
Best regards, Robert
Hi, Version 1.3.0 produced a 2.8 Mbp Staphylococcus aureus genome. I tried versions 1.4.0, 1.5.1 and 1.5.3 (all via the Docker images on quay.io). None of these versions produced the 2.8 Mbp contig; the largest contig was around 1 Mbp. The data are MinION nanopore reads basecalled with Guppy 4.2.2. The FASTQ file contains 3.6e8 bp. Best, Ilya.
The data set seems to have enough coverage and reasonable accuracy, so I am not sure why the later versions do not perform as well as 1.3.0. You could try v1.6.0 from the options branch (you can also try different k, w values). Sorry for my delayed reply.
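As a rough check of the coverage remark, the average depth can be estimated from the two numbers reported in this thread (3.6e8 bp of reads, a 2.8 Mbp genome). This is only a back-of-the-envelope sketch: it assumes every base in the FASTQ file belongs to the target genome, and the function name is made up for illustration.

```python
def estimated_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the thread: 3.6e8 bp of reads, 2.8 Mbp genome.
coverage = estimated_coverage(3.6e8, 2.8e6)
print(f"approx. {coverage:.0f}x coverage")  # prints "approx. 129x coverage"
```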
Can you please elaborate on how the k and w values are expected to affect the assembly? When do you expect to have the next version released to bioconda? Thanks, Ilya.
I have created a new release; it will be picked up automatically by bioconda soon.
Regarding parameters, I think you can first try k = 19. We have recently evaluated higher k values (up to 25) on Guppy 5 data, where they tend to increase contiguity. Earlier Raven versions used (k, w) = (29, 9) (option --weaken, now removed) for HiFi data to improve assembly. I am not sure how this will affect Guppy 4.x datasets, but your dataset is quite small, so you can try a couple of values around the default (k, w) = (15, 5).
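Trying a few values around the default could be scripted along these lines. This is a minimal sketch, not an official recipe: it assumes the Raven build in use accepts -k and -w for k-mer and window length and -t for threads, and that it writes the assembly to standard output (check raven --help for your version). The file names and the helper build_raven_commands are hypothetical.

```python
from itertools import product


def build_raven_commands(reads, k_values, w_values, threads=4):
    """Return (argv, output_file) pairs, one per (k, w) combination.

    Assumption: the installed raven accepts -k, -w and -t; verify with
    `raven --help` before running anything.
    """
    commands = []
    for k, w in product(k_values, w_values):
        out = f"assembly_k{k}_w{w}.fasta"  # hypothetical naming scheme
        argv = ["raven", "-t", str(threads), "-k", str(k), "-w", str(w), reads]
        commands.append((argv, out))
    return commands


if __name__ == "__main__":
    # Print the planned runs instead of executing them.
    for argv, out in build_raven_commands("reads.fastq", [15, 17, 19], [5]):
        print(" ".join(argv), ">", out)
```

To actually run a combination, each argv can be passed to subprocess.run with stdout redirected to the corresponding output file. Keeping the command construction separate from execution makes it easy to review the planned runs first.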
Thank you for the new release and information. Version 1.6.0 assembled the complete 2.8 Mbp genome but failed to circularize the chromosome, while version 1.3.0 assembled the complete 2.8 Mbp genome and circularized the chromosome. In version 1.6.0, increasing the k value resulted in a shorter largest contig. In version 1.5.3, running with --weaken resulted in a 2.7 Mbp non-circular largest contig. So for now, it seems that version 1.3.0 is still the best option for my data, although version 1.6.0 comes close.
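Comparisons like the one above (largest contig, total assembly size, number of contigs) can be made with a few lines of Python on the FASTA output of each run. The sketch below uses hypothetical helper names and only the standard library; it does not check circularity, which depends on how a given Raven version reports it.

```python
def contig_lengths(fasta_lines):
    """Map contig name -> sequence length from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = parts[0] if parts else f"contig_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def summarize(lengths):
    """Contig count, total size and largest contig, all in bp."""
    sizes = sorted(lengths.values(), reverse=True)
    return {
        "contigs": len(sizes),
        "total_bp": sum(sizes),
        "largest_bp": sizes[0] if sizes else 0,
    }


# Toy input standing in for an assembly file.
example = [">ctg1 extra info", "ACGTACGT", "ACGT", ">ctg2", "ACG"]
print(summarize(contig_lengths(example)))
```

For a real assembly, pass an open file handle instead of the toy list, for example the FASTA written by each Raven version, and compare the resulting summaries side by side.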
Hello, I have been using raven for a while and recently I reran an assembly of the same bacterial data with a newer version of raven and got a more fragmented genome. With version 1.3.0 I got the complete bacterial genome in one contig. With any later version I got the genome more fragmented and with a smaller total assembly size. Is it possible to keep the improvements done in recent raven versions but restore the better contiguity observed in version 1.3.0? Sorry but I can't share the data. Thanks, Ilya.