Closed joanmarticarreras closed 4 years ago
Hi Joan,
it is not odd that the size of the contig shrinks through iterations, but here it does look like it continues to shrink quite a bit. It might be due to the trimming heuristic at the end, which removes bases from both sides of each consensus window until the base coverage hits half the number of reads inside the window. You can turn it off with the option `--no-trimming` in Racon, but to disable it in Raven you have to change `true` to `false` here, and then recompile.
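For illustration, the trimming heuristic described above can be sketched as follows (a hypothetical re-implementation, not Raven's actual C++ code; names are made up):

```python
def trim_window(coverage, num_reads):
    """Trim low-coverage bases from both ends of a consensus window.

    `coverage` is the per-base read coverage across the window and
    `num_reads` the number of reads inside it. Bases are removed from
    either side until coverage reaches half the read count, as described
    above. Illustrative sketch only.
    """
    threshold = num_reads / 2
    begin, end = 0, len(coverage)
    while begin < end and coverage[begin] < threshold:
        begin += 1
    while end > begin and coverage[end - 1] < threshold:
        end -= 1
    return begin, end  # kept half-open interval [begin, end)

# Example: a window covered by 10 reads, with thin coverage at the edges.
cov = [2, 4, 5, 9, 10, 10, 9, 8, 4, 3]
print(trim_window(cov, 10))  # -> (2, 8): edge bases below coverage 5 are cut
```

With many windows per contig, a few bases shaved off each window edge can add up, which would explain the contig shrinking with every polishing round.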
On the other hand, I think 2 iterations are sufficient for consensus, after which you can just run Medaka to reach higher accuracy.
Best regards, Robert
Thanks Robert for replying so fast. I will give it a try.
Thanks for the tip. Still, you can see how depending on which set of reads I start using, the length varies a lot as well. Any ideas?
Joan
What is the contig length when you have 0 Racon iterations with >5kbp reads? This is probably due to different reads constituting the layout sequence.
Using >Q12 >10kb reads:

| Racon iterations | Target contig length (bp) |
| --- | --- |
| 0 | 132197 |
| 2 | 133370 |
| 5 | 133169 |
| 10 | 133234 |
| 20 | 133132 |
| 30 | 133161 |
Using >Q12 >5kb reads:

| Racon iterations | Target contig length (bp) |
| --- | --- |
| 0 | 129666 |
| 2 | 130134 |
| 5 | 130343 |
| 10 | 130405 |
| 20 | 130346 |
| 30 | 130532 |
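Quantifying the spread in the two tables above supports the idea that the read set, rather than the number of Racon iterations, drives most of the length difference:

```python
# Contig lengths (bp) from the two tables above, keyed by Racon iterations.
lengths_10kb = {0: 132197, 2: 133370, 5: 133169, 10: 133234, 20: 133132, 30: 133161}
lengths_5kb = {0: 129666, 2: 130134, 5: 130343, 10: 130405, 20: 130346, 30: 130532}

for name, lengths in [(">10kb", lengths_10kb), (">5kb", lengths_5kb)]:
    spread = max(lengths.values()) - min(lengths.values())
    print(f"{name}: within-dataset spread = {spread} bp")

# Difference between datasets at 0 iterations (layout only, no polishing):
print("layout difference:", lengths_10kb[0] - lengths_5kb[0], "bp")
```

Polishing moves the length by at most ~1.2 kbp within a dataset, while switching read sets already shifts the unpolished layout by ~2.5 kbp.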
After recompiling, and using `--no-trimming` in Racon, the target contig size is a bit more stable, especially for the >5 kb dataset. Will this be a better estimate of the genome, or would you recommend trimming the ends instead? I am afraid that 5 kb might be too short for a reliable estimation of the tandem repeats too. How sensitive are Raven and Racon to tandem repeats?
Joan
For this dataset, I think I would use the `--no-trimming` option. You can check out https://github.com/isovic/racon/issues/126, where a similar discussion took place (some users had problems with telomeres, while one was dealing with viruses). I am not sure how sensitive Racon is to tandem repeats; I suppose the longer the average read length, the better.
Sorry for not replying earlier! Best regards, Robert
Hi!
First, congrats on your work on Ra and Raven. I've been following and testing your tools on nanopore reads for quite a while.
I work in viral genomics (dsDNA viruses), studying new viruses, making reference genomes, their diversity, repeat distribution, etc.
I've been testing Raven for quite a while and, compared to the rest, it does a great job at assembling this type of data! However, I realized that the final contig size is quite sensitive to size pre-filtering and to the number of Racon iterations. Filtering to intermediate sizes (>5-10 kb) yields almost perfect contiguity. Accepting shorter sequences adds too much diversity (especially in the repeats) and contiguity drops. If the filter is higher, there is not enough data to close the genome.
Interestingly, though, increasing the number of Racon iterations tends to shrink the target contig. The genome size is not known exactly, but it is thought to be between 132 and 150 kbp (experimental data from the '80s). Around 110-120 kbp should be unique, followed by tandem repeats of 1.5 kb repeated 15-20 times.
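For what it's worth, the repeat structure described above is arithmetically consistent with the 132-150 kbp estimate:

```python
# Expected genome size from the structure described above:
# a unique region of 110-120 kbp plus a 1.5 kbp tandem repeat, 15-20 copies.
unique_min, unique_max = 110_000, 120_000
repeat_len, copies_min, copies_max = 1_500, 15, 20

size_min = unique_min + repeat_len * copies_min  # 110,000 + 22,500 = 132,500
size_max = unique_max + repeat_len * copies_max  # 120,000 + 30,000 = 150,000
print(f"expected genome size: {size_min}-{size_max} bp")
```

This also shows why the repeat copy number matters: a single 1.5 kbp copy more or less shifts the expected contig length by more than the variation seen across Racon iterations.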
Here is some data: Raven v1.1.10, Nanopore reads filtered at >Q12 and >10 kbp:
Here is some data: Raven v1.1.10, Nanopore reads filtered at >Q12 and >5 kbp:
What do you think might be the phenomenon behind it?
Joan