rnajena / mycovista

Mycovista - M. bovis assembly pipeline
GNU General Public License v3.0

"assembly best practice" -- additional flye option and SPAdes trusted contigs option #10

Open: hoelzer opened this issue 4 years ago

hoelzer commented 4 years ago

@sandraTriebel, here is a nice guide to Nanopore/hybrid genome assembly:

https://achri.blogspot.com/2019/12/nanopore-bacterial-genome-assemblies.html?m=1

Although the guide focuses on fast execution using a GPU, the tools and pipeline are interesting and not very different from what you have already implemented.

[1] This is the flye command used in the guide:

flye --nano-raw barcode06.fastq --threads 8 --iterations 2 --plasmids -g 3m --out-dir barcode06

Interesting: the --iterations parameter, which sets the number of Flye's built-in polishing iterations (default: 1), already does some polishing. Maybe we want this as well.
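A minimal sketch for comparing Flye's polishing iterations, assuming we simply want to run the guide's command twice with only --iterations changed. The helper flye_cmd and the output directory names are hypothetical; the function only prints the commands so they can be reviewed before running. The genome size 3m is the value from the guide, not ours.

```bash
#!/usr/bin/env bash
# Sketch: print the guide's Flye command with a configurable number of
# polishing iterations. flye_cmd and the output names are hypothetical.
set -euo pipefail

flye_cmd() {
  local reads="$1" iterations="$2" genome_size="$3" outdir="$4"
  # --iterations: number of Flye's built-in polishing rounds (Flye default: 1)
  printf 'flye --nano-raw %s --threads 8 --iterations %s --plasmids -g %s --out-dir %s\n' \
    "$reads" "$iterations" "$genome_size" "$outdir"
}

# 3m is the genome size used in the guide; adjust it to the target organism.
flye_cmd barcode06.fastq 1 3m flye_iter1
flye_cmd barcode06.fastq 2 3m flye_iter2
```

Piping the printed lines to bash runs both variants; since all other options stay fixed, differences in runtime and contiguity should come mainly from the extra iteration.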

[2] The other really interesting part in my eyes:

Instead of using the short reads only for polishing, integrate them into the assembly process again, with the long-read-only assembly as the real backbone. For this, the author runs SPAdes with the --trusted-contigs option and passes the polished long-read contigs as a set of trusted sequences. The SPAdes result is then polished with the short reads using Pilon. I think you also tried Pilon at some point?

spades.py -o spades --trusted-contigs medaka/consensus.fasta -1 /path/to/illumina/sample_R1_001.fastq.gz  -2 /path/to/illumina/sample_R2_001.fastq.gz
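The Pilon step that follows the spades.py command above is not spelled out in the thread. A minimal sketch, assuming the usual approach of mapping the paired Illumina reads back to the SPAdes contigs with bwa and samtools and passing the sorted BAM to Pilon. The helper pilon_plan, the BAM name and the directory names are hypothetical, and a "pilon" wrapper script (e.g. from Bioconda; otherwise "java -jar pilon.jar") is assumed to be on PATH. It only prints the commands (dry run).

```bash
#!/usr/bin/env bash
# Sketch (dry run): print the Pilon polishing of the SPAdes contigs.
# Assumptions (hypothetical): bwa, samtools and a "pilon" wrapper are on PATH;
# pilon_plan and all file/directory names are placeholders.
set -euo pipefail

pilon_plan() {
  local contigs="$1" r1="$2" r2="$3" outdir="$4" threads="${5:-8}"
  local bam="$outdir/short_reads.sorted.bam"
  echo "mkdir -p $outdir"
  # index the SPAdes contigs and map the paired Illumina reads back to them
  echo "bwa index $contigs"
  echo "bwa mem -t $threads $contigs $r1 $r2 | samtools sort -@ $threads -o $bam -"
  echo "samtools index $bam"
  # polish with Pilon; --changes also writes a list of the corrections made
  echo "pilon --genome $contigs --frags $bam --outdir $outdir --output pilon --changes"
}

# Usage: pilon_plan spades/contigs.fasta <R1.fastq.gz> <R2.fastq.gz> pilon | bash
```

spades/contigs.fasta is the standard SPAdes contig output; whether the guide polished contigs.fasta or scaffolds.fasta is not stated here, so that choice is an assumption.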

The question is whether we really need this in our case. Alternatively: how difficult would it be for you to also implement a SPAdes rule that takes the Nanopore assembly together with the error-corrected short reads as input, so that we can compare the two?

sandraTriebel commented 4 years ago

It would not be that complicated. SPAdes is already installed and I can write the rule with the command you mentioned above.

hoelzer commented 4 years ago

OK, that's great! Then please add the SPAdes rule, using the polished long-read assembly and the error-corrected short reads as input.

sandraTriebel commented 4 years ago

So now we'll use the pipeline: Flye (default polishing, 1 iteration) -> 4x Racon with long reads (LR) -> 1x medaka (LR) -> 4x Racon with short reads (SR) -> SPAdes with the trusted-contigs option

Or Flye with 2 iterations?
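The proposed chain can be sketched as a dry run that prints one shell command per step, so the file hand-over between the Racon rounds is explicit. This is not the pipeline's actual rule set: pipeline_plan and every file name are hypothetical, minimap2/racon/medaka_consensus/spades.py are assumed to be on PATH, and for the short-read Racon rounds the reads are assumed to be provided as one FASTQ with unique read names (Racon takes a single read file), while R1/R2 go to SPAdes.

```bash
#!/usr/bin/env bash
# Sketch (dry run): print the proposed chain as shell commands
#   Flye -> 4x Racon (LR) -> 1x medaka -> 4x Racon (SR) -> SPAdes
# Assumptions (hypothetical): tools on PATH, placeholder file names, and the
# short reads for Racon merged into ONE FASTQ with unique read names.
set -euo pipefail

pipeline_plan() {
  local lr="$1" sr="$2" r1="$3" r2="$4" gsize="$5"
  local rounds="${6:-4}" threads="${7:-8}"
  local draft="flye/assembly.fasta" next i

  # Flye with its default single polishing iteration
  echo "flye --nano-raw $lr --threads $threads --iterations 1 -g $gsize --out-dir flye"

  # Racon rounds with long reads: map reads to the current draft, then polish it
  for ((i = 1; i <= rounds; i++)); do
    next="racon_lr_$i.fasta"
    echo "minimap2 -x map-ont -t $threads $draft $lr > racon_lr_$i.paf"
    echo "racon -t $threads $lr racon_lr_$i.paf $draft > $next"
    draft="$next"
  done

  # One medaka round on the last long-read Racon output
  echo "medaka_consensus -i $lr -d $draft -o medaka -t $threads"
  draft="medaka/consensus.fasta"

  # Racon rounds with short reads (minimap2 short-read preset)
  for ((i = 1; i <= rounds; i++)); do
    next="racon_sr_$i.fasta"
    echo "minimap2 -x sr -t $threads $draft $sr > racon_sr_$i.paf"
    echo "racon -t $threads $sr racon_sr_$i.paf $draft > $next"
    draft="$next"
  done

  # SPAdes with the fully polished assembly as trusted contigs
  echo "spades.py -o spades --trusted-contigs $draft -1 $r1 -2 $r2"
}

# Usage (placeholders): pipeline_plan <nanopore.fastq> <illumina.fastq> <R1> <R2> <genome_size> > plan.sh
```

Following the order proposed here, SPAdes receives the output of the last short-read Racon round as trusted contigs, whereas the guide's command passes medaka/consensus.fasta directly. Switching Flye to 2 iterations only changes the first printed line.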

hoelzer commented 4 years ago

I would try Flye with 2 iterations. Let's see whether this significantly increases the runtime.
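To see whether the second iteration significantly increases the runtime, each Flye variant can be wrapped in a small timer. A minimal sketch, assuming wall-clock seconds are precise enough for assembly-scale jobs; run_timed, the RUNTIME_LOG file and the labels are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: run a command and append "<label><TAB><seconds>" to a log file.
# run_timed and RUNTIME_LOG are hypothetical names.
set -euo pipefail

RUNTIME_LOG="${RUNTIME_LOG:-runtimes.tsv}"

run_timed() {
  local label="$1"; shift
  local start end
  start=$(date +%s)
  "$@"                      # run the command exactly as given
  end=$(date +%s)
  printf '%s\t%s\n' "$label" "$((end - start))" >> "$RUNTIME_LOG"
}

# Usage (not executed here):
#   run_timed flye_iter1 flye --nano-raw reads.fastq --iterations 1 ... --out-dir flye_iter1
#   run_timed flye_iter2 flye --nano-raw reads.fastq --iterations 2 ... --out-dir flye_iter2
```

Because set -e is active, a failing command stops the script before its runtime is logged, so the log only contains successful runs.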

Otherwise, yes to your pipeline proposal. After the 4x Racon SR step we already have our nice assemblies, and we can then check whether the additional SPAdes step improves them further.
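As a first, rough check of whether the SPAdes step improves on the Racon SR assembly, basic contiguity statistics can be compared. A minimal sketch using only awk and sort, assuming contig count, total length and N50 are a useful first signal; it is not a substitute for a proper assembly evaluation, and assembly_stats and the file names in the usage line are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print contig count, total length and N50 of a FASTA file.
# assembly_stats and the file names in the usage line are hypothetical.
set -euo pipefail

assembly_stats() {
  local fasta="$1"
  # Step 1: print the length of every non-empty record (multi-line FASTA is fine).
  # Step 2: sort the lengths in descending order.
  # Step 3: N50 = length of the contig at which the cumulative length first
  #         reaches at least half of the total assembly length.
  awk '/^>/ { if (len) print len; len = 0; next } { len += length($0) } END { if (len) print len }' "$fasta" |
    sort -nr |
    awk '{ l[NR] = $1; total += $1 }
         END {
           half = total / 2; cum = 0; n50 = 0
           for (i = 1; i <= NR; i++) { cum += l[i]; if (cum >= half) { n50 = l[i]; break } }
           printf "contigs\t%d\ntotal_length\t%d\nN50\t%d\n", NR, total, n50
         }'
}

# Usage: for f in <racon_sr_assembly.fasta> spades/contigs.fasta; do echo "== $f"; assembly_stats "$f"; done
```

For a small bacterial genome, fewer contigs and a higher N50 after SPAdes would hint at an improvement, but base-level accuracy has to be checked separately.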

(hoelzer's comment was sent as an email reply to sandraTriebel's comment of Mon., 16 Dec. 2019, 13:36.)