mcveanlab / mccortex

De novo genome assembly and multisample variant calling
https://github.com/mcveanlab/mccortex/wiki
MIT License
113 stars 25 forks

2-color cortex? #43

Open yannickwurm opened 7 years ago

yannickwurm commented 7 years ago

Hello, as discussed briefly with Zam, it would be pretty neat if cortex could handle two layers of colors:

iqbal-lab commented 7 years ago

To follow up, @yannickwurm: McCortex already does this (not sure if I was clear). It allows longer-range paths to be stored. What it does not have is a way to use colour 2 to improve the assembly of colour 1. Anyway, the best person to ask is @noporpoise.

noporpoise commented 7 years ago

Hi @yannickwurm,

Yes, this should be possible. Such information would be most useful when you already have a long contig and you are extending it. You'd need to assemble for a while to figure out which 10X Genomics fragment(s) you're actually on. It might be easier to map 10X Genomics reads onto contigs for scaffolding after assembly. There was an interesting paper from Serafim Batzoglou's group on improving mapping by using 10X-style information [1].

To add the information into the graph, a multicolour approach would work, but you'd need a new colour per sample, which would be memory intensive. A low-memory implementation could use a Bloom filter to store unitig => fragment membership. It would also require a statistical model to make junction choices. I'm afraid it's not something I could do at the moment. Certainly an interesting idea though.
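For illustration only, here is a minimal Python sketch of the low-memory idea above: a Bloom filter keyed on (unitig, fragment) pairs. None of it is McCortex code or API. The class `PairBloomFilter`, the helper `fragments_possibly_on`, the integer unitig IDs and the string fragment IDs are all hypothetical names chosen for the example. The helper only counts possibly shared fragments; it is not the statistical model mentioned above.

```python
import hashlib
from typing import Iterable, Iterator


class PairBloomFilter:
    """Bloom filter over (unitig_id, fragment_id) pairs (hypothetical sketch).

    Lookups can return false positives but never false negatives.
    """

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits <= 0 or num_hashes <= 0:
            raise ValueError("num_bits and num_hashes must be positive")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, unitig_id: int, fragment_id: str) -> Iterator[int]:
        # Derive num_hashes bit positions from one 128-bit digest
        # using double hashing: h1 + i * h2 (mod num_bits).
        key = f"{unitig_id}\t{fragment_id}".encode()
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, unitig_id: int, fragment_id: str) -> None:
        """Record that a fragment covers a unitig."""
        for pos in self._positions(unitig_id, fragment_id):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, unitig_id: int, fragment_id: str) -> bool:
        """True if the pair may have been added; False if it definitely was not."""
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(unitig_id, fragment_id)
        )


def fragments_possibly_on(
    bf: PairBloomFilter, unitig_id: int, fragments: Iterable[str]
) -> int:
    """Count how many of the given fragments may cover a unitig (upper bound)."""
    return sum(1 for f in fragments if bf.might_contain(unitig_id, f))


# Toy usage: fragments seen on the contig so far, and one candidate next unitig.
bf = PairBloomFilter(num_bits=1 << 20, num_hashes=4)
bf.add(17, "frag-A")
bf.add(17, "frag-B")
print(fragments_possibly_on(bf, 17, ["frag-A", "frag-B"]))
```

At a junction, a count like this could be compared across the candidate branches. Because of false positives the count is an upper bound, so any real junction choice would still need the statistical model mentioned above.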

[1] Read clouds uncover variation in complex regions of the human genome

winni2k commented 6 years ago

This sounds like something that might be easy to implement in CortexJDK?