koszullab / GRAAL

(check out instaGRAAL for a faster, updated program!) This program is from Marie-Nelly et al., Nature Communications, 2014 (High-quality genome assembly using chromosomal contact data), also Marie-Nelly et al., 2013, PhD thesis (https://www.theses.fr/2013PA066714)
https://research.pasteur.fr/fr/software/graal-software-for-genome-assembly-from-chromosome-contact-frequencies/

GRAAL for diploid assemblies? #15

Open dcopetti opened 6 years ago

dcopetti commented 6 years ago

Hi,

I wonder if GRAAL will fit my genome project. I have a plant genome assembly with the following features:

I wonder if GRAAL can use allelic variation to produce phased pseudochromosome sequences. By collinearity, I am able to assign 80% of the sequence to chromosomes (of a closely related species), but I have pairs of scaffolds at each locus. I would like to split the pairs into the two allelic genomes in a phased fashion. Would GRAAL work with this? Thanks,

Dario

baudrly commented 6 years ago

Hello,

I have indeed used GRAAL successfully to assemble diploid genomes, but it depends on how heterozygous the assembly is. If the chromosomes are too similar to each other, there will be too many mapping issues: 3C-based assemblers are generally unable to distinguish reads that map onto either member of a chromosome pair. If the chromosomes can be distinguished, however, I have found that GRAAL is relatively robust to these mapping issues and can separate the two chromosomes of a pair, albeit with a noticeable pattern:

[image: hetero_pattern]

If that works for you, don't hesitate to try it out and report any issues you find.
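As a rough illustration of the mapping issue mentioned above (a sketch, not part of GRAAL or any specific pipeline): aligners generally give a low mapping quality (MAPQ) to reads that align equally well to several places, which is what happens when two allelic scaffolds are nearly identical. One common way to gauge how much usable signal remains is to keep only read pairs whose two ends both map confidently. The `ReadPair` record, the scaffold names, and the threshold of 30 below are hypothetical, chosen only for the example.

```python
from typing import Iterable, List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical record for one Hi-C read pair after alignment."""
    name: str
    scaffold_1: str
    mapq_1: int  # mapping quality of the first end
    scaffold_2: str
    mapq_2: int  # mapping quality of the second end


def keep_unambiguous(pairs: Iterable[ReadPair], min_mapq: int = 30) -> List[ReadPair]:
    """Keep pairs whose two ends both reach the MAPQ threshold.

    The threshold is an assumption for illustration; tune it to your aligner.
    """
    return [p for p in pairs if p.mapq_1 >= min_mapq and p.mapq_2 >= min_mapq]


# Toy data: r2 has one end that maps ambiguously (MAPQ 0), e.g. to a region
# that is identical on both allelic scaffolds.
pairs = [
    ReadPair("r1", "scaffold_A1", 60, "scaffold_A1", 42),
    ReadPair("r2", "scaffold_A1", 0, "scaffold_A2", 60),
    ReadPair("r3", "scaffold_A2", 37, "scaffold_A2", 31),
]

kept = keep_unambiguous(pairs)
fraction_kept = len(kept) / len(pairs)
print(f"kept {len(kept)} of {len(pairs)} pairs ({fraction_kept:.0%})")
```

If only a small fraction of pairs survives such a filter, the two haplotypes are probably too similar for contact data to tell them apart.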

dcopetti commented 6 years ago

Perfect, that image is what I was looking for (if the two are allelic chromosomes/scaffolds)! Do you have any recommendation on how to prepare the Hi-C data? Any favorite protocol or method? Thanks for the feedback,

baudrly commented 6 years ago

Yes, they are an allelic pair; this is a typical pattern I've found among such chromosomes. To generate GRAAL-compatible contact maps, you may use HiC-Box (graphical interface) or my own pipeline (command line). If you already have data processed by another Hi-C pipeline, you may instead convert it manually to the format described in the readme. In any case, the pipelines are by and large equivalent for reassembly purposes, so simply pick whichever is most convenient for you.

Cheers

dcopetti commented 6 years ago

Great, thanks. How about the wet lab part, any particular recommendation?

baudrly commented 6 years ago

I don't do experiments anymore and I am not familiar with protocol specifics for plants, but here's a very handy and recent reference. I hope that helps.