shendurelab / LACHESIS

The LACHESIS software, as described in Nature Biotechnology (http://dx.doi.org/10.1038/nbt.2727)

Assemble bone fish genome #3

Closed: mamonster closed this issue 10 years ago

mamonster commented 10 years ago

Hi, I just finished reading your fabulous article in Nature Biotechnology, and I couldn't help wondering whether I can use LACHESIS to assemble the bone fish draft genome I am working on. However, I found it hard to find any Hi-C datasets for any kind of fish. For such an emerging model organism, I guess my best shot is a zebrafish dataset, but I couldn't find any Hi-C experiments done on zebrafish either. Do you have any options or suggestions for a genome-wide chromatin interaction dataset I could use to assemble the bone fish genome?

Thank you very much! By the way, NICE name for the tool; the Fates are my favorite goddesses :)

JingaJenga commented 10 years ago

Thanks for your interest! That's a great question. To my knowledge, nobody has generated any Hi-C datasets for any fish species, or for any non-mammalian vertebrate, for that matter. If you want a dataset, you'll have to create it in-house; here's a paper with the protocol you'll need: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3149993/

Cheers, -- Josh

mamonster commented 10 years ago

Dear Josh, thanks for your kind reply. How much budget and time would be needed to perform Hi-C in-house? Would a MiSeq be capable of doing that?

mamonster commented 10 years ago

Also, is there any way to use linkage data other than Hi-C? I found many placed scaffolds for fishes on the UCSC Genome Browser; would it be possible to format them into linkages that LACHESIS can use? Or maybe some other data, like RNA-seq from the same organism? It would be wonderful if such a great tool could work with a wider range of data types...

JingaJenga commented 10 years ago

A MiSeq could definitely produce the sequence data necessary for a Hi-C run. You don't need particularly high coverage of Hi-C reads to run LACHESIS. For the human and mouse assemblies in the paper, the number of Hi-C read pairs used was only 20% of the combined number of short-insert and jumping read pairs, so if you can afford to produce the assembly in the first place, you can afford to scaffold it. The bigger problem is running the Hi-C wet-lab protocol itself, which is admittedly not very simple.
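The 20% rule of thumb above turns into a quick budget estimate. This is only a back-of-the-envelope sketch: the function names are hypothetical, the 0.20 ratio is the human/mouse figure quoted above (not a guarantee for a fish genome), and `pairs_per_run` is a placeholder you would take from your own sequencer's spec sheet, not a MiSeq specification.

```python
def hic_read_pairs_needed(short_insert_pairs: int, jumping_pairs: int,
                          ratio: float = 0.20) -> int:
    """Hi-C read pairs as a fraction of the shotgun read pairs.

    ratio=0.20 mirrors the human/mouse figure from the LACHESIS paper;
    it is an assumption for any other organism.
    """
    return round(ratio * (short_insert_pairs + jumping_pairs))


def runs_needed(read_pairs: int, pairs_per_run: int) -> int:
    """Number of sequencing runs, rounded up (ceiling division).

    pairs_per_run is a hypothetical, user-supplied yield per run.
    """
    return -(-read_pairs // pairs_per_run)


# Hypothetical assembly: 400M short-insert + 100M jumping read pairs.
hic_pairs = hic_read_pairs_needed(400_000_000, 100_000_000)
print(hic_pairs)                           # 100000000
print(runs_needed(hic_pairs, 25_000_000))  # 4, assuming 25M pairs per run
```

Plug in your own read counts and per-run yield; the point is only that the Hi-C sequencing cost scales with, and stays well below, what you already spent on the shotgun libraries.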

Unfortunately, the only data type you can reliably use with LACHESIS's method is chromatin contacts. It doesn't have to be Hi-C specifically; other chromatin-capture methods such as 5C or TCC (tethered chromatin capture) could work. But the 3-D structure of the genome is the signal that LACHESIS needs. I believe other tools exist that could help you, though. There are combined genome/transcriptome assemblers that use both DNA sequencing and RNA-seq. An assisted assembly method could help you assemble your genome using other genomes as a reference, although you'd run the risk of gross misassemblies.
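To illustrate why chromatin contacts are the signal: contigs on the same chromosome share far more contact read pairs than contigs on different chromosomes, so grouping contigs by contact density recovers chromosome-scale groups. The toy sketch below greedily merges the pair of groups with the highest normalized contact density until `n_groups` remain. It is loosely in the spirit of LACHESIS's clustering step, not its actual implementation; `cluster_contigs`, the input format, and normalizing by restriction-site counts are all assumptions made for illustration. Any contact data (Hi-C, 5C, TCC) reduced to pairwise counts would fit the same input.

```python
from itertools import combinations


def cluster_contigs(contacts, sites, n_groups):
    """Toy agglomerative clustering of contigs by contact density.

    contacts: dict frozenset({a, b}) -> number of read pairs linking contigs a and b
    sites:    dict contig -> restriction-site count (assumed normalization factor)
    n_groups: number of groups to stop at (e.g. the expected chromosome count)
    """
    clusters = [{c} for c in sites]  # start with one group per contig

    def density(x, y):
        # Total links between two groups, normalized by their combined site counts.
        links = sum(contacts.get(frozenset((a, b)), 0) for a in x for b in y)
        return links / (sum(sites[a] for a in x) * sum(sites[b] for b in y))

    while len(clusters) > n_groups:
        # combinations() yields (i, j) with i < j, so deleting j leaves i valid.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: density(clusters[ij[0]], clusters[ij[1]]))
        if density(clusters[i], clusters[j]) == 0:
            break  # no remaining contact evidence to justify a merge
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]
    return clusters


# Hypothetical contacts: A-B and C-D are strongly linked, A-C only weakly.
contacts = {frozenset("AB"): 50, frozenset("CD"): 40, frozenset("AC"): 1}
sites = {"A": 10, "B": 10, "C": 10, "D": 10}
print(cluster_contigs(contacts, sites, n_groups=2))
```

With these made-up counts the sketch returns two groups, {A, B} and {C, D}, because the weak A-C link is outweighed by the strong within-group links. Placed scaffolds or RNA-seq alignments carry no comparable contact-density signal, which is why they can't simply be substituted into this kind of input.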