shendurelab / LACHESIS

The LACHESIS software, as described in Nature Biotechnology (http://dx.doi.org/10.1038/nbt.2727)

How many reads does it take to produce a Hi-C data set good enough for LACHESIS? #4

Closed mamonster closed 10 years ago

mamonster commented 10 years ago

Hi, I noticed that in the manuscript you sequenced 100 million reads for scaffolding the human genome, which is extremely high and expensive. What is the lowest required depth (the minimum number of sequence reads)? Also, I assume the number of reads needed would decrease for genomes of smaller size. For example, if the human genome (~3 Gb) needs 100 million, would a bony fish genome (~1.2 Gb) need only 40 million? Just trying to lower the budget threshold for Hi-C. :)

JingaJenga commented 10 years ago

As shown in Supplementary Table 6 of the paper, we actually used a total of 734 M read pairs in our human genome assembly. We demonstrated that this coverage is not strictly necessary for high-quality scaffolding, but it does help. The contiguity of your pre-LACHESIS assembly also has a large impact on how much Hi-C coverage is necessary for high-quality scaffolding.

You're correct that the number of Hi-C read pairs necessary should scale roughly linearly with the genome size. Also keep in mind that your reads don't need to be very long - just long enough for unambiguous mapping, somewhere in the range of 50 bp.

-- Josh
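
The roughly linear scaling described above can be sketched as simple arithmetic. This is only a back-of-the-envelope illustration, not part of LACHESIS: the function name `scale_read_pairs` is hypothetical, the genome sizes (~3 Gb and ~1.2 Gb) are the approximate figures from the question, and 734 M is the read-pair total quoted in the reply. Since that coverage was described as more than strictly necessary, and since assembly contiguity also matters, the result is a reference point rather than a minimum requirement.

```python
def scale_read_pairs(ref_read_pairs, ref_genome_bp, target_genome_bp):
    """Scale a reference Hi-C read-pair count linearly by genome size.

    Assumes read pairs needed are proportional to genome size, which the
    thread describes as only roughly true.
    """
    return ref_read_pairs * target_genome_bp / ref_genome_bp


HUMAN_GENOME_BP = 3_000_000_000   # ~3 Gb, approximate figure from the question
FISH_GENOME_BP = 1_200_000_000    # ~1.2 Gb, approximate figure from the question

# Starting from the 734 M read pairs used for the human assembly
from_paper = scale_read_pairs(734_000_000, HUMAN_GENOME_BP, FISH_GENOME_BP)
print(f"{from_paper / 1e6:.1f} M read pairs")   # 293.6 M read pairs

# Starting from the 100 M figure assumed in the question
from_question = scale_read_pairs(100_000_000, HUMAN_GENOME_BP, FISH_GENOME_BP)
print(f"{from_question / 1e6:.1f} M read pairs")  # 40.0 M read pairs
```

The asker's 40 million estimate follows from the same proportionality, but it starts from 100 M rather than the 734 M read pairs actually used for the human assembly.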