tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

Inquiries about comparing output genome assemblies from ALLHiC and SALSA2 #119

Open distilledchild opened 2 years ago

distilledchild commented 2 years ago

Hi, I tested ALLHiC and SALSA2 on a diploid genome assembly (19 + XY) that was built from linked reads (Pseudohap 1 style) and has a contig N50 of around 35 kb. The two outputs differ greatly in scaffold N50: 142.13 Mbp (with K = 22) and 35.56 Mbp, respectively. Even the reference's scaffold N50 is 140.99 Mbp.

  1. What else can I check for the comparison besides contiguity? Contiguity alone does not seem like a reasonable basis for comparing the two tools.
  2. Do you have any idea what causes this huge difference? The result stats are so different that I am confused.
  3. Would it be good to check with a genome-to-genome dotplot (https://github.com/dnanexus/dot)?
  4. I am curious how I could evaluate the tools on the same datasets for the genome assemblies I have.

Thank you.

tangerzhang commented 2 years ago

Hi @theshowmustgolangon, you can use a dotplot and a Hi-C heatmap to assess the quality of Hi-C scaffolding.
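
Alongside the dotplot and heatmap, it may help to recompute scaffold N50 the same way for both outputs and for the reference, so that the numbers being compared come from one definition. The following is only a minimal sketch, not part of ALLHiC or SALSA2: the helper names `fasta_lengths` and `n50` are hypothetical, and the FASTA paths are whatever you pass on the command line.

```python
import sys


def fasta_lengths(path):
    """Return the length of every sequence in a plain-text FASTA file."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


if __name__ == "__main__":
    # Paths are supplied by the user, e.g. one scaffold FASTA per tool (assumed inputs).
    for fasta_path in sys.argv[1:]:
        lengths = fasta_lengths(fasta_path)
        print(fasta_path, len(lengths), "sequences, N50 =", n50(lengths))
```

Running the script with the ALLHiC output, the SALSA2 output, and the reference as arguments prints one line per file. A high N50 by itself does not show that the joins are correct, which is why the dotplot and heatmap checks are still needed.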