tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

Questions about multiple libraries and no closely related chromosome-scale assembly #27

Closed: yangxiaofeill closed this issue 5 years ago

yangxiaofeill commented 5 years ago

Dear Dr. Zhang,

This is Xiaofei Yang, from Xi'an Jiaotong University. First, thank you for your tool ALLHiC. I am trying to use it to correct our draft assembly. We have sequenced six Hi-C libraries. Should I merge the alignments (the BAM files produced by bwa mem, combined with samtools merge) into a single BAM file before running PreprocessSAMs.pl? Or do you recommend another approach?

My other question: ours is a new genome, so we cannot find a closely related chromosome-scale assembly. How should we do the prune step?

Best, Xiaofei

tangerzhang commented 5 years ago

Hi Xiaofei,

For the first question, you do not need to merge the BAM files before running PreprocessSAMs.pl. You can clean the BAM files individually and then merge all the clean BAM files into a single BAM file before the ALLHiC step.

If your genome is diploid, please skip the prune and rescue steps. These two functions are designed only for polyploid genomes, or for a heterozygous diploid genome when you want to phase the two haplotypes.

If you are working on a polyploid genome and no closely related reference is available, the best approach is to construct a monoploid assembly of the polyploid genome by collapsing the heterozygous allelic sequences as much as possible. In that case, I recommend the wtdbg2 program to assemble contigs and a redundancy program to remove heterozygous sequences.
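For readers following along, the "clean each library, then merge" order described above could be sketched roughly as below. This is only an illustration that builds command lines without running them. The argument order for PreprocessSAMs.pl, the enzyme name `MBOI`, the `.REduced.paired_only.bam` output suffix, and all file names are assumptions made for the example and should be checked against the ALLHiC documentation. Only the `samtools merge <out.bam> <in1.bam> <in2.bam> ...` form is standard.

```python
import shlex
from pathlib import Path


def build_commands(library_bams, draft_fasta, enzyme, merged_bam="merged.clean.bam"):
    """Return (per-library cleaning commands, one final merge command).

    Nothing is executed here; the commands are returned as argument lists
    so they can be inspected before being run.
    """
    clean_cmds = []
    clean_bams = []
    for bam in library_bams:
        # ASSUMPTION: PreprocessSAMs.pl is called as
        #   PreprocessSAMs.pl <alignment> <draft assembly> <enzyme>
        clean_cmds.append(["PreprocessSAMs.pl", str(bam), str(draft_fasta), enzyme])
        # ASSUMPTION: the cleaned output is written next to the input with
        # this suffix; adjust to whatever your ALLHiC version produces.
        stem = Path(bam).with_suffix("")
        clean_bams.append(f"{stem}.REduced.paired_only.bam")
    # Merge only AFTER every library has been cleaned individually.
    merge_cmd = ["samtools", "merge", merged_bam, *clean_bams]
    return clean_cmds, merge_cmd


if __name__ == "__main__":
    # Hypothetical file names for six Hi-C libraries.
    libs = [f"lib{i}.bam" for i in range(1, 7)]
    cleaning, merging = build_commands(libs, "draft.asm.fasta", "MBOI")
    for cmd in cleaning:
        print(shlex.join(cmd))
    print(shlex.join(merging))
```

The merged BAM file is then the single input to the ALLHiC step. For a diploid genome, the prune and rescue steps are skipped, as noted above.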

yangxiaofeill commented 5 years ago

Thanks. Our genome is a diploid genome. If I have other questions, I will contact you.

Best, Xiaofei