tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

The problem of partition #95

Closed phil622 closed 1 year ago

phil622 commented 3 years ago

Hi, I skipped the pruning step and ran this command: ALLHiC_partition -b sample.clean.bam -r draft.asm.fasta -e GATC -k 9. The result contains 235 groups. I only asked for 9 groups, so why do I get so many?

tangerzhang commented 3 years ago

Hi @phil622, this happens when the Hi-C data are of poor quality. Did you use HiC-Pro to check the unique mapping rate and the valid pairs rate?
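To see why weak Hi-C linkage can yield more groups than requested, here is a toy sketch. It is not ALLHiC's partition algorithm. It only assumes that contigs can end up in the same group when enough Hi-C pairs link them, and it counts how many linked groups exist (connected components, found with union-find). The function name `count_linked_groups` and the `min_links` threshold are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def count_linked_groups(contigs: Iterable[str],
                        links: Dict[Tuple[str, str], int],
                        min_links: int = 3) -> int:
    """Count connected components of a contig graph.

    Hypothetical illustration: each contig is a node, and two contigs are
    joined only if they share at least `min_links` Hi-C read pairs.
    """
    parent = {c: c for c in contigs}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in links.items():
        if n >= min_links:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    return len({find(c) for c in parent})


# Toy data: six contigs, but only two pairs are supported by enough links.
contigs = ["ctg1", "ctg2", "ctg3", "ctg4", "ctg5", "ctg6"]
links = {("ctg1", "ctg2"): 25, ("ctg3", "ctg4"): 12, ("ctg5", "ctg6"): 1}
print(count_linked_groups(contigs, links))  # 4
```

Even if only 2 groups were requested, a procedure that can merge only linked contigs would be left with 4 here, which mirrors the 235-versus-9 result in miniature.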

phil622 commented 3 years ago

Not yet. So I should check my Hi-C data with HiC-Pro first?

tangerzhang commented 3 years ago

Hi @phil622, yes. I think Hi-C data quality is the likely cause of this result.

phil622 commented 3 years ago

Hello, I have tested my data with HiC-Pro, and the mapping rate is almost 80%, so I think my data are high quality. Now I am wondering whether the problem comes from skipping the pruning step. My genome is allotetraploid. Thank you for your help.

phil622 commented 3 years ago

However, my reported pairs rate is only 34.8%, so I am not sure whether my data are high quality.

tangerzhang commented 3 years ago

How about the valid rate?

phil622 commented 3 years ago

The valid pairs rate is 70%, but the cis long-range rate is only 20%.
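One rough way to read these percentages together is to chain them. The sketch below assumes each rate is relative to the pairs that survived the previous stage (reported pairs out of all processed pairs, valid pairs out of reported pairs). That is an assumption to check against the HiC-Pro report, not something stated in this thread. Under it, the figures above mean that only about a quarter of the sequenced pairs remain as valid pairs. The helper name `retained_fraction` is hypothetical.

```python
from math import prod


def retained_fraction(*stage_rates: float) -> float:
    """Multiply per-stage retention rates (each a fraction between 0 and 1).

    Assumes every rate is measured relative to the pairs that survived the
    previous filtering stage; this is an assumption about the HiC-Pro report.
    """
    if not all(0.0 <= r <= 1.0 for r in stage_rates):
        raise ValueError("rates must be fractions between 0 and 1")
    return prod(stage_rates)


# Rates from this thread: 34.8% reported pairs, 70% valid pairs.
print(f"{retained_fraction(0.348, 0.70):.1%}")  # 24.4%
```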

phil622 commented 3 years ago

Now I have also run the pruning step, but the partition step still has the same problem. What can I do about it? Thanks.

phil622 commented 3 years ago

Here are my HiC-Pro results (three screenshots).

tangerzhang commented 3 years ago

Hi @phil622, even though the valid rate is 70%, I still doubt that there are enough valid reads to link the contigs into chromosomes. I would suggest checking the coverage of the Hi-C sequencing data and how many reads are left for scaffolding after the series of filtering steps.
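A quick back-of-the-envelope check of Hi-C coverage is bases sequenced divided by assembly size. The sketch below counts both mates of each read pair. All numbers in it (100 M read pairs, 150 bp reads, a 1 Gb assembly, 25% of pairs kept after filtering) are hypothetical placeholders, not values from this thread, and `hic_coverage` is a hypothetical helper name.

```python
def hic_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Approximate sequencing depth contributed by Hi-C read pairs (both mates)."""
    return read_pairs * 2 * read_length / genome_size


raw_pairs = 100_000_000        # hypothetical: read pairs sequenced
kept_fraction = 0.25           # hypothetical: share of pairs left after filtering
read_length = 150              # hypothetical: bp per read
genome_size = 1_000_000_000    # hypothetical: assembly size in bp

print(hic_coverage(raw_pairs, read_length, genome_size))                        # 30.0
print(hic_coverage(int(raw_pairs * kept_fraction), read_length, genome_size))   # 7.5
```

Comparing the raw depth with the depth left after filtering shows how much linking signal actually reaches the partition step.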