tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

how to choose K-value for dikaryotic species #63

Closed YuanwenGuo closed 3 years ago

YuanwenGuo commented 3 years ago

Hello, we are working on a dikaryotic fungal species (two haploid nuclei coexist in each cell). Hi-C scaffolding is difficult to apply to it because it usually brings the two haplotypes together in one scaffold. I got excited when I came across your paper on the allele-aware Hi-C scaffolder, and hopefully it will solve our issue.

A question came up during the partition step: our species has a haploid chromosome number of 18. Should I specify a K value of 18 in our case?

Thank you for the help!

Best, Yuanwen

tangerzhang commented 3 years ago

Hi Yuanwen, ideally the K value should equal the number of chromosomes in your target genome. If the haploid chromosome number is 18 (i.e. x=18), the total number of chromosomes in the target dikaryotic genome is 36, and K should be no less than this value. When we were working on the sugarcane genome, we found that increasing the K value could benefit haplotype phasing, so we usually try different K values in the partition step. Our experience is documented on GitHub (https://github.com/tangerzhang/ALLHiC/wiki/ALLHiC:-scaffolding-an-auto-polyploid-sugarcane-genome), and I hope it helps you solve the dikaryotic scaffolding.
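The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of ALLHiC: the function name `candidate_k_values` and the `extra` parameter (how many values above the minimum to try) are hypothetical, and the range of larger K values worth testing is an assumption that depends on the genome.

```python
def candidate_k_values(haploid_n, n_haplotypes, extra=4):
    """Return K values to try in the partition step.

    The smallest value is the total chromosome count
    (haploid number x number of haplotypes); the rest are
    slightly larger values to test, as suggested above.
    `extra` is an arbitrary, illustrative choice.
    """
    if haploid_n <= 0 or n_haplotypes <= 0 or extra < 0:
        raise ValueError("haploid_n and n_haplotypes must be positive, extra non-negative")
    minimum_k = haploid_n * n_haplotypes
    return [minimum_k + i for i in range(extra + 1)]


# Dikaryon with x = 18: two haplotypes, so K starts at 36.
ks = candidate_k_values(18, 2)
print(ks)
```

Each value in the list would then be tried in a separate partition run and the resulting groups compared.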

YuanwenGuo commented 3 years ago

Thank you! I will try K=36 or above to see how it works.

Best, Yuanwen

YuanwenGuo commented 3 years ago

Hi Xingtan,

Thanks a lot for directing me to your sugarcane assembly pipeline, which improved our assembly a lot.

However, I ran into some misjoin issues even though I tried to separate the homologous groups. The rescue step seems to cause the issue: before it, prunning.clusters.txt shows correct groups, with different haplotypes placed in different groups, but during rescue the assembler brings some contigs from different haplotypes into the same group. The same thing happened during the global rescue.

I tried increasing the -m threshold to 100 to reduce the chance of this, but I am not sure that is the best way to deal with the issue. Could you please give me some suggestions?

Thank you! Yuanwen

tangerzhang commented 3 years ago

Hi @YuanwenGuo, the rescue function was developed to increase the anchoring rate. The rescue step can occasionally introduce chimeric scaffolds. You can increase the -m threshold to avoid this situation.
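One way to see whether a given -m setting actually helps is to compare the groups before and after rescue. The sketch below is a rough illustration, not an ALLHiC utility. It assumes a clusters file with a header line starting with `#` followed by tab-separated rows of group name, contig count, and space-separated contig names; check this against your own files. The contig-to-haplotype mapping `haplotype_of` is hypothetical and would have to come from your own knowledge of the assembly.

```python
def parse_clusters(text):
    """Parse clusters text into {group: [contig, ...]}.

    Assumed layout (verify against your files): '#' header line,
    then rows of group<TAB>count<TAB>space-separated contigs.
    """
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip rows that do not match the assumed layout
        groups[fields[0]] = fields[2].split()
    return groups


def mixed_groups(groups, haplotype_of):
    """Return {group: sorted haplotype labels} for groups holding more than one haplotype."""
    mixed = {}
    for group, contigs in groups.items():
        haps = {haplotype_of[c] for c in contigs if c in haplotype_of}
        if len(haps) > 1:
            mixed[group] = sorted(haps)
    return mixed


# Toy data (made up for illustration): rescue adds ctgB3 to group1.
before = "#Group\tnContigs\tContigs\ngroup1\t2\tctgA1 ctgA2\ngroup2\t2\tctgB1 ctgB2\n"
after = "#Group\tnContigs\tContigs\ngroup1\t3\tctgA1 ctgA2 ctgB3\ngroup2\t2\tctgB1 ctgB2\n"
haplotype_of = {"ctgA1": "A", "ctgA2": "A", "ctgB1": "B", "ctgB2": "B", "ctgB3": "B"}

print(mixed_groups(parse_clusters(before), haplotype_of))
print(mixed_groups(parse_clusters(after), haplotype_of))
```

Running the same check on the rescue output for several -m values would show which threshold keeps the haplotypes apart, at the cost of a lower anchoring rate.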

YuanwenGuo commented 3 years ago

Thank you for the suggestions!

Best, Yuanwen