tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

Without Allele.ctg.table skipping ALLHiC_prune #3

Closed mictadlo closed 5 years ago

mictadlo commented 5 years ago

Hi @tanghaibao and @tangerzhang, I am working on a de novo plant genome assembly and I can't create Allele.ctg.table. In that case, would the following workflow be correct?

PreprocessSAMs.pl sample.bwa_aln.sam draft.asm.fasta MBOI
filterBAM_forHiC.pl sample.bwa_aln.REduced.paired_only.bam sample.clean.sam  
samtools view -bt draft.asm.fasta.fai sample.clean.sam > sample.clean.bam 
ALLHiC_partition -b sample.clean.bam -r draft.asm.fasta -e AAGCTT -k 16
ALLHiC_rescue -b sample.clean.bam -r draft.asm.fasta -c clusters.txt -i counts_AAGCTT.txt
allhic extract sample.clean.bam draft.asm.fasta --RE AAGCTT
for K in {1..16};do allhic optimize group${K}.txt sample.clean.clm;done
ALLHiC_build draft.asm.fasta

Thank you in advance,

Michal

tangerzhang commented 5 years ago

Hi Michal, we provide a BLAST-based method for creating Allele.ctg.table on GitHub (https://github.com/tangerzhang/ALLHiC/wiki/ALLHiC:-identify-allelic-contigs). If you are not working on an auto-polyploid genome (or an allo-polyploid genome with a very short divergence time between the haplotypes), you may not need the prune and rescue steps. If you do need to prune the BAM file, please check the link above; it describes an easy-to-use method for creating Allele.ctg.table. There are of course other ways to create this file, and you can build your own as long as it follows the format described at the same link.

tangerzhang commented 5 years ago

You can also skip the ALLHiC_rescue step. In that case, group${K}.txt in the optimize step should be replaced with sample.clean.counts_AAGCTT.*g${K}.txt, the files generated by ALLHiC_partition.
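
The substitution above can be sketched as a small bash loop. This is only an illustration under assumptions that are not stated in the thread: 16 groups (matching -k 16 in the question), sample.clean.clm present in the working directory, and a hypothetical DRY_RUN switch and optimize_groups function name introduced here so the commands can be inspected before anything is run.

```shell
#!/usr/bin/env bash
# Sketch: run `allhic optimize` directly on the ALLHiC_partition output
# when the ALLHiC_rescue step is skipped.
# Assumptions: K_GROUPS, DRY_RUN and optimize_groups are hypothetical names
# used for illustration; they are not part of ALLHiC.
K_GROUPS=${K_GROUPS:-16}   # number of groups, as passed to ALLHiC_partition -k
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands, 0 = execute them

optimize_groups() {
    local k f
    for k in $(seq 1 "$K_GROUPS"); do
        # File pattern taken from the reply above; the part matched by `*`
        # depends on how the partition step named its output.
        for f in sample.clean.counts_AAGCTT.*g"${k}".txt; do
            if [ ! -e "$f" ]; then
                echo "no partition file found for group ${k}" >&2
                continue
            fi
            if [ "$DRY_RUN" = 1 ]; then
                echo "allhic optimize $f sample.clean.clm"
            else
                allhic optimize "$f" sample.clean.clm
            fi
        done
    done
}
```

Calling optimize_groups with the defaults prints one allhic optimize command per group file it finds; setting DRY_RUN=0 first would execute them instead. The existence check is there because an unmatched glob is left as a literal string by bash, which would otherwise be passed to allhic as a file name.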

mictadlo commented 5 years ago

Hi, thank you for your response. Do you think it would be possible to use GBS data to create Allele.ctg.table? We have one population consisting of 192 F3 lines resulting from a cross between two plants.

Thank you in advance,

Michal

tangerzhang commented 5 years ago

Hi Michal, GBS data can be used to construct a genetic map, but not to create Allele.ctg.table. Once you have the genetic map and the Hi-C physical map, you can use ALLMAPS to integrate the two. Hope this helps.