zengxiaofei / HapHiC

HapHiC: a fast, reference-independent, allele-aware scaffolding tool based on Hi-C data
https://www.nature.com/articles/s41477-024-01755-3
BSD 3-Clause "New" or "Revised" License

haphic refsort #87

Open lijphd168866 opened 1 week ago

lijphd168866 commented 1 week ago

Dear Teacher Zeng: I used the HapHiC scaffolding pipeline to build pseudomolecules. My command was:

    ~/HapHiC/haphic pipeline \
        ~/Musa_analysis/06_polish/01_rundir/genome.nextpolish.fasta \
        ba.HiC.filtered.bam 11 \
        --correct_nrounds 2 --threads 48 --processes 48

"genome.nextpolish.fasta" is derived from the "hifiasm.asm.hic.pc_ctg.fa" file that I assembled with hifiasm. I purged duplicates (purge_dups) and polished (NextPolish) that assembly to obtain it.

Then I wanted to use the reference genome of a closely related species to sort my genome. My commands were:

    minimap2 -x asm20 Mch.genomes.chrrenamed.fa \
        ~/Musa_analysis/06_polish/01_rundir/genome.nextpolish.fasta \
        --secondary=no -t 48 -o ./ba.asm_to_ref.paf

    haphic refsort \
        ~/Musa_analysis/07_haphic/04.build/scaffolds.raw.agp ./ba.asm_to_ref.paf \
        > ./ba.scaffolds.refsort.agp

Then I modified juicebox.sh to use the generated "ba.scaffolds.refsort.agp" file:

    ln -s /home/lijia/Musa_analysis/06_polish/01_rundir/genome.nextpolish.fasta .
    samtools faidx genome.nextpolish.fasta
    /home/lijia/biosoft/HapHiC/scripts/../utils/juicer pre -a -q 1 -o out_JBAT_ref \
        /home/lijia/Musa_analysis/07_haphic/ba.HiC.filtered.bam \
        /home/lijia/Musa_analysis/07_haphic/05.refsort/ba.scaffolds.refsort.agp \
        genome.nextpolish.fasta.fai >out_JBAT_ref.log 2>&1
    (java -jar -Xmx32G /home/lijia/biosoft/HapHiC/scripts/../utils/juicer_tools.1.9.9_jcuda.0.8.jar pre \
        out_JBAT_ref.txt out_JBAT_ref.hic.part \
        <(cat out_JBAT_ref.log | grep PRE_C_SIZE | awk '{print $2" "$3}')) \
        && (mv out_JBAT_ref.hic.part out_JBAT_ref.hic)

I then ran juicebox.sh to generate the .assembly and .hic files.

I did not adjust much in Juicebox. I then generated the final FASTA file for the scaffolds:

    ~/HapHiC/utils/juicer post -o ba.out_JBAT out_JBAT_ref.review.assembly out_JBAT_ref.liftover.agp genome.nextpolish.fasta

In the end, I found that the genome was not sorted according to the reference chromosomes:

[Attached image: 微信图片_20241031101901]

Could you give me some advice? Looking forward to your reply very much!

zengxiaofei commented 4 days ago

Hello @lijphd168866,

Which version of HapHiC are you using? If you are using an older version of HapHiC, please update to the new version, as I fixed an important bug on July 17th. Additionally, you can check the log file output when running haphic refsort. It will show the correspondence between scaffold (group) names and reference genome chromosomes. haphic refsort will regenerate the AGP file based on the order and orientation of the reference genome chromosomes, but it will not rename scaffolds. Therefore, you still need to rename them based on this correspondence. Lastly, the region you pointed out in your figure will not be processed by haphic refsort because it is within a scaffold. Please note that we have emphasized it in our documentation:

This function is NOT reference-based scaffolding and will NOT alter your scaffolds, it only changes the way of presentation through overall ordering and orientation of the entire scaffolds.

Best regards, Xiaofei
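The renaming step described in the reply above can be sketched in a few lines of Python. This is only an illustration, not part of HapHiC: the function name `rename_agp_objects` and the mapping `group1 -> Chr03` are hypothetical, and the real mapping has to be read from your own haphic refsort log. The sketch relies only on the standard AGP layout, in which the first tab-separated column holds the object (scaffold or group) name and comment lines start with `#`.

```python
def rename_agp_objects(agp_lines, name_map):
    """Return AGP lines with column 1 (the object name) renamed via name_map.

    Objects that are missing from name_map keep their original name.
    Comment lines (starting with '#') and blank lines are passed through.
    """
    renamed = []
    for line in agp_lines:
        if line.startswith("#") or not line.strip():
            renamed.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = name_map.get(fields[0], fields[0])
        renamed.append("\t".join(fields) + "\n")
    return renamed


if __name__ == "__main__":
    # Hypothetical correspondence; take the real one from the refsort log.
    name_map = {"group1": "Chr03"}
    example = [
        "## AGP-version 2.1\n",
        "group1\t1\t1000\t1\tW\tctg1\t1\t1000\t+\n",
        "group1\t1001\t1100\t2\tU\t100\tscaffold\tyes\tproximity_ligation\n",
        "group2\t1\t500\t1\tW\tctg2\t1\t500\t-\n",
    ]
    for out_line in rename_agp_objects(example, name_map):
        print(out_line, end="")
```

To use it on real files, read the AGP with `open(path).readlines()` and write the returned lines back out. The same mapping would also have to be applied to the FASTA headers if the sequences were already exported.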

lijphd168866 commented 2 days ago

Hello, Teacher Zeng: Thank you for replying so quickly. I am using HapHiC version 1.0.6 (update: 2024.09.10). I checked the log file output by haphic refsort:

[Attached image: haphic refsort log]

However, the correspondence between scaffold (group) names and reference genome chromosomes shown in the haphic refsort log is inconsistent with the one shown by MUMmer. My MUMmer result is as follows:

[Attached image: f64d7b7e6440ecbb07c25022666758c]

Looking forward to your reply very much!

zengxiaofei commented 2 days ago

I can see a clear "one group to one chromosome" pattern in the log, so I believe the result should be good. However, I am unsure how you converted the group names (e.g., group1, group2) to scaffold names (e.g., scaffold1, scaffold2). It is possible that the names do not correspond directly based on their numbers. For example, did you adjust the ordering of scaffolds in Juicebox? To determine whether the issue lies with haphic refsort or is caused by your adjustment in Juicebox, it is advisable to re-run the MUMmer alignment using the scaffolds.fa from the 04.build directory.
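As a quick cross-check alongside the MUMmer plot, the scaffold-to-chromosome correspondence can also be tabulated from a PAF file. The sketch below assumes you have aligned the scaffolds FASTA to the reference with minimap2 and written PAF output. It is not how haphic refsort computes its assignment, and the function name `best_reference_per_query` is hypothetical. It uses only the standard PAF columns: column 1 is the query name, column 6 the target name, and column 10 the number of matching bases.

```python
from collections import defaultdict


def best_reference_per_query(paf_lines):
    """Map each query sequence to the target with the most matching bases.

    Matching bases (PAF column 10) are summed over all alignments of a
    query/target pair; lines with fewer than 12 columns are skipped.
    """
    matches = defaultdict(lambda: defaultdict(int))
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        query, target = fields[0], fields[5]
        matches[query][target] += int(fields[9])
    return {
        query: max(targets.items(), key=lambda item: item[1])[0]
        for query, targets in matches.items()
    }


if __name__ == "__main__":
    # Made-up alignments purely for illustration.
    example = [
        "scaffold1\t5000\t0\t3000\t+\tChr02\t9000\t100\t3100\t2900\t3000\t60",
        "scaffold1\t5000\t3000\t3500\t-\tChr05\t8000\t0\t500\t450\t500\t60",
        "scaffold2\t4000\t0\t4000\t-\tChr01\t7000\t0\t4000\t3900\t4000\t60",
    ]
    for query, target in sorted(best_reference_per_query(example).items()):
        print(query, target, sep="\t")
```

Running this once on scaffolds.fa from 04.build and once on the FASTA exported after Juicebox curation would show whether the names shifted during curation or earlier.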