Closed yt619 closed 8 months ago
Hi Xiaofei,
The file corrected_ctgs.txt is empty. Does this indicate that no contigs have been corrected?
Best regards, Tuo
This seems to be caused by --remove_allelic_links 4. What I have assembled is a segmental allopolyploid, and chromosomal exchanges have led to high similarity in some regions. Should I not filter out MAPQ 1 (mapping quality ≥ 1) from the BAM file?
Hi Tuo,
The file corrected_ctgs.txt is empty. Does this indicate that no contigs have been corrected?
Yes, you are correct.
Should I not filter out MAPQ 1 (mapping quality ≥ 1) from the BAM file?
Perhaps not. In my opinion, MAPQ ≥ 1 is already a basic criterion.
This seems to be caused by --remove_allelic_links 4.
Yes. It appears that the removal of allelic Hi-C links unexpectedly resulted in an empty flank_link_dict.
I suggest trying out the quick view mode first and showing me the Hi-C contact map in Juicebox. This will help me better understand the problem.
Best regards, Xiaofei
Hi Xiaofei,
Thanks for answering these questions. This genome is a segmental allopolyploid. Due to chromosomal exchanges, there are numerous identical sequence regions between homologous chromosomes. Because Hi-C reads are short, these regions cannot be distinguished, so a large number of reads end up with a mapping quality of MAPQ=0. Initial filtering tends to mask the signals in these areas, and the effective sequencing rate is only 75%. Could this be the reason why --remove_allelic_links 4 fails to output flank_link_dict? I am trying to solve this problem using Pore-C, which has an effective rate of 91%. I noticed that you mentioned the use of Pore-C data in another issue; is it possible to use Pore-C data for genome assembly?
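To put a number on how many alignments a MAPQ ≥ 1 filter would discard, a small tally over the SAM text can help. The following is only a rough sketch, not part of HapHiC: the function name `mapq_summary` and the sample records are made up for illustration, and it assumes tab-separated SAM records (for example, the text produced by `samtools view`), where the fifth column is MAPQ.

```python
from collections import Counter


def mapq_summary(sam_lines, threshold=1):
    """Tally SAM alignment records by whether MAPQ reaches `threshold`.

    `sam_lines` is any iterable of SAM text lines; header lines
    (starting with '@') and malformed lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM record has 11 mandatory columns
        mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
        counts["kept" if mapq >= threshold else "dropped"] += 1
    total = counts["kept"] + counts["dropped"]
    return {
        "total": total,
        "kept": counts["kept"],
        "dropped": counts["dropped"],
        "dropped_fraction": counts["dropped"] / total if total else 0.0,
    }


# Hypothetical records, for illustration only.
sample = [
    "@SQ\tSN:ctg1\tLN:100",
    "r1\t0\tctg1\t1\t0\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r2\t0\tctg1\t5\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r3\t0\tctg1\t9\t0\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "r4\t0\tctg1\t20\t1\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
]
print(mapq_summary(sample))
```

A high `dropped_fraction` would be consistent with many reads falling in regions that are identical between homologous chromosomes, although it does not by itself show where those reads map.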
Best regards, Tuo
Hi Tuo,
Could this be the reason why --remove_allelic_links 4 fails to output flank_link_dict?
I'm not sure. --remove_allelic_links only deals with allelic contig pairs that have diagonally distributed Hi-C links between them. This kind of distribution pattern can be observed in the second contact map you provided. However, it seems that these Hi-C links are absent in the first contact map. I am wondering if there are any differences in the mapping and filtering methods for the Hi-C data?
Another unexpected observation is that none of the contigs were filtered before clustering. Especially during the rank sum filtering, both Q1 and Q3 were calculated to be 120. I can reproduce this result when the BAM file is either empty or does not match the FASTA file. Therefore, I would suggest checking the input BAM file as a first step.
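One way to check whether a BAM file matches a FASTA file is to compare the contig names and lengths in the FASTA against the @SQ lines of the BAM header. The sketch below is not HapHiC code: the function names and sample data are hypothetical, and it assumes the header has been exported as text (for example with `samtools view -H`).

```python
def fasta_lengths(fasta_lines):
    """Return {contig_name: sequence_length} from FASTA text lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def header_lengths(sam_header_lines):
    """Return {contig_name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in sam_header_lines:
        if not line.startswith("@SQ"):
            continue
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def compare(fasta, header):
    """List contigs missing on either side or differing in length."""
    shared = set(fasta) & set(header)
    return {
        "only_in_fasta": sorted(set(fasta) - set(header)),
        "only_in_bam": sorted(set(header) - set(fasta)),
        "length_mismatch": sorted(n for n in shared if fasta[n] != header[n]),
    }


# Hypothetical inputs, for illustration only.
fasta_text = [">ctg1 hap1", "ACGTACGT", "ACGT", ">ctg2", "ACGT"]
header_text = ["@HD\tVN:1.6", "@SQ\tSN:ctg1\tLN:12", "@SQ\tSN:ctg3\tLN:4"]
print(compare(fasta_lengths(fasta_text), header_lengths(header_text)))
```

If all three lists come back empty, the header and the FASTA at least agree on contig names and lengths; a BAM with no alignment records would need a separate check.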
Best regards, Xiaofei
Hi Xiaofei,
When I run HapHiC, I encounter an error, which doesn't seem to be an issue with the program installation. Could you please guide me on how to solve this problem? I am combining the hap1 and hap2 outputs from the hifiasm software and assembling them at the chromosome level through Hi-C reads. I will upload my log. HapHiC_cluster.log
Best regards, Tuo