Closed ErickTong closed 7 months ago
Hi Erick,
bwa mem -t 36 -5SP asm.fa Hic_R1.fq.gz Hic_R1.fq.gz | samblaster | samtools view - -@ 24 -S -h -b -F 3340 -o HiC.bam
Maybe you used the same FASTQ file Hic_R1.fq.gz
for the paired-end Hi-C reads.
Best regards, Xiaofei
Oh, that was just my oversight. Thank you very much for the reminder. I will recheck the code and run the process again.
After manual curation in Juicebox,most homologous chromosome sets can be successfully mounted, except for a few sets where some chimeric contigs exist (as shown in the figure below). Can I improve the assembly results by adjusting parameters?
Additionally, here are my hifiasm commands and results.
hifiasm -o XH01_asm -t 36 --h1 XH01_all_hic_R1.fq.gz --h2 XH01_all_hic_R2.fq.gz XH01_a_ccs.fastq.gz XH01_b_ccs.fastq.gz
I would be very grateful if you could provide some help!
Hi Erick,
Based the Hi-C contact maps you provided, these strong inter-group signals can be attributed to collapsed contigs rather than chimeric contigs, as illustrated in Fig. 2a of our manuscript. This issue stems from the genome assembly process and may arise due to the presence of highly similar regions among haplotypes.
Best regards, Xiaofei
Hello, Dear Developer,
Thank you your effort on developing the HapHiC. 1.My commands are as follows:
contig=XX.hap1_hap2.fa
hicread1=/public/data/quercifolia-XX/HiC/Unknown_BU274-001H0001_good_fastp_1.fq.gz
hicread2=/public/data/quercifolia-XX/HiC/Unknown_BU274-001H0001_good_fastp_2.fq.gz
ln -s $contig asm.fa
ln -s $hicread1 read1_fq.gz
ln -s $hicread2 read2_fq.gz
data -R
source /public/home/mambaforge/bin/activate haphic
# (1) Align Hi-C data to the assembly, remove PCR duplicates and filter out secondary and supplementary alignments
# bwa index asm.fa
bwa mem -t 80 -5SP asm.fa read1_fq.gz read2_fq.gz | samblaster | samtools view - -@ 80 -S -h -b -F 3340 -o HiC.bam
# (2) Filter the alignments with MAPQ 1 (mapping quality ≥ 1) and NM 3 (edit distance < 3)
filter_bam HiC.bam 1 --nm 3 --threads 80 | samtools view - -b -@ 80 -o HiC.filtered.bam
source /public/home/off/mambaforge/bin/activate haphic
haphic pipeline asm.fa HiC.filtered.bam 18 --RE "AAGCTT" --Nx 100 --min_group_len 1
This is a diploid genome with a low heterozygosity rate, around 0.5%,2n=2x=18.The XX.hap1_hap2.fa come from:
cat quer.XX.asm.hic.hap1.p_ctg.fa quer.XX.asm.hic.hap2.p_ctg.fa > XX.hap1_hap2.fa
2.And my 01.cluster always erro.When running, the following log is displayed:
cat HapHiC_cluster.log
2024-07-29 17:58:04 <HapHiC_cluster.py> [run] Program started, HapHiC version: 1.0.3 (update: 2024.03.26)
2024-07-29 17:58:04 <HapHiC_cluster.py> [run] Python version: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
2024-07-29 17:58:04 <HapHiC_cluster.py> [detect_format] The file for Hi-C read alignments is detected as being in BAM format
2024-07-29 17:58:04 <HapHiC_cluster.py> [parse_fasta] Parsing input FASTA file...
2024-07-29 17:58:13 <HapHiC_cluster.py> [stat_fragments] Making some statistics of fragments (contigs / bins)
2024-07-29 17:58:13 <HapHiC_cluster.py> [stat_fragments] bin_size is calculated to be 1416101 bp
2024-07-29 17:58:15 <HapHiC_cluster.py> [parse_alignments] Parsing input alignments...
2024-07-29 17:58:35 <HapHiC_cluster.py> [output_pickle] Writing HT_link_dict to HT_links.pkl...
2024-07-29 17:58:36 <HapHiC_cluster.py> [output_clm] Writing clm_dict to paired_links.clm...
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] Filtering fragments...
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [Nx filtering] 1333 fragments kept
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [RE sites filtering] 105 fragments removed, 1228 fragments kept
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [link density filtering] Parameter --density_lower 0.2X is set to "multiple" mode and equivalent to 0.6726384364820847 in "fraction" mode
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [link density filtering] Parameter --density_upper 1.9X is set to "multiple" mode and equivalent to 0.9666123778501629 in "fraction" mode
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [link density filtering] 867 fragments removed, 361 fragments kept
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [rank sum filtering] Q1=3402.0, median=3978.0, Q3=4555.0, IQR=Q3-Q1=1153.0
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [rank sum filtering] Parameter --rank_sum_upper 1.5X is set to "multiple" mode and equivalent to 0.9944598337950139 in "fraction" mode
2024-07-29 17:58:37 <HapHiC_cluster.py> [filter_fragments] [rank sum filtering] 2 fragments removed, 359 fragments kept
2024-07-29 17:58:37 <HapHiC_cluster.py> [output_pickle] Writing full_link_dict to full_links.pkl...
2024-07-29 17:58:37 <HapHiC_cluster.py> [run] Hi-C linking matrix was constructed in 32.92142605781555s
2024-07-29 17:58:37 <HapHiC_cluster.py> [run_mcl_clustering] Performing Markov clustering...
2024-07-29 17:58:39 <HapHiC_cluster.py> [mcl] The matrix has converged after 60 rounds of iterations (expansion: 2, inflation: 1.1, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:39 <HapHiC_cluster.py> [mcl] The matrix has converged after 32 rounds of iterations (expansion: 2, inflation: 1.2, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:39 <HapHiC_cluster.py> [mcl] The matrix has converged after 23 rounds of iterations (expansion: 2, inflation: 1.3, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:40 <HapHiC_cluster.py> [mcl] The matrix has converged after 18 rounds of iterations (expansion: 2, inflation: 1.4, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:40 <HapHiC_cluster.py> [mcl] The matrix has converged after 15 rounds of iterations (expansion: 2, inflation: 1.5, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:40 <HapHiC_cluster.py> [mcl] The matrix has converged after 13 rounds of iterations (expansion: 2, inflation: 1.6, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:40 <HapHiC_cluster.py> [mcl] The matrix has converged after 12 rounds of iterations (expansion: 2, inflation: 1.7, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:40 <HapHiC_cluster.py> [mcl] The matrix has converged after 10 rounds of iterations (expansion: 2, inflation: 1.8, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:40 <HapHiC_cluster.py> [mcl] The matrix has converged after 10 rounds of iterations (expansion: 2, inflation: 1.9, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 9 rounds of iterations (expansion: 2, inflation: 2.0, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 8 rounds of iterations (expansion: 2, inflation: 2.1, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 10 rounds of iterations (expansion: 2, inflation: 2.2, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 7 rounds of iterations (expansion: 2, inflation: 2.3, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 10 rounds of iterations (expansion: 2, inflation: 2.4, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 7 rounds of iterations (expansion: 2, inflation: 2.5, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 9 rounds of iterations (expansion: 2, inflation: 2.6, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 8 rounds of iterations (expansion: 2, inflation: 2.7, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 8 rounds of iterations (expansion: 2, inflation: 2.8, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 8 rounds of iterations (expansion: 2, inflation: 2.9, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [mcl] The matrix has converged after 8 rounds of iterations (expansion: 2, inflation: 3.0, maximum iterations: 200, pruning threshold: 0.0001)
2024-07-29 17:58:41 <HapHiC_cluster.py> [run_mcl_clustering] The maximum number of clusters (1) is even less than the expected number of chromosomes (18). You could try higher inflation.
2024-07-29 17:58:41 <HapHiC_cluster.py> [run] 20 round(s) of Markov clustering finished in 4.332892656326294s, average 0.2166446328163147s per round
2024-07-29 17:58:41 <HapHiC_cluster.py> [output_statistics] Making some statistics for the next HapHiC reassignment step...
2024-07-29 17:59:03 <HapHiC_cluster.py> [run] Program finished in 58.50718808174133s
The HiC.filtered.bam:
samtools view HiC.filtered.bam | cut -f 1-5 | head -20
LH00330:133:222K77LT4:3:1101:0:19472 81 h1tg000012l 2194749 8
LH00330:133:222K77LT4:3:1101:0:19472 161 h1tg000012l 2194704 8
LH00330:133:222K77LT4:3:1101:0:20479 81 h1tg000012l 1773068 60
LH00330:133:222K77LT4:3:1101:0:20479 161 h1tg000012l 1773006 60
LH00330:133:222K77LT4:3:1101:0:22016 97 h1tg000028l 9173384 8
LH00330:133:222K77LT4:3:1101:0:22016 145 h1tg000028l 9173551 9
LH00330:133:222K77LT4:3:1101:0:22341 81 h1tg000012l 5980285 31
LH00330:133:222K77LT4:3:1101:0:22341 161 h1tg000012l 5980155 39
LH00330:133:222K77LT4:3:1101:0:24216 97 h1tg000024l 4949822 60
LH00330:133:222K77LT4:3:1101:0:24216 145 h1tg000024l 4949948 60
LH00330:133:222K77LT4:3:1101:0:27048 81 h2tg000005l 6148297 8
LH00330:133:222K77LT4:3:1101:0:27048 161 h2tg000005l 6148247 8
LH00330:133:222K77LT4:3:1101:0:27837 97 h2tg000002l 31226689 8
LH00330:133:222K77LT4:3:1101:0:27837 145 h2tg000002l 31226820 8
LH00330:133:222K77LT4:3:1101:0:29062 81 h1tg000028l 4129513 21
LH00330:133:222K77LT4:3:1101:0:29062 161 h1tg000028l 4129420 28
LH00330:133:222K77LT4:3:1101:0:29686 81 h1tg000005l 8032493 1
LH00330:133:222K77LT4:3:1101:0:29686 161 h1tg000005l 8032493 2
LH00330:133:222K77LT4:3:1101:0:31360 81 h2tg000013l 11094454 21
LH00330:133:222K77LT4:3:1101:0:31360 161 h2tg000013l 11094235 60
4.asm.fa detail
#### base statistics ####
A: 248645956, T: 248880267, C: 133895942, G: 133272425, N: 0
GC%: 0.34937917764005627
#### contig statistics ####
Nx Number Length
N10 3 36168237
N20 5 35644004
N30 7 30857360
N40 11 18353511
N50 15 17326354
N60 19 16175814
N70 25 12363138
N80 31 10390621
N90 60 668910
longest contig: 38426148, shortest contig: 17553
total number: 877, total length: 764694590
#### scaffold statistics ####
Nx Number Length
N10 3 36168237
N20 5 35644004
N30 7 30857360
N40 11 18353511
N50 15 17326354
N60 19 16175814
N70 25 12363138
N80 31 10390621
N90 60 668910
longest scaffold: 38426148, shortest scaffold: 17553
total number: 877, total length: 764694590
#### gapless scaffold statistics ####
Nx Number Length
N10 3 36168237
N20 5 35644004
N30 7 30857360
N40 11 18353511
N50 15 17326354
N60 19 16175814
N70 25 12363138
N80 31 10390621
N90 60 668910
longest gapless scaffold: 38426148, shortest gapless scaffold: 17553
total number: 877, total length: 764694590
I tried to use hap1 or p_ctg.fa to run haphic pipeline,But in the first step 01.cluster reported an error that the number of clustering groups was less than the number of nchr. How should I adjust the parameters? Thank you again.
Please open a new issue.
Hello xiaofei, My commands are as follows
The result file HapHiC_cluster.log is HapHiC_cluster.log I can't find the "best" inflation recommendation in the log file HapHiC_cluster.log and The maximum clusters is 2 , however, it is a triploid organism with approximately 51 chromosomes. I would be very grateful if you could provide some help!