zengxiaofei / HapHiC

HapHiC: a fast, reference-independent, allele-aware scaffolding tool based on Hi-C data
https://www.nature.com/articles/s41477-024-01755-3
BSD 3-Clause "New" or "Revised" License

Error about "haphic pipeline p_utg HiC.filtered.bam nchrs --gfa p_utg.gfa". #59

Closed yangyyhh closed 2 months ago

yangyyhh commented 2 months ago

Dear Zeng,

I need to assemble a homozygous tetraploid and perform genotyping. I plan to follow the "work with hifiasm" approach: haphic pipeline p_utg HiC.filtered.bam nchrs --gfa p_utg.gfa. My commands:

cat YZ4.hic.hap1.p_ctg.gfa YZ4.hic.hap2.p_ctg.gfa YZ4.hic.hap3.p_ctg.gfa YZ4.hic.hap4.p_ctg.gfa > allhaps.gfa
python mydata/16_YZ/01_rawdata/fasta.py   # custom script to convert GFA files to FASTA files (see the sketch after this command list)
bwa index allhaps.fa
bwa mem -5SP -t 28 allhaps.fa ../YZ4_HiC-clean_1.fq.gz ../YZ4_HiC-clean_2.fq.gz | samblaster | samtools view - -@ 14 -S -h -b -F 3340 -o allhaps_HiC.bam
/01_software/HapHiC-main/utils/filter_bam allhaps_HiC.bam 1 --nm 3 --threads 30 | samtools view - -b -@ 30 -o allhaps_HiC.filtered.bam
/01_software/HapHiC-main/haphic pipeline YZ4.hic.p_utg.fa allhaps_HiC.filtered.bam 44 --gfa YZ4.hic.p_utg.gfa --RE "GATC" --remove_allelic_links 4 --threads 30 --processes 5
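The fasta.py script itself is not posted here. As a rough, hypothetical sketch (not the actual script), a GFA-to-FASTA conversion of this kind usually just writes each S (segment) line of the GFA out as a FASTA record, e.g.:

# gfa_to_fasta.py -- hypothetical sketch of such a conversion, not the actual fasta.py used above
# Writes every S (segment) line of a GFA file as a FASTA record.
import sys

def gfa_to_fasta(gfa_path, fasta_path):
    with open(gfa_path) as gfa, open(fasta_path, 'w') as fasta:
        for line in gfa:
            if line.startswith('S\t'):
                # GFA S line: S <segment name> <sequence> [optional tags...]
                cols = line.rstrip('\n').split('\t')
                fasta.write('>{}\n{}\n'.format(cols[1], cols[2]))

if __name__ == '__main__':
    # usage (hypothetical): python gfa_to_fasta.py allhaps.gfa allhaps.fa
    gfa_to_fasta(sys.argv[1], sys.argv[2])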

The last command then failed with the following error:

/01_software/HapHiC-main/haphic pipeline YZ4.hic.p_utg.fa allhaps_HiC.filtered.bam 44 --gfa YZ4.hic.p_utg.gfa --RE "GATC" --remove_allelic_links 4 --threads 30 --processes 5
2024-08-31 13:48:49 [main] Pipeline started, HapHiC version: 1.0.5 (update: 2024.08.22)
2024-08-31 13:48:49 [main] Python version: 3.9.7 (default, Mar 8 2023, 17:00:06) [GCC 7.5.0]
2024-08-31 13:48:49 [main] Command: /01_software/HapHiC-main/scripts/HapHiC_pipeline.py YZ4.hic.p_utg.fa allhaps_HiC.filtered.bam 44 --gfa YZ4.hic.p_utg.gfa --RE GATC --remove_allelic_links 4 --threads 30 --processes 5
2024-08-31 13:48:49 [haphic_cluster] Step1: Execute preprocessing and Markov clustering for contigs...
2024-08-31 13:48:49 [run] Program started, HapHiC version: 1.0.5 (update: 2024.08.22)
2024-08-31 13:48:49 [run] Python version: 3.9.7 (default, Mar 8 2023, 17:00:06) [GCC 7.5.0]
2024-08-31 13:48:49 [run] Command: /01_software/HapHiC-main/scripts/HapHiC_pipeline.py YZ4.hic.p_utg.fa allhaps_HiC.filtered.bam 44 --gfa YZ4.hic.p_utg.gfa --RE GATC --remove_allelic_links 4 --threads 30 --processes 5
2024-08-31 13:48:49 [run] Module sparse_dot_mkl or Intel MKL is not correctly installed, HapHiC will be executed in dense matrix mode
2024-08-31 13:48:49 [detect_format] The file for Hi-C read alignments is detected as being in BAM format
2024-08-31 13:48:49 [parse_fasta] Parsing input FASTA file...
2024-08-31 13:49:08 [parse_gfa] Parsing input gfa file(s)...
2024-08-31 13:49:11 [stat_fragments] Making some statistics of fragments (contigs / bins)
2024-08-31 13:49:11 [stat_fragments] bin_size is calculated to be 1188831 bp
2024-08-31 13:49:16 [parse_alignments] Parsing input alignments...
2024-08-31 13:52:42 [output_pickle] Writing HT_link_dict to HT_links.pkl...
2024-08-31 13:52:42 [output_clm] Writing clm_dict to paired_links.clm...
2024-08-31 13:52:42 [filter_fragments] Filtering fragments...
2024-08-31 13:52:42 [filter_fragments] [Nx filtering] 1144 fragments kept
2024-08-31 13:52:42 [filter_fragments] [RE sites filtering] 0 fragments removed, 1144 fragments kept
2024-08-31 13:52:42 [filter_fragments] [link density filtering] Parameter --density_lower 0.2X is set to "multiple" mode and equivalent to 0.0 in "fraction" mode
2024-08-31 13:52:42 [filter_fragments] [link density filtering] Parameter --density_upper 1.9X is set to "multiple" mode and equivalent to 1.0 in "fraction" mode
2024-08-31 13:52:42 [filter_fragments] [link density filtering] 0 fragments removed, 1144 fragments kept
2024-08-31 13:52:42 [filter_fragments] [read depth filtering] Q1=17.0, median=17.0, Q3=18.0, IQR=Q3-Q1=1.0
2024-08-31 13:52:42 [filter_fragments] [read depth filtering] Parameter --read_depth_upper 1.5X is set to "multiple" mode and equivalent to 0.9458041958041958 in "fraction" mode
2024-08-31 13:52:42 [filter_fragments] [read depth filtering] 57 fragments removed, 1082 fragments kept
2024-08-31 13:52:43 [filter_fragments] [rank sum filtering] Q1=120.0, median=120.0, Q3=120.0, IQR=Q3-Q1=0.0
2024-08-31 13:52:43 [filter_fragments] [rank sum filtering] Parameter --rank_sum_upper 1.5X is set to "multiple" mode and equivalent to 1.0 in "fraction" mode
2024-08-31 13:52:43 [filter_fragments] [rank sum filtering] 0 fragments removed, 1082 fragments kept
2024-08-31 13:52:43 [remove_allelic_HiC_links] Removing Hi-C links between alleic contig pairs...
2024-08-31 13:52:45 [output_pickle] Writing full_link_dict to full_links.pkl...
Traceback (most recent call last):
  File "/01_software/HapHiC-main/scripts/HapHiC_pipeline.py", line 532, in <module>
    main()
  File "/01_software/HapHiC-main/scripts/HapHiC_pipeline.py", line 513, in main
    haphic_cluster(args)
  File "/01_software/HapHiC-main/scripts/HapHiC_pipeline.py", line 355, in haphic_cluster
    HapHiC_cluster.run(args, log_file=LOG_FILE)
  File "/01_software/HapHiC-main/scripts/HapHiC_cluster.py", line 2887, in run
    flank_link_matrix, frag_index_dict = dict_to_matrix(
  File "/01_software/HapHiC-main/scripts/HapHiC_cluster.py", line 289, in dict_to_matrix
    shape = len(frag_set)
TypeError: object of type 'NoneType' has no len()

Traceback (most recent call last):
  File "/01_software/HapHiC-main/haphic", line 117, in <module>
    subprocess.run(commands, check=True)
  File "/01_software/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/01_software/HapHiC-main/scripts/HapHiC_pipeline.py', 'YZ4.hic.p_utg.fa', 'allhaps_HiC.filtered.bam', '44', '--gfa', 'YZ4.hic.p_utg.gfa', '--RE', 'GATC', '--remove_allelic_links', '4', '--threads', '30', '--processes', '5']' returned non-zero exit status 1.

How should this error be resolved? Hoping for your answer!
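From what I can tell, the TypeError itself is just Python's generic error for calling len() on None, so frag_set seems to have been None at that point. A minimal reproduction outside of HapHiC:

# Minimal reproduction of the TypeError above (plain Python, not HapHiC code)
frag_set = None            # a sized collection such as a set was expected here
try:
    shape = len(frag_set)  # len(None) is invalid
except TypeError as err:
    print(err)             # -> object of type 'NoneType' has no len()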

Best wishes.

zengxiaofei commented 2 months ago

I have responded to this question in #58.