zengxiaofei / HapHiC

HapHiC: a fast, reference-independent, allele-aware scaffolding tool based on Hi-C data
https://www.nature.com/articles/s41477-024-01755-3
BSD 3-Clause "New" or "Revised" License

haphic refsort #66

Closed Pwuchn closed 2 months ago

Pwuchn commented 2 months ago

I'm getting an error when I run haphic refsort on a PAF file produced by a wfmash alignment.

haphic refsort /public/home/pwu/7751/asm/04.build/scaffolds.raw.agp /public/home/pwu/7751/asm/minimap/wf_asm_to_ref.paf > scaffolds.refsort.agp

2024-09-18 11:45:50 <HapHiC_refsort.py> [run] Program started, HapHiC version: 1.0.6 (update: 2024.09.10)
2024-09-18 11:45:50 <HapHiC_refsort.py> [run] Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
2024-09-18 11:45:50 <HapHiC_refsort.py> [run] Command: /public/home/pwu/soft/HapHiC/scripts/HapHiC_refsort.py /public/home/pwu/7751/asm/04.build/scaffolds.raw.agp /public/home/pwu/7751/asm/minimap/wf_asm_to_ref.paf
2024-09-18 11:45:50 <HapHiC_refsort.py> [parse_agp] Parsing input AGP file...
2024-09-18 11:45:50 <HapHiC_refsort.py> [parse_paf] Parsing input PAF file...
Traceback (most recent call last):
  File "/public/home/pwu/soft/HapHiC/scripts/HapHiC_refsort.py", line 301, in <module>
    main()
  File "/public/home/pwu/soft/HapHiC/scripts/HapHiC_refsort.py", line 297, in main
    run(args, log_file='HapHiC_refsort.log')
  File "/public/home/pwu/soft/HapHiC/scripts/HapHiC_refsort.py", line 288, in run
    group_ref_dict = parse_paf(args.paf, ctg_group_dict)
  File "/public/home/pwu/soft/HapHiC/scripts/HapHiC_refsort.py", line 90, in parse_paf
    if int(cols[11]) < 1:
ValueError: invalid literal for int() with base 10: 'id:f:0.997632'
Traceback (most recent call last):
  File "/public/home/pwu/soft/HapHiC/haphic", line 117, in <module>
    subprocess.run(commands, check=True)
  File "/public/home/pwu/micromamba/envs/haphic/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/public/home/pwu/soft/HapHiC/scripts/HapHiC_refsort.py', '/public/home/pwu/7751/asm/04.build/scaffolds.raw.agp', '/public/home/pwu/7751/asm/minimap/wf_asm_to_ref.paf']' returned non-zero exit status 1.

Command used to generate the PAF file:

wfmash /public/home/pwu/ref/ref.fasta /public/home/pwu/7751/asm/asm.fa -m -n 1 -S 1 -t 8 | cut -f 1-6,8- > wf_asm_to_ref.paf

zengxiaofei commented 2 months ago

Something may have gone wrong when the PAF file was generated. The error message indicates that a column before the ‘id’ tag is missing: HapHiC expects an integer in column 12, but your file has the ‘id’ tag there. I have reviewed the PAF output of all wfmash versions from 0.16 to 0.21, and they all include an additional “Chromosome” column. That is why we convert the output to a minimap2-compatible format using cut -f 1-6,8-. I'm not sure whether your issue comes from using a different version of wfmash or from running cut -f 1-6,8- twice.
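One way to check a PAF file before passing it to haphic refsort is to count how many columns come before the first tag. This is only a rough sketch, not part of HapHiC: it assumes that optional fields follow the SAM-style TAG:TYPE:VALUE pattern (such as id:f:0.997632) and that a minimap2-compatible line has exactly 12 columns before them. The function names and the sample lines are hypothetical.

```python
import re

# SAM-style optional field, e.g. "id:f:0.997632" (assumed tag format)
TAG_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]:[AifZHB]:")

# A minimap2-compatible PAF line has 12 mandatory columns before any tags
EXPECTED_MANDATORY = 12


def count_mandatory_columns(line):
    """Return the number of tab-separated columns before the first tag."""
    cols = line.rstrip("\n").split("\t")
    for i, col in enumerate(cols):
        if TAG_RE.match(col):
            return i
    return len(cols)


def diagnose_paf_line(line):
    """Classify a PAF line as 'ok', 'missing_column', or 'extra_column'."""
    n = count_mandatory_columns(line)
    if n == EXPECTED_MANDATORY:
        return "ok"
    if n < EXPECTED_MANDATORY:
        # e.g. cut -f 1-6,8- applied to output that had no extra column
        return "missing_column"
    # e.g. output with an additional column that still needs cut -f 1-6,8-
    return "extra_column"


# Hypothetical sample lines, for illustration only
standard = "ctg1\t1000\t0\t1000\t+\tchr1\t5000\t100\t1100\t990\t1000\t60\tid:f:0.99"
cols = standard.split("\t")
too_short = "\t".join(cols[:6] + cols[7:])          # column 7 removed
too_long = "\t".join(cols[:6] + ["extra"] + cols[6:])  # one column added

for name, line in [("standard", standard), ("too_short", too_short), ("too_long", too_long)]:
    print(name, diagnose_paf_line(line))
```

A "missing_column" result on the first few lines of the file would match the ValueError reported above, where column 12 holds a tag instead of an integer.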

Pwuchn commented 2 months ago

Thanks for the answer. I'm using wfmash version 0.10, so I'll try again with a newer version!