phasegenomics / juicebox_scripts

A collection of scripts for working with Hi-C data, Juicebox, and other genomic file formats
GNU Affero General Public License v3.0

Cannot generate output files with juicebox_assembly_converter.py #32

Open mixiaoluo88 opened 2 years ago

mixiaoluo88 commented 2 years ago

Hi, I tried to use the juicebox_assembly_converter.py script to generate the FASTA file, but something in the assembly file seems to make the conversion fail.

The assembly file was generated from Juicerbox_1.11.08.

Here is the command line: python juicebox_assembly_converter.py -a new_assembly_file -f cx_geno.fasta

Processing assembly file. Details:
Assembly:                       new_assembly_file
Fasta:                         cx_geno.fasta
Output prefix:                  new_assembly_file
Contig mode:                    False
Simple Chromosome Names:        False

Reading sequences from cx_geno.fasta...
.
Sequences read

Reading .assembly file new_assembly_file...
.assembly read

Traceback (most recent call last):
  File "/home/20092008/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 872, in <module>
    processed_assembly = JuiceboxConverter().process(fasta, assembly,
  File "/home/20092008/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 91, in process
    sequences = self._add_breaks(sequences, assembly_map)
  File "/home/20092008/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 255, in _add_breaks
    assembly_map.sort(key=functools.cmp_to_key(cmp_assembly_map_entries))
  File "/home/20092008/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 331, in cmp_assembly_map_entries
    names1 = extract_contig_info(frag1[0])
  File "/home/20092008/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 368, in extract_contig_info
    names["index"] = int(frag_fields[-1].replace("fragment_", ""))
ValueError: invalid literal for int() with base 10: '122:prefix'

[cx_geno_hic.final.review.assembly.txt](https://github.com/phasegenomics/juicebox_scripts/files/7834628/cx_geno_hic.final.review.assembly.txt)
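The ValueError above is raised when int() is called on the text that follows fragment_ in a name taken from the .assembly file. The following is a minimal sketch for locating such names before running the converter. It is not part of juicebox_scripts, and the function names are hypothetical. It assumes that name lines in the .assembly file start with ">" and that fragment names have the form `<contig>:::fragment_<N>`, optionally followed by `:::debris`, as in the names quoted in this thread.

```python
def has_bad_fragment_index(name):
    """Return True if the text after the last 'fragment_' is not a plain integer.

    Assumption: an optional ':::debris' suffix is legitimate and is ignored.
    """
    if "fragment_" not in name:
        return False  # unbroken contig, nothing to check
    tail = name.rsplit("fragment_", 1)[1]
    if tail.endswith(":::debris"):
        tail = tail[: -len(":::debris")]
    return not tail.isdigit()


def find_bad_fragment_names(assembly_lines):
    """Return (line_number, name) pairs for names with a non-integer fragment index."""
    bad = []
    for line_number, line in enumerate(assembly_lines, start=1):
        if not line.startswith(">"):
            continue  # lines without '>' list scaffold order, not names
        fields = line[1:].split()
        if fields and has_bad_fragment_index(fields[0]):
            bad.append((line_number, fields[0]))
    return bad
```

Passing the lines of the .assembly file (for example, `open("new_assembly_file").read().splitlines()`) would list the entries that a parser expecting an integer fragment index is likely to reject.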

ghost commented 1 year ago

I get a similar error no matter how I run this script:

Checking for breaks listed in .assembly and making them...
Traceback (most recent call last):
  File "/home/yanyang_liang/Accessories/Softwares/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 875, in <module>
    simple_chr_names=simple_chr_names)
  File "/home/yanyang_liang/Accessories/Softwares/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 91, in process
    sequences = self._add_breaks(sequences, assembly_map)
  File "/home/yanyang_liang/Accessories/Softwares/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 271, in _add_breaks
    raise ContigNotFoundError('Could not find contig {0} in original FASTA'.format(fragment))
__main__.ContigNotFoundError: Could not find contig ('h1tg000001l:::fragment_1', '13505878') in original FASTA

pengzhen688 commented 11 months ago

I ran into the same problem; I guess the script needs a fix.

Traceback (most recent call last):
  File "/ds3200_1/users_root/pengzhen/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 872, in <module>
    processed_assembly = JuiceboxConverter().process(fasta, assembly,
  File "/ds3200_1/users_root/pengzhen/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 91, in process
    sequences = self._add_breaks(sequences, assembly_map)
  File "/ds3200_1/users_root/pengzhen/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py", line 300, in _add_breaks
    raise BadContigNameError("Unbroken contig {} failed to map!!".format(fragment_name))
__main__.BadContigNameError: Unbroken contig h2tg000001l failed to map!!

mankiddyman commented 11 months ago

@pengzhen688 I get the same error as you. I used hic-scaffolding-nf to generate my .hic and .assembly files successfully, and then modified them in Juicebox, also successfully. When I try to convert the new final.assembly to .fasta, I get the error: [image]

pengzhen688 commented 11 months ago

> @pengzhen688 I get the same error as you. I used hic-scaffolding-nf to generate my .hic and .assembly files successfully, and then modified them in Juicebox, also successfully. When I try to convert the new final.assembly to .fasta, I get the error: [image]

I just solved this problem: the FASTA file passed with the -f parameter should be the draft assembly file. In my case, that is the output of Hifiasm. It works, but I am still confused: the results are already at chromosome level, yet the sequences are named scaffold rather than chr. My command and part of my results are below.

/ds3200_1/users_root/pengzhen/software/juicebox_scripts/juicebox_scripts/juicebox_assembly_converter.py -a female.hap1.review.assembly -f female.hic.hap1.p_ctg.fa

Part of my results (sequence title, length):

PGA_scaffold_1__38_contigs__length_104764504 104764504
PGA_scaffold_2__1_contigs__length_85385853 85385853
PGA_scaffold_3__1_contigs__length_152023637 152023637
PGA_scaffold_4__1_contigs__length_3406 3406
PGA_scaffold_5__1_contigs__length_75600790 75600790
PGA_scaffold_6__2_contigs__length_111463799 111463799
PGA_scaffold_7__1_contigs__length_118074989 118074989
PGA_scaffold_8__1_contigs__length_111544484 111544484
PGA_scaffold_9__1_contigs__length_114164565 114164565
PGA_scaffold_10__3_contigs__length_551076 551076
PGA_scaffold_11__2_contigs__length_254286 254286
PGA_scaffold_12__10_contigs__length_95934091 95934091
PGA_scaffold_13__9_contigs__length_1739813 1739813
PGA_scaffold_14__1_contigs__length_86811578 86811578
PGA_scaffold_15__8_contigs__length_80280840 80280840
PGA_scaffold_16__1_contigs__length_889487 889487
PGA_scaffold_17__6_contigs__length_2478057 2478057
PGA_scaffold_18__5_contigs__length_1643997 1643997
PGA_scaffold_19__3_contigs__length_1392244 1392244
PGA_scaffold_20__2_contigs__length_553579 553579
PGA_scaffold_21__1_contigs__length_113850 113850
PGA_scaffold_22__1_contigs__length_11221 11221
PGA_scaffold_23__1_contigs__length_30543 30543
PGA_scaffold_24__1_contigs__length_59000 59000
PGA_scaffold_25__1_contigs__length_78129 78129
PGA_scaffold_26__1_contigs__length_154235 154235
PGA_scaffold_27__1_contigs__length_307851 307851
PGA_scaffold_28__1_contigs__length_999 999
PGA_scaffold_29__1_contigs__length_181554 181554
PGA_scaffold_30__1_contigs__length_31656 31656
PGA_scaffold_31__1_contigs__length_181036 181036
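Since the fix reported here was to pass the draft assembly FASTA with -f, a quick consistency check between the two inputs may help others who hit ContigNotFoundError or BadContigNameError. The sketch below is not part of juicebox_scripts, and the function names are hypothetical. It assumes that name lines in the .assembly file start with ">", that the contig name is everything before the first ":::", and that FASTA sequence names are the first whitespace-delimited token of each header line.

```python
def fasta_names(fasta_lines):
    """Collect sequence names: the first token of each '>' header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def assembly_contig_names(assembly_lines):
    """Collect contig names from '>' lines, dropping ':::fragment_N' and ':::debris' suffixes."""
    names = set()
    for line in assembly_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0].split(":::")[0])
    return names


def missing_contigs(assembly_lines, fasta_lines):
    """Return the sorted contig names used in the .assembly but absent from the FASTA."""
    return sorted(assembly_contig_names(assembly_lines) - fasta_names(fasta_lines))
```

Called with the lines of female.hap1.review.assembly and female.hic.hap1.p_ctg.fa, an empty result would suggest that the FASTA is the one the .assembly file was built from. A long list of missing names would suggest that a different FASTA was passed. Entries that the scaffolding tool itself added to the .assembly file would also be reported.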