ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

problem with trinuc.pickle.gz in gen-mut-model #121

Open nilchia opened 1 month ago

nilchia commented 1 month ago

Hello, @joshfactorial. Thanks for fixing the issue with the utilities. I am now facing a new problem :) When I run gen-mut-model with the --outcounts argument and a trinucleotide counts file, I get this error:

2024-07-07 20:42:46,413:INFO:neat.gen_mut_model.utils:Loading file: neat_gen-mut-model.trinuc.pickle.gz.
2024-07-07 20:42:46,413:ERROR:neat:gen-mut-model failed, see the traceback below
Traceback (most recent call last):
  File "/home/nilchia/miniconda3/envs/neat/lib/python3.11/site-packages/neat/cli/cli.py", line 131, in main
    cmd(args)
  File "/home/nilchia/miniconda3/envs/neat/lib/python3.11/site-packages/neat/cli/commands/gen_mut_model.py", line 91, in execute
    compute_mut_runner(arguments.reference, arguments.mutations, arguments.bed, arguments.outcounts,
  File "/home/nilchia/miniconda3/envs/neat/lib/python3.11/site-packages/neat/gen_mut_model/runner.py", line 480, in compute_mut_runner
    runner(reference_index, vcf_to_process, vcf_columns, outcounts, show_trinuc, save_trinuc,
  File "/home/nilchia/miniconda3/envs/neat/lib/python3.11/site-packages/neat/gen_mut_model/runner.py", line 96, in runner
    trinuc_ref_count, bed_track_length = count_trinucleotides(reference_index,
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nilchia/miniconda3/envs/neat/lib/python3.11/site-packages/neat/gen_mut_model/utils.py", line 229, in count_trinucleotides
    trinuc_ref_count = json.load(counts)
                       ^^^^^^^^^^^^^^^^^
  File "/home/nilchia/miniconda3/envs/neat/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Here are the commands I used. First:

neat --log-name neat_gen-mut-model gen-mut-model H1N1.fa H1N1.vcf -o neat_gen-mut-model --save-trinuc

and then:

neat --log-name neat_gen-mut-model2 gen-mut-model H1N1.fa H1N1.vcf -o neat_gen-mut-model2 --outcounts neat_gen-mut-model.trinuc.pickle.gz

Would you please help me with that? Thanks
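
For anyone hitting the same traceback: byte 0x80 at position 0 is what a pickle stream (protocol 2 or later) starts with, so the error is consistent with pickled data being handed to json.load. The sketch below reproduces that mismatch in isolation. It is only an illustration under assumptions: the counts dictionary and the file name are made up, and the use of gzip plus pickle is inferred from the .pickle.gz extension, not taken from NEAT's source.

```python
import gzip
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a saved trinucleotide count table.
counts = {"ACG": 12, "TTA": 7}


def roundtrip(tmp_dir: Path):
    """Write a gzipped pickle, then try to read it as JSON and as pickle."""
    path = tmp_dir / "example.trinuc.pickle.gz"  # illustrative file name
    with gzip.open(path, "wb") as fh:
        pickle.dump(counts, fh)

    # First byte of the decompressed stream (the pickle protocol marker).
    with gzip.open(path, "rb") as fh:
        first_byte = fh.read(1)

    # Reading the same stream as UTF-8 text for json.load fails on that byte.
    json_error = None
    try:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            json.load(fh)
    except UnicodeDecodeError as exc:
        json_error = exc

    # Reading it back with pickle succeeds.
    with gzip.open(path, "rb") as fh:
        loaded = pickle.load(fh)

    return first_byte, json_error, loaded


with tempfile.TemporaryDirectory() as tmp:
    first_byte, json_error, loaded = roundtrip(Path(tmp))

print(first_byte, type(json_error).__name__, loaded)
```

If the saved file really is a gzipped pickle, the loader needs pickle.load on a binary handle, or the writer needs to emit JSON; which side was changed in NEAT is not stated in this thread.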

joshfactorial commented 1 month ago

I haven't looked at this section since redoing parts of the main code. I will put this first on my to-do list.

nilchia commented 1 month ago

Cool, thanks a lot!

joshfactorial commented 1 month ago

Okay, looks like I had some test code in that file that was hijacking the trinuc count file. I removed that and rewired it so that it works correctly. I have these changes in develop and will push out a new version soon.

nilchia commented 1 month ago

That's great! Thanks @joshfactorial