oxfordmmm / gnomonicus

Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants

fails writing JSON data block with Object of type int64 is not serializable #19

Closed: philipwfowler closed this issue 1 year ago

philipwfowler commented 1 year ago

Other samples fail and leave an empty JSON output file. If I try to process the attached VCF, I get the following traceback:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/gnomonicus", line 160, in <module>
    saveJSON(variants, mutations, effects, options.output_dir, vcfStem, resistanceCatalogue, gnomonicus.__version__, time.time()-start, reference, options.vcf_file, options.genome_object, options.catalogue_file)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 962, in saveJSON
    f.write(json.dumps({'meta': meta, 'data': data}, indent=2))
  File "/usr/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable

Attachments: site.10.subj.YA00099906.lab.YA00099906.iso.1.v0.12.4.per_sample.vcf.zip, minor_alleles.txt
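A minimal reproduction of the same TypeError, independent of gnomonicus. The DataFrame below is made-up data (only the `codon_idx` column name comes from this thread). It shows that a value pulled out of an integer pandas column is a `numpy.int64` rather than a built-in `int`, and that the standard `json` module refuses to encode it:

```python
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for a mutations table; values are illustrative only.
df = pd.DataFrame({"gene": ["rpoB", "katG"], "codon_idx": [450, 315]})

# Indexing into an int64 column returns a numpy scalar, not a Python int.
value = df["codon_idx"].iloc[0]
print(type(value))  # <class 'numpy.int64'>

# numpy.int64 does not subclass int, so json.dumps has no encoder for it.
try:
    json.dumps({"codon_idx": value})
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)  # Object of type int64 is not JSON serializable
```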

JeremyWesthead commented 1 year ago

This was caused by pandas' type handling. Because pandas needs a consistent dtype for each column, fields such as codon_idx, which were sometimes None, ended up holding numpy.int64 values, and Python's json module cannot serialize those. Fixed by converting such columns to pandas' own nullable integer type, Int64, and by filtering the values before writing the JSON so that pandas' missing values (NaN) become plain Python None.
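A minimal sketch of the fix described above, under stated assumptions: the table, its values and the `to_native` helper are hypothetical and are not gnomonicus's actual `saveJSON` code. The column is cast to the nullable `Int64` dtype, and each value is then converted to a native Python type (missing values to `None`) before `json.dumps`:

```python
import json

import numpy as np
import pandas as pd

# Hypothetical mutations table: codon_idx is missing for one row.
# Only the column name codon_idx comes from the issue.
df = pd.DataFrame(
    {
        "mutation": ["S450L", "S315T", "c-15t"],
        "codon_idx": [450, 315, None],
    }
)

# Built this way, pandas stores codon_idx as float64 (450.0, 315.0, NaN).
# The nullable Int64 dtype keeps whole numbers and marks the gap as pd.NA.
df["codon_idx"] = df["codon_idx"].astype("Int64")


def to_native(value):
    """Hypothetical helper: turn a pandas/numpy scalar into a JSON-safe value.

    Assumes every cell is a scalar (pd.isna on a list would return an array).
    """
    if pd.isna(value):  # catches pd.NA, NaN and None
        return None
    if isinstance(value, np.integer):
        return int(value)
    if isinstance(value, np.floating):
        return float(value)
    return value


# Filter every record before serialising, as the fix describes.
records = [
    {key: to_native(val) for key, val in row.items()}
    for row in df.to_dict(orient="records")
]
print(json.dumps(records))
```

Doing the conversion explicitly, rather than relying on `to_dict` to return native types, keeps the output independent of how a given pandas version boxes values from extension arrays.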