kindly / flatterer

Opinionated JSON to CSV/XLSX/SQLITE/PARQUET converter. Flattens JSON fast.
https://flatterer.opendata.coop
MIT License

Multithreaded: assertion failed: self.is_char_boundary(new_len) #59

Closed jpmckinney closed 7 months ago

jpmckinney commented 7 months ago

Possibly related to #58

To reproduce (note that threads is 8). I'll try to upload a file:

env RUST_BACKTRACE=full flatterer --nocsv --xlsx --ndjson --force --threads 8 2023.jsonl.gz t
$ pip list | grep flatterer
0.19.13

Output:

[2024-02-07T17:29:27.785Z INFO ] Merging results
[2024-02-07T17:29:29.135Z INFO ] Writing merged xlsx file
[2024-02-07T17:29:29.150Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.168Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.195Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.211Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.214Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.224Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.269Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.273Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
[2024-02-07T17:29:29.277Z WARN ] WARNING: Cell larger than 32767 chararcters which is too large for XLSX format. The cell will be truncated, so some data will be missing.
thread 'tokio-runtime-worker' panicked at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/string.rs:1396:13:
assertion failed: self.is_char_boundary(new_len)
stack backtrace:
   0:        0x11f50d6c4 - _bz_internal_error
   1:        0x11f530fd8 - _bz_internal_error
   2:        0x11f50998c - _bz_internal_error
   3:        0x11f50d500 - _bz_internal_error
   4:        0x11f50ea90 - _bz_internal_error
   5:        0x11f50e7d8 - _bz_internal_error
   6:        0x11f50eeb8 - _bz_internal_error
   7:        0x11f50ed94 - _bz_internal_error
   8:        0x11f50db2c - _bz_internal_error
   9:        0x11f50eb54 - _bz_internal_error
  10:        0x11f6ced7c - _bz_internal_error
  11:        0x11f6cedf0 - _bz_internal_error
  12:        0x11e0b05b8 - _BrotliDecoderVersion
  13:        0x11d9b84b8 - _PyInit_flatterer
  14:        0x11d9e6eec - _PyInit_flatterer
  15:        0x11d9d759c - _PyInit_flatterer
  16:        0x11d9d89b0 - _PyInit_flatterer
  17:        0x11f459258 - _bz_internal_error
  18:        0x11f459c7c - _bz_internal_error
  19:        0x11f452da4 - _bz_internal_error
  20:        0x11f5139f0 - _bz_internal_error
  21:        0x187887fa8 - __pthread_joiner_wake
[2024-02-07T17:29:29.566Z ERROR] task 1 panicked
Traceback (most recent call last):
  File "/bin/flatterer", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/flatterer/__init__.py", line 315, in cli
    flatten(inputs,
  File "/lib/python3.11/site-packages/flatterer/__init__.py", line 141, in flatten
    flatten_rs(input, output_dir, csv, xlsx, sqlite, parquet,
RuntimeError: task 1 panicked

Caused by:
    task 1 panicked

Location:
    /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/convert/mod.rs:757:9

kindly commented 7 months ago

@jpmckinney I know what's causing this. No need to send a file. I will try to do a release in the next few days.
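
For readers hitting the same panic: `assertion failed: self.is_char_boundary(new_len)` is the assertion that Rust's `String::truncate` raises when the requested byte length falls inside a multi-byte UTF-8 character. The log above shows it right after the XLSX "cell will be truncated" warnings, so a byte-offset truncation of a cell containing non-ASCII text is a plausible trigger. The thread does not show the actual fix, so the following is only a minimal sketch of a boundary-safe truncation; the function names `truncate_at_char_boundary` and `truncate_to_chars` are hypothetical and not part of flatterer.

```rust
/// Hypothetical helper (not flatterer's code): shorten `s` in place to at
/// most `max_bytes` bytes, backing up to the nearest UTF-8 character boundary
/// so that `String::truncate` cannot panic.
fn truncate_at_char_boundary(s: &mut String, max_bytes: usize) {
    if s.len() <= max_bytes {
        return; // already short enough
    }
    let mut cut = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    s.truncate(cut);
}

/// Hypothetical alternative: keep at most `max_chars` Unicode scalar values,
/// which is closer to a limit expressed in characters rather than bytes.
fn truncate_to_chars(s: &mut String, max_chars: usize) {
    // `char_indices` yields the byte offset where each character starts;
    // the offset of character number `max_chars` is the cut point.
    if let Some((byte_idx, _)) = s.char_indices().nth(max_chars) {
        s.truncate(byte_idx);
    }
}
```

For example, with `let mut cell = String::from("aé");` a direct `cell.truncate(2)` panics because byte 2 is in the middle of `é`, whereas `truncate_at_char_boundary(&mut cell, 2)` leaves `"a"`. Whether a spreadsheet limit should be counted in bytes, scalar values, or UTF-16 units is a separate design choice that this sketch does not settle.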

jpmckinney commented 7 months ago

test.txt

Here's a small file.

kindly commented 7 months ago

@jpmckinney this should be fixed now. Please close the issue if it now works for you.

jpmckinney commented 7 months ago

It works - thank you!