saulpw / visidata

A terminal spreadsheet multitool for discovering and arranging data
http://visidata.org
GNU General Public License v3.0

Mishandling of short reads (presumably) in input files given as arguments #2048

Closed: stephane-chazelas closed this issue 1 year ago

stephane-chazelas commented 1 year ago

Description

Running vd -f json /path/to/fifo, as in vd -f json <(some command) (here from a shell that supports Korn-style process substitution, such as zsh or bash), often fails, apparently at random.

It can be reproduced consistently with:

vd -f json <(echo '[{"foo":1,'; sleep 1; echo '"bar":2}]')

Which fails with:

JSONDecodeError: Extra data: line 1 column 6 (char 5)

Above, the sleep 1 ensures that the first read(2) done by vd returns only [{"foo":1,\n, as seen with strace:

21820 read(7, "[{\"foo\":1,\n", 8192)   = 11
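The short read itself is normal pipe behaviour: read(2) on a pipe or FIFO returns as soon as some data is available, so a reader that needs the whole input has to keep reading until read returns 0 bytes (EOF). A minimal Python sketch, assuming a Linux-like system with os.pipe; the writer thread stands in for the echo/sleep/echo command, and slow_writer and read_until_eof are illustrative names, not part of VisiData:

```python
import os
import threading
import time

FIRST = b'[{"foo":1,\n'
SECOND = b'"bar":2}]\n'

def slow_writer(fd: int) -> None:
    # Stands in for: echo '[{"foo":1,'; sleep 1; echo '"bar":2}]'
    os.write(fd, FIRST)
    time.sleep(0.2)  # shorter pause than the report's `sleep 1`, same effect
    os.write(fd, SECOND)
    os.close(fd)  # closing the only write end is what lets the reader see EOF

def read_until_eof(fd: int, bufsize: int = 8192) -> bytes:
    # os.read() returns whatever is currently available (possibly a short
    # read); only an empty result means EOF, so loop until then.
    chunks = []
    while True:
        chunk = os.read(fd, bufsize)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

r, w = os.pipe()
writer = threading.Thread(target=slow_writer, args=(w,))
writer.start()
first = os.read(r, 8192)   # a single read, like the one in the strace output
rest = read_until_eof(r)   # collect everything else until the writer closes w
writer.join()
os.close(r)

print(repr(first))         # usually only the first line
print(repr(first + rest))  # the complete document
```

Treating any single read as "the whole input" is only safe for regular files; for pipes and FIFOs the loop is what guarantees the full document.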

When the input is read from stdin, as in:

(echo '[{"foo":1,'; sleep 1; echo '"bar":2}]') | vd -f json

The problem does not occur.

Stack trace obtained with ^E:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/visidata/loaders/json.py", line 29, in iterload
    ret = json.loads(L, object_hook=AttrDict)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 11)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/visidata/threads.py", line 198, in _toplevelTryFunc
    t.status = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/visidata/pyobj.py", line 26, in reload
    for r in self.iterload():
  File "/usr/lib/python3/dist-packages/visidata/loaders/json.py", line 41, in iterload
    ret = json.load(fp)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 6 (char 5)
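The second error message is exactly what json.loads raises for the remainder of the input after the first line, '"bar":2}]\n': the string "bar" parses, then the ':' at char 5 is "extra data". That suggests (an inference from the traceback, not from reading VisiData's source) that the json.load(fp) fallback ran on a stream whose first line had already been consumed. The hypothetical sketch below reproduces that failure pattern deterministically with io.StringIO and contrasts it with reading the input once into memory and trying both formats there, which never needs to rewind a pipe. buggy_load and load_json_or_jsonl are illustrative names, not taken from VisiData or from #1955, and the in-memory stream does not model why the real failure depends on how much the first read(2) returned.

```python
import io
import json

def buggy_load(fp):
    # Hypothetical failure pattern: parse the first line as JSON Lines; on
    # failure, fall back to json.load() on the SAME stream, which no longer
    # contains the line that was already consumed.
    first_line = fp.readline()
    try:
        return json.loads(first_line)
    except json.JSONDecodeError:
        return json.load(fp)  # only sees '"bar":2}]\n' here

def load_json_or_jsonl(fp):
    # Hypothetical safer pattern: read the stream exactly once (read() with
    # no argument keeps reading until EOF), then try both formats in memory.
    text = fp.read()
    try:
        # JSON Lines: one document per non-blank line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    except json.JSONDecodeError:
        # Otherwise treat the whole input as a single JSON document
        return json.loads(text)

data = '[{"foo":1,\n"bar":2}]\n'
print(load_json_or_jsonl(io.StringIO(data)))  # [{'foo': 1, 'bar': 2}]
try:
    buggy_load(io.StringIO(data))
except json.JSONDecodeError as e:
    print(e)  # Extra data: line 1 column 6 (char 5)
```

Buffering the whole input costs memory proportional to its size, but for a format that must be fully parsed anyway that is usually an acceptable trade for never depending on seek() working.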

Additional context

$ vd -f json <(echo '[{"foo":1,'; sleep 1; echo '"bar":2}]')
$
$ vd --version
saul.pw/VisiData v2.11
$ python3 --version
Python 3.11.5
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux trixie/sid
Release:        n/a
Codename:       trixie

stephane-chazelas commented 1 year ago

Note that #2047 and #2048 were found while working on https://unix.stackexchange.com/questions/757832/how-to-process-json-with-strings-containing-invalid-utf-8/758016#758016

Using vd -f json <(cmd) instead of cmd | vd -f json works around #2047, while using cmd | vd -f json instead of vd -f json <(cmd) works around #2048, so it's hard to work around both at once.

One can use zsh's =(...) form of process substitution (vd -f json =(cmd)), but that's not ideal: cmd and vd no longer run concurrently, and the whole output of cmd has to be stored in a temporary file on the filesystem.
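Roughly, =(cmd) behaves like the following Python sketch: cmd runs to completion with its stdout redirected into a temporary file, and only then is the file's path handed to the consumer. Both drawbacks follow directly: no concurrency, and the full output on disk. materialize_output is a hypothetical helper, and the inline Python command merely stands in for cmd.

```python
import os
import subprocess
import sys
import tempfile

def materialize_output(cmd: list) -> str:
    """Hypothetical analogue of zsh's =(cmd).

    Runs cmd to completion with stdout redirected into a temporary file and
    returns that file's path. The consumer cannot start until cmd exits, and
    cmd's entire output sits on the filesystem; the caller must delete it.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            subprocess.run(cmd, stdout=out, check=True)
    except BaseException:
        os.unlink(path)  # don't leave the temporary file behind on failure
        raise
    return path

# Stand-in for `cmd`: prints the same two lines as the echo commands in the report
cmd = [sys.executable, "-c", "print('[{\"foo\":1,'); print('\"bar\":2}]')"]
path = materialize_output(cmd)
with open(path) as f:
    content = f.read()
print(content, end="")
os.unlink(path)
```

In practice the path would be passed on as vd -f json PATH (for example via subprocess.run(["vd", "-f", "json", path])) before deleting it; zsh removes its temporary file itself once the command finishes.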

saulpw commented 1 year ago

Hi @stephane-chazelas, thanks for the detailed report. This does repro in v2.11.1 (the latest release) but not on the develop branch, where it was likely fixed by #1955. Please let us know if you have similar issues that repro on develop!