saulpw / visidata

A terminal spreadsheet multitool for discovering and arranging data
http://visidata.org
GNU General Public License v3.0
7.79k stars 276 forks source link

[json numbers] Provide more configurability for json numbers #2475

Open frosencrantz opened 1 month ago

frosencrantz commented 1 month ago

There was a commit to the json loader that forced all numeric columns to be a float. Before this change, the type was based on how the json parser module would parse a numeric type:

https://github.com/saulpw/visidata/commit/613bf23505ce0cfb8ed6104a8d0d31a86146cc8b. "[json] deduce numeric column as float, not int https://github.com/saulpw/visidata/issues/2131 https://github.com/saulpw/visidata/issues/1698"

This change is safer than what we had before for the reasons listed in the commit. However, I wish there was an option to allow me to throw caution to the wind. The original heuristic worked better for the json I typically view, so allowing integers to be integers worked better.

The thing I dislike now, is that for columns that should be integers, there is now the extra trailing ".0". I find this distracting. It is possible to change the type, but this adds extra keystrokes to navigate to the column and change the type, and if there is more than one column it adds more steps.

I'm not asking for a new command, but maybe a new option "json_numeric_type" and maybe the values would be:

Previously the code used the first value in the column, I guess it should be possible to consider a sample or all values in the column. But I would be happy with first value.

The help string might say something like:

midichef commented 1 month ago

I run into the same issue a lot too. So I second the usefulness of some heuristic to type columns as integer instead of float.

For example, in https://github.com/saulpw/visidata/issues/2444, when I pipe linter output to vd -f json, line numbers and column numbers are floats. It's quite a distraction.

However for this use case, I'd like a heuristic that looks at more than the first row. pylint output has some fields where the first row holds nulls, and integer values are only in later rows. (examples are endLine and endColumn)

saulpw commented 1 month ago

Same here, I've actually been thinking about reverting that commit entirely. In any event it should default to doing the sensible thing, which 99% of the time is using the int type for columns in which the first row value is an int. We could put the current behavior behind safety_first, or add a new option json_int which would default to True. I don't think it's worth it to complicate it more than this.

frosencrantz commented 1 month ago

I am very glad to hear that others are feeling the same way. I like either of those choices. I have a preference for the json_int since that is more specific to the given issue, but it does sound like it could fit within safety_first.

saulpw commented 1 day ago

I realized we don't have to type the columns at all, so I removed it.

Apologies if this breaks anyone's cmdlogs.