uchicago-cs / deepdish

Flexible HDF5 saving/loading and other data science tools from the University of Chicago
http://deepdish.io
BSD 3-Clause "New" or "Revised" License

Brief printing #25

Open gustavla opened 7 years ago

gustavla commented 7 years ago

I am thinking about writing a function that replaces print or __repr__ for inspecting large and possibly unknown variables.

The problem

You have a variable that may be several levels of nested containers (lists of dictionaries of arrays, or what have you), and you print it to standard output. You get a deluge of output and gain little impression of the data. NumPy abbreviates large arrays, which is good; Python, however, does not do the same for its lists and dictionaries.
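A quick way to see the asymmetry, assuming a standard NumPy install with default print options (NumPy summarizes arrays whose size exceeds its print threshold, 1000 elements by default):

```python
import numpy as np

arr = np.arange(10000)
lst = list(range(10000))

# NumPy elides the middle of a large array with "...".
arr_text = repr(arr)
# Python's list repr includes every single element.
lst_text = repr(lst)

print(arr_text)
print(len(arr_text), len(lst_text))
```

The array prints as a single short line, while the equivalent list produces tens of thousands of characters.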

The solution

I want to add a printing function to deepdish that tries to intelligently summarize a variable. For instance, a list of arrays should yield only the shapes of the arrays. Lists and dictionaries should be abridged if they are too long. The user should be able to set a maximum output length, so that there are guaranteed to be no surprises. The default should be chosen so that the full output is visible in a typical terminal.
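A minimal sketch of what such a function could look like, under stated assumptions: the names brief_repr and _summarize and the parameters max_items, max_depth and max_len are hypothetical and not part of deepdish's API. It only special-cases NumPy arrays, dicts, lists and tuples; everything else falls back to repr. Arrays are reduced to shape and dtype, containers are cut after max_items entries, nesting deeper than max_depth is elided, and the final string is hard-capped at max_len characters, which is what provides the "no surprises" guarantee.

```python
import numpy as np


def _summarize(obj, max_items, depth, max_depth):
    """Recursively build a compact, single-line description of obj."""
    if isinstance(obj, np.ndarray):
        # Arrays are reduced to shape and dtype; their values are never printed.
        return f"array(shape={obj.shape}, dtype={obj.dtype})"
    if isinstance(obj, dict):
        if depth >= max_depth:
            return "{...}"
        items = list(obj.items())
        parts = [f"{k!r}: {_summarize(v, max_items, depth + 1, max_depth)}"
                 for k, v in items[:max_items]]
        if len(items) > max_items:
            parts.append(f"... ({len(items) - max_items} more)")
        return "{" + ", ".join(parts) + "}"
    if isinstance(obj, (list, tuple)):
        open_, close = ("[", "]") if isinstance(obj, list) else ("(", ")")
        if depth >= max_depth:
            return open_ + "..." + close
        parts = [_summarize(v, max_items, depth + 1, max_depth)
                 for v in obj[:max_items]]
        if len(obj) > max_items:
            parts.append(f"... ({len(obj) - max_items} more)")
        return open_ + ", ".join(parts) + close
    # Scalars and unrecognized objects fall back to their normal repr;
    # max_len below still bounds the total output.
    return repr(obj)


def brief_repr(obj, max_items=5, max_depth=4, max_len=800):
    """Hypothetical brief printer. max_len=800 is a placeholder default,
    roughly ten lines of an 80-column terminal."""
    text = _summarize(obj, max_items, 0, max_depth)
    if len(text) > max_len:
        # Hard cap: the result never exceeds max_len characters.
        text = text[: max_len - 3] + "..."
    return text
```

For example, brief_repr({"images": [np.zeros((28, 28)) for _ in range(100)], "labels": np.arange(100)}) shows five image shapes followed by "... (95 more)" instead of printing all 78,400 floats, since each 28x28 array is below NumPy's summarization threshold and would otherwise be printed in full.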

I think this fits nicely with deepdish, since we're all about data processing and it could be really useful for ddls -i (inspect HDF5 group directly from the command line).

Thoughts:

Any input will be welcome!