pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.61k stars 1.08k forks source link

constructing nested inline reprs #4324

Open keewis opened 4 years ago

keewis commented 4 years ago

While implementing the new _repr_inline_ in xarray-contrib/pint-xarray#22, I realized that I designed that method with a single level of nesting in mind, e.g. xarray(pint(x)) or xarray(dask(x)).

From that PR: @keewis

thinking about this some more, this doesn't work for anything other than numpy.ndarray objects. For now I guess we could use the magnitude's _repr_inline_ (falling back to __repr__ if that doesn't exist) and only use format_array_flat if the magnitude is a ndarray.

However, as we nest deeper (e.g. xarray(pint(uncertainties(dask(sparse(cupy))))) – for argument's sake, let's assume that this actually makes sense) this might break or become really complicated. Does anyone have any ideas how to deal with that?

If I'm simply missing something we have that discussion here, otherwise I guess we should open a issue on xarray's issue tracker.

@jthielen

Yes, I agree that format_array_flat should probably just apply to magnitude being an ndarray.

I think a cascading series of _repr_inline_ should work for nested arrays, so long as

* the metadata of the higher nested objects is considered the priority (if not, then we're back to a fully managed solution to the likes of [dask/dask#5329](https://github.com/dask/dask/issues/5329))

* small max lengths are handled gracefully (i.e., a minimum where it is just like `Dask.Array(...)`, then `...`, then nothing)

* we're okay with the lowest arrays in large nesting chains not having any information show up in the inline repr (situation where there is not enough characters to even describe the full nesting has to be accounted for somehow)

* it can be adopted without too much complaint across the ecosystem

Assuming all this, then each layer of the nesting will reduce the max length of the inline repr string available to the layers below it, until a layer reaches a reasonable minimum where it "gives up". At least that's the natural design that I inferred from the simple _repr_inline_(max_width) API.

All that being said, it might still be good to bring up on xarray's end since this is a more general issue with inline reprs of nested duck arrays, with nothing pint-specific other than it being the motivating use-case.

How should we deal with this?

benbovy commented 4 years ago

It's tricky for the text repr, but for the html repr some sort of horizontal accordion would be nice, although I'd be concerned adding more complexity to the current repr (regarding both UI and HTML/CSS code).

Alternatively, we could "simply" reuse the dropdown sections of each variable and add one or more sections next to the attribute and data repr ones (with their own expand icon). This could give a bit more space for fancy things and this would leave the inline space for only the highest nested array object and/or values preview if available.

benbovy commented 4 years ago

Thinking more about it, I find the current repr (both text and html) already quite dense and I'm wondering if this couldn't be the opportunity to clean it a little bit.

For example, the inline array repr would be only for showing the values (if directly accessible, otherwise ...) and a very compact description of the underlying array objects (assuming that we could easily retrieve that information), e.g.,

my_var         (x)   float64   ...            [p-u-d-s-c]

For any additional data/metadata, we have collapsed sections for the html repr (toggleable using the icons). For the text repr, we could have additional display options like display_var_attributes=False, display_array_summary=False, display_data_repr=False, etc.

It might be rather opposite to what you want to achieve @keewis, but I think a cleaner repr with hidden sections (or turned-off display options) by default would make sense if 80% of Xarray users are dealing mostly with numpy arrays.

(sorry my comment is slightly off-topic).

jthielen commented 4 years ago

@benbovy These are some good suggestions about possible final products to aim towards in the implementation. Unfortunately, there are a couple problems to keep in mind:

benbovy commented 4 years ago

Good points @jthielen.

Here's another suggestion:

keewis commented 4 years ago

I'll add a few ideas, too:

keewis commented 4 years ago

I brought this up at the meeting, and if I remember correctly, the recommendation was to take a step back and solve nested duck arrays in general (e.g. by writing a design doc – a NEP?). Correct me if I'm wrong, @shoyer, but I think the hope was that after that it would be easier to design nested reprs.

jthielen commented 4 years ago

I brought this up at the meeting, and if I remember correctly, the recommendation was to take a step back and solve nested duck arrays in general (e.g. by writing a design doc – a NEP?). Correct me if I'm wrong, @shoyer, but I think the hope was that after that it would be easier to design nested reprs.

Sounds good! Would it make sense to discuss that over on https://github.com/dask/dask/issues/5329, the pre-existing issue for this larger problem of nested duck array reprs/meta?

Also, since _repr_inline_ is already implemented, I'd still like to see a basic version of https://github.com/xarray-contrib/pint-xarray/pull/22 merged in, so at least we have good results in the "easy" case of xarray > Pint > NumPy.

shoyer commented 4 years ago

I brought this up at the meeting, and if I remember correctly, the recommendation was to take a step back and solve nested duck arrays in general (e.g. by writing a design doc – a NEP?). Correct me if I'm wrong, @shoyer, but I think the hope was that after that it would be easier to design nested reprs.

I don't think that necessarily needs to be blocker for this particular effort, but I do think this is an area where summarizing high level ideas into a "design document" that clearly articulates the use-cases and suggested solutions could be a good idea.

I don't know if this NEP necessarily needs to live in NumPy, but I do think the NEP template is a nice starting point for what should be covered in such a porposal: https://numpy.org/neps/nep-template.html

keewis commented 4 years ago

see also dask/dask#6637