constructing nested inline reprs

keewis commented 4 years ago

While implementing the new _repr_inline_ in xarray-contrib/pint-xarray#22, I realized that I designed that method with a single level of nesting in mind, e.g. xarray(pint(x)) or xarray(dask(x)).

From that PR: @keewis

thinking about this some more, this doesn't work for anything other than numpy.ndarray objects. For now I guess we could use the magnitude's _repr_inline_ (falling back to __repr__ if that doesn't exist) and only use format_array_flat if the magnitude is a ndarray.

However, as we nest deeper (e.g. xarray(pint(uncertainties(dask(sparse(cupy))))) – for argument's sake, let's assume that this actually makes sense) this might break or become really complicated. Does anyone have any ideas how to deal with that?

If I'm simply missing something we have that discussion here, otherwise I guess we should open a issue on xarray's issue tracker.

@jthielen

Yes, I agree that format_array_flat should probably just apply to magnitude being an ndarray.

I think a cascading series of _repr_inline_ should work for nested arrays, so long as
* the metadata of the higher nested objects is considered the priority (if not, then we're back to a fully managed solution to the likes of [dask/dask#5329](https://github.com/dask/dask/issues/5329))

* small max lengths are handled gracefully (i.e., a minimum where it is just like `Dask.Array(...)`, then `...`, then nothing)

* we're okay with the lowest arrays in large nesting chains not having any information show up in the inline repr (situation where there is not enough characters to even describe the full nesting has to be accounted for somehow)

* it can be adopted without too much complaint across the ecosystem
Assuming all this, then each layer of the nesting will reduce the max length of the inline repr string available to the layers below it, until a layer reaches a reasonable minimum where it "gives up". At least that's the natural design that I inferred from the simple _repr_inline_(max_width) API.

All that being said, it might still be good to bring up on xarray's end since this is a more general issue with inline reprs of nested duck arrays, with nothing pint-specific other than it being the motivating use-case.

How should we deal with this?

benbovy commented 4 years ago

It's tricky for the text repr, but for the html repr some sort of horizontal accordion would be nice, although I'd be concerned adding more complexity to the current repr (regarding both UI and HTML/CSS code).

Alternatively, we could "simply" reuse the dropdown sections of each variable and add one or more sections next to the attribute and data repr ones (with their own expand icon). This could give a bit more space for fancy things and this would leave the inline space for only the highest nested array object and/or values preview if available.

benbovy commented 4 years ago

Thinking more about it, I find the current repr (both text and html) already quite dense and I'm wondering if this couldn't be the opportunity to clean it a little bit.

For example, the inline array repr would be only for showing the values (if directly accessible, otherwise ...) and a very compact description of the underlying array objects (assuming that we could easily retrieve that information), e.g.,

my_var         (x)   float64   ...            [p-u-d-s-c]

For any additional data/metadata, we have collapsed sections for the html repr (toggleable using the icons). For the text repr, we could have additional display options like display_var_attributes=False, display_array_summary=False, display_data_repr=False, etc.

It might be rather opposite to what you want to achieve @keewis, but I think a cleaner repr with hidden sections (or turned-off display options) by default would make sense if 80% of Xarray users are dealing mostly with numpy arrays.

(sorry my comment is slightly off-topic).

jthielen commented 4 years ago

@benbovy These are some good suggestions about possible final products to aim towards in the implementation. Unfortunately, there are a couple problems to keep in mind:

The main impetus for custom inline text reprs was including units in the dataset overview, and so, that functionality needs to be kept available. Having a very compact description representing the duck array type would be insufficient.
Inspecting the full order of general nested duck arrays is non-trivial. Right now, every duck array that acts as a wrapper/container has its own way of referencing what it contains and combining its own reprs with what it contains. The simplest way for this to work (and how it works now) is to trust that each array layer is handling the next one below it properly without any information of the other layers beyond that. https://github.com/dask/dask/issues/5329 proposed establishing some shared way of organizing/managing all this nested duck array metadata, which almost certainly would be needed for anything more complex/robust.

benbovy commented 4 years ago

Good points @jthielen.

Here's another suggestion:

Treat as two separate problems (1) inline display of duck array metadata and (2) inline preview of array values. This would let Xarray taking care of formatting the values (using format_array_flat, which won't need to be in public API), but this would require Xarray to know whether the duck array is lazy or not.
For inline display of duck array metadata, only show the _repr_inline_ of the top-layer (do not cascade _repr_inline_ of nested duck arrays). It's the responsibility of the top-layer library (e.g., Pint) to show metadata about the layers beyond.
Dask (duck) arrays (if they could be identified as-is https://github.com/dask/dask/issues/6385?) probably deserve their own treatment in the xarray variable repr too, because xarray exposes specific API for them (e.g., .compute()) and because we cannot trust that all duck array libraries will properly handle Dask duck arrays in their inline repr. I think that just a one-character symbol of some kind would be enough (similarly to * for coordinates with an index).

keewis commented 4 years ago

I'll add a few ideas, too:

we could also add a format parameter to _repr_inline_ that if True requires the implementers to return a list of types (prepending type(self)) and a dictionary containing the metadata. xarray could then use those to construct the repr proposed in dask/dask#5329 (I guess that only works for the expanded view, though, because the collapsed view doesn't have that much horizontal space).
other than that, we could have the implementers delegate to the data's _repr_inline_ (falling back to the normal repr) and prepend their own metadata.

keewis commented 4 years ago

I brought this up at the meeting, and if I remember correctly, the recommendation was to take a step back and solve nested duck arrays in general (e.g. by writing a design doc – a NEP?). Correct me if I'm wrong, @shoyer, but I think the hope was that after that it would be easier to design nested reprs.

jthielen commented 4 years ago

I brought this up at the meeting, and if I remember correctly, the recommendation was to take a step back and solve nested duck arrays in general (e.g. by writing a design doc – a NEP?). Correct me if I'm wrong, @shoyer, but I think the hope was that after that it would be easier to design nested reprs.

Sounds good! Would it make sense to discuss that over on https://github.com/dask/dask/issues/5329, the pre-existing issue for this larger problem of nested duck array reprs/meta?

Also, since _repr_inline_ is already implemented, I'd still like to see a basic version of https://github.com/xarray-contrib/pint-xarray/pull/22 merged in, so at least we have good results in the "easy" case of xarray > Pint > NumPy.

shoyer commented 4 years ago

I brought this up at the meeting, and if I remember correctly, the recommendation was to take a step back and solve nested duck arrays in general (e.g. by writing a design doc – a NEP?). Correct me if I'm wrong, @shoyer, but I think the hope was that after that it would be easier to design nested reprs.

I don't think that necessarily needs to be blocker for this particular effort, but I do think this is an area where summarizing high level ideas into a "design document" that clearly articulates the use-cases and suggested solutions could be a good idea.

I don't know if this NEP necessarily needs to live in NumPy, but I do think the NEP template is a nice starting point for what should be covered in such a porposal: https://numpy.org/neps/nep-template.html

keewis commented 4 years ago

pydata / xarray

constructing nested inline reprs #4324