matplotlib / data-prototype

https://matplotlib.org/data-prototype
BSD 3-Clause "New" or "Revised" License

DataContainer.query() missing unit conversion? #7

Open l-johnston opened 1 year ago

l-johnston commented 1 year ago

Should DataContainer.query() include unit conversion? If not, where will it happen? Ideally, I would like DataContainer to hold the data in its original form and then convert, slice, etc. upon query. Subsequent queries would start from the original data. Ideally, the end user would also be able to inspect this DataContainer to verify its correctness. Today, that's impossible for some plotting functions and hard to do for others.

tacaswell commented 1 year ago

The current proposal is that the unit conversion goes here: https://github.com/matplotlib/data-prototype/blob/609fe4995f995a69a21d017ad4e34da1aefc1bff/data_prototype/wrappers.py#L128-L135

The logic is that the Artists know what Axes they are in (currently stateful, but at a minimum this can be passed in at draw time); however, Containers do not (and I argue should not) know which Artists know about them. We want to be able to share the Container (or derived container) objects between multiple Artists, so either they have to know about all of the Artists or none. "None" seems like the simpler option to me!

Another reason to leave the unit conversion in the Artist is that which conversion we need to use depends on the context: x-like data needs to be run through the xaxis converter, y-like data needs to be run through the yaxis converter, and in principle we want to support units on all the things, so each field may need a special converter. The part of the code that should be responsible for maintaining that mapping is the Artist.
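To make the sharing argument concrete, here is a minimal sketch in plain Python. `DictContainer`, `SketchArtist`, and the converter callables are hypothetical stand-ins invented for illustration, not the data-prototype or Matplotlib API. The container only stores and returns data; each Artist carries its own field-to-converter mapping.

```python
from datetime import date


class DictContainer:
    """Hypothetical container: holds data in its original form and
    knows nothing about the Artists that use it."""

    def __init__(self, **fields):
        self._fields = fields

    def query(self):
        # Return a fresh mapping so every query starts from the original data.
        return dict(self._fields)


class SketchArtist:
    """Hypothetical Artist: owns the mapping from field name to converter."""

    def __init__(self, container, converters):
        self.container = container
        self.converters = converters  # field name -> callable (assumed shape)

    def converted(self):
        data = self.container.query()
        return {
            name: [self.converters[name](v) for v in values]
            for name, values in data.items()
        }


container = DictContainer(
    x=[date(2023, 1, 1), date(2023, 1, 2)],
    y=[1.0, 2.5],
)

# Two Artists share one container but apply different y conversions.
artist_a = SketchArtist(container, {"x": date.toordinal, "y": float})
artist_b = SketchArtist(container, {"x": date.toordinal, "y": lambda v: v * 100})

print(artist_a.converted()["y"])
print(artist_b.converted()["y"])
```

Because the container never sees a converter, adding a third Artist with yet another mapping requires no change to the container.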

"Relative" data is surprisingly tricky: you want to do the computation before unit conversion, but if you interleave Artist computation logic in the middle of the query-convert chain we rapidly get back to the current situation where we have to get the unit conversion correct in N places instead of 1. I think this pushes us towards always requiring the results of query to be "absolute" (that is, rectangle corners instead of center + width, and upper and lower limits rather than data value + delta). That seems OK, as we are already expecting to have a number of other, much more compute-heavy cases (histograms, contours, stream plots, ...).
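As an illustration of the "absolute" requirement, the sketch below uses standard-library datetimes as unit-ful data. `query_absolute` and `to_days` are hypothetical names chosen for this example. The center + width arithmetic happens in the original units inside the query step, so only one kind of value (an absolute position) ever reaches the converter.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_days(value):
    """Stand-in unit converter: datetime -> float days since EPOCH."""
    return (value - EPOCH) / timedelta(days=1)


def query_absolute(centers, widths):
    """Turn center + width into absolute left/right edges,
    still expressed in the original (datetime) units."""
    left = [c - w / 2 for c, w in zip(centers, widths)]
    right = [c + w / 2 for c, w in zip(centers, widths)]
    return {"left": left, "right": right}


centers = [datetime(2023, 1, 1, 12), datetime(2023, 1, 3, 12)]
widths = [timedelta(days=1), timedelta(days=2)]

# Step 1: relative -> absolute, before any unit conversion.
absolute = query_absolute(centers, widths)

# Step 2: one converter handles every field, because all fields are positions.
converted = {name: [to_days(v) for v in values] for name, values in absolute.items()}
```

Note that `to_days` would fail on a `timedelta` width, which is the point: if the query returned center + width, the conversion step would need a second, delta-aware code path.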

l-johnston commented 1 year ago

Will DataContainer.query() invoke the "array interface" on the data object and return a numpy.ndarray? If so, how will the Artist be able to perform a unit conversion?

tacaswell commented 1 year ago

The query method currently returns a mapping of strings -> arrays. I think it would be reasonable to loosen that a bit to "array-ish" and defer getting all the way to an array to the conversion step.

At the end of the day, we do not need an actual array until we get to the renderer methods, so there is a lot of phase space to play with as to the best way to organize and factor this processing chain. There is going to be a trade-off: too many hook points makes everything too verbose to work with / understand, and too few hook points leaves us where we are today.

l-johnston commented 1 year ago

For the data-as-custom-class scenario, what does the DataContainer store and what will be the result of a query?

import matplotlib.pyplot

class Qlist(list):
    """A list of Quantity objects"""

class Quantity(float):
    """A simple Quantity"""

matplotlib.pyplot.plot(Qlist([Quantity(1), Quantity(2), Quantity(3)]))

tacaswell commented 1 year ago

We are not at the level of sorting out the application level API yet.

The data container stores the data however it wants, the query returns a mapping of str -> what-the-unit-converter-eats, the unit converter goes from its input -> what-the-nu-transforms-eat, and the nus go from their input -> what the renderer methods need (e.g. unitless numpy arrays of floats for position data, paths for shape data, RGB(A) values for colors, ...).
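A rough sketch of that chain for the Qlist/Quantity example, in plain Python. Everything here other than `Qlist` and `Quantity` is a hypothetical stand-in: `QlistContainer`, `unit_convert`, and `nu_scale` are names invented for illustration, and plain lists of floats stand in for the numpy arrays a renderer would receive.

```python
class Quantity(float):
    """A simple Quantity (as in the question above)."""


class Qlist(list):
    """A list of Quantity objects."""


class QlistContainer:
    """Hypothetical container: stores the Qlist untouched."""

    def __init__(self, y):
        self._y = y

    def query(self):
        # str -> what-the-unit-converter-eats (values are still Quantity objects).
        return {"x": list(range(len(self._y))), "y": self._y}


def unit_convert(data):
    """Stand-in unit converter: strips Quantity down to plain floats."""
    return {name: [float(v) for v in values] for name, values in data.items()}


def nu_scale(data, factor=2.0):
    """Stand-in for a nu step: converted data -> what the renderer needs."""
    return {**data, "y": [v * factor for v in data["y"]]}


container = QlistContainer(Qlist([Quantity(1), Quantity(2), Quantity(3)]))

raw = container.query()            # original objects, inspectable by the user
ready = nu_scale(unit_convert(raw))  # unitless floats for the renderer

print(type(raw["y"][0]).__name__, ready["y"])
```

Keeping each stage a separate function means the stored data stays inspectable in its original form, and the unit conversion lives in exactly one place.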