holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

Points, Contours and Polygon are one undefined concept #143

Closed jlstevens closed 5 years ago

jlstevens commented 9 years ago

Explaining the difference between Points and Scatter has always been tricky. Partly, this is because Points is in chart.py when it is conceptually more similar to Contours in path.py. I think both these classes are part of some difficult-to-name concept and are therefore both in the wrong file with the wrong base class.

Essentially, here is how I think of it:

The point of elements in ConceptX is that they convey information by their visual appearance (the positions of the points, contours or polygons) but may also have some value associated with each of these visual elements (e.g. the iso-levels of contours). The best (terrible!) semantic name I can come up with is OptionallyValuedVisualElements.

This links to issue #102 discussing potential improvements to Contours.

jbednar commented 9 years ago

"Locations"?

Where does Scatter fit into your bullet list above?

jlstevens commented 9 years ago

Scatter is a chart. It is the chart equivalent to Points (which then belongs to the concept we are discussing).

Almost any name is better than the ones I suggested but I don't think Locations is quite right. I think that Regions might be slightly closer but that isn't much better.

How about Markers? I think the idea of a marker is more specific than an annotation (e.g. I wouldn't consider text to be a marker). I would suggest ValueMarker except there doesn't have to be a value associated. Maybe DataMarker suggests there can be important data associated with the elements, but I do realize that in general a prefix of 'data' is pretty useless!

That said, I do quite like the sound of DataMarker: even if the prefix is useless, it makes it clear that we are trying to refer to a specific concept...

Just to summarize, we will have the regular elements (charts, rasters, chart3d, tables etc) then annotations of which the Path elements are a subclass. Then the idea of DataMarker is intermediate between annotations and the regular elements.

jbednar commented 9 years ago

"Regions" implies a 2D enclosed area, to me. GraphicalElement? Marker seems ok. DataMarker is ok too, though as you say Data means nearly nothing.

philippjfr commented 6 years ago

Is this something we still want to consider looking at? From my perspective, elements can be conceptually grouped as follows:

Defined as plot where independent variables map to x-axis and dependent variables to y-axis.

Binned data in 1D or 2D.

2D Gridded data where each coordinate maps to pixel center

Network graphs showing connectivity between different nodes.

Kernel density estimates of 1D and 2D data

Pure annotations useful for highlighting some aspect of a plot.

2D locations where x- and y-axis represent the same quantity/space

jbednar commented 6 years ago

Undefined concept == Spatially situated, i.e. treating the 2D plot as a 2D space. SpatiallySituated is not a good name, though.

I'm not sure what your criterion is for separating the Annotations from this category; how are they different? Text, HLine, VLine, and Arrow are all situated in the 2D space of the plot, aren't they?

Also, are the x and y axes really required to be commensurate? Seems like Points, Bounds, and Box at least are well defined whether or not x has the same scale as y, making them a special subset of this concept.

philippjfr commented 6 years ago

I agree with all that. My criterion for separating annotations was simply that Contours/Polygons/Points/Path are generally not merely annotations but actual data.

> Also, are the x and y axes really required to be commensurate?

Not really, that's just how I think about these elements. I think generally when they are not commensurate you're probably using the element as an annotation rather than to represent actual data (but of course there will be exceptions).

jbednar commented 6 years ago

> I think generally when they are not commensurate you're probably using the element as an annotation rather than to represent actual data (but of course there will be exceptions).

That's probably true of Bounds and Box, but Points seems useful for representing locations in any 2D space. E.g. a plot of market capitalization vs. number of employees, with color representing something else, e.g. which stock exchange the company is traded on --- seems very much a Points plot rather than Scatter (as market capitalization isn't a function of number of employees, or vice versa). But also not an annotation, just a plot in a 2D space where x and y aren't commensurate.

philippjfr commented 6 years ago

> seems very much a Points plot rather than Scatter (as market capitalization isn't a function of number of employees, or vice versa)

I'd argue you should use Scatter for this. You're never sure if one is a variable of the other, but Scatter lets you ask the question "do the numbers of employees have a relationship to the market cap or vice versa?".

jbednar commented 6 years ago

A matter of preference, I guess; if I'm trying to see the pattern of colored dots, I don't want to have to pick one or the other dimension as being nominally the independent one; they are both independent to me...

jlstevens commented 6 years ago

I think the crux of it is figuring out cause and effect - you often don't know about this relationship and two quantities may have nothing to do with each other. This makes it hard to know which should be the kdim and which should be the vdim when considering two dimensions of different types.

When two dimensions have the same type, you can have rotations in that space as it is 'uniform', i.e. you get to choose your basis. In this case, it makes sense for there to be two kdims.

Thinking about it this way, the ideal case would be that you use Points for two dimensions of the same type (where you can rotate the basis) and you use Scatter otherwise, as you know which dimension is the kdim and which is the vdim.

The problem then is deciding on this relation, which is not obvious for uncorrelated, independent quantities. The question then is: why would you want to plot scatters for uncorrelated quantities? I suppose you might simply be searching for correlations...

So on balance I think I do agree with Philipp's assessment.

> I don't want to have to pick one or the other dimension as being nominally the independent one...

Then the way to think about it is that by picking a kdim and a vdim for your Scatter, you are making a hypothesis about a relationship between two quantities. That hypothesis, that there is a meaningful relationship to visualize, may of course be false...

jbednar commented 6 years ago

If I'm mainly interested in what I'm plotting as color or size, I'm not necessarily even interested in making a hypothesis about how the x and y dimensions relate to each other; I'm interested in how they relate to the z dimension(s). By using Points for such a plot I want to indicate that I am making no such hypothesis.

jlstevens commented 6 years ago

I agree, that is exactly where Points is useful because the x and y dimensions are interchangeable, i.e. you can choose any reference frame. For instance, an image of the night sky has no 'correct' reference frame; if you rotate the image by an arbitrary angle, you are still looking at the same thing, as the chosen reference frame is just convention.
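
As a rough illustration of the reference-frame argument, the sketch below uses only the standard library. It checks that rotating a set of 2D points leaves all pairwise distances unchanged, whereas changing the unit of just one axis does not. The coordinates and the rescaling factor are arbitrary, and the rescaling step is my own way of illustrating axes of different types rather than something taken from HoloViews itself.

```python
import math

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def rescale_x(points, factor):
    """Change the unit of the x axis only (e.g. thousands to units)."""
    return [(x * factor, y) for x, y in points]

def pairwise_distances(points):
    """Euclidean distance between every unordered pair of points."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

def same_distances(a, b):
    return all(
        math.isclose(da, db)
        for da, db in zip(pairwise_distances(a), pairwise_distances(b))
    )

# Arbitrary positions in a space where x and y share the same type.
stars = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]

# Rotation is just a change of reference frame: the geometry is preserved.
preserved_by_rotation = same_distances(stars, rotate(stars, math.radians(37)))

# Rescaling one axis alone changes the geometry, so distances (and hence
# rotations) only carry meaning when both axes measure the same quantity.
preserved_by_rescaling = same_distances(stars, rescale_x(stars, 1000.0))
```

Here `preserved_by_rotation` comes out true and `preserved_by_rescaling` false, which is one way of stating why an interchangeable pair of kdims suits dimensions of the same type.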

I suspect you might be thinking of the case when x and y are different dimension types, in which case I think Scatter is more appropriate than Points: Scatter is for plotting one thing against the other.

philippjfr commented 5 years ago

The concept and base class they share is now called Geometry.
