contour labels

geocosmite commented 7 years ago

The 'contour' plot type is very useful but would be even more powerful if it included an automated facility for creating labels. Mapbox seems to incorporate a particularly nice implementation of contour labeling as illustrated here: https://www.mapbox.com/blog/satellite-map-with-contours/

Given that the two systems apparently already have some connections, perhaps something could be worked out to incorporate Mapbox's approach to contour labeling into Plotly for cases where the data are defined not in terms of lat / long but by x and y?

etpinard commented 7 years ago

Some label work is being done at the moment for the upcoming carpet plots in https://github.com/plotly/plotly.js/pull/1239

geocosmite commented 7 years ago

The labels in the examples look fabulous.

rreusser commented 7 years ago

Thanks for the suggestion, @geocosmite! I've been debating how elaborate to get with this since a major feature of carpet plots is inequalities which require at least some sort of label. For a start, I've been considering just labeling the inequality constraints at the edge of the plot, though that leaves a bit to be desired. It might be the best option to get the feature out the door though.

As for labels that follow the contours, I spent a bit of time analyzing examples on Google Images, trying to reverse engineer the rules by which they're placed. The carpet plots mostly just reuse the regular contour plot code, so it's certainly not inconceivable to hit two birds with one contour label placement stone.

[Image: plot_contour_ex_1]

A few features/rules that are pretty easy to notice:

- Mapbox, Google Maps, etc. tend to label elevations and other features like roads with labels that follow the path.
- Contrast with matplotlib, which (I think, based on close examination) uses the tangent of the spline to place straight labels that do not follow the path.
- Matplotlib drops out the contour behind the label, which more or less requires evaluating the spline parameterized by arc length. Support for the SVG method getPointAtLength looks a little questionable, which means it might be necessary to evaluate directly. It's not that bad, but it's not without a bit of effort. Alternatively, a clip-path might be better/easier.
- The labels are sometimes close together and sometimes far apart, but they never intersect. That suggests they're maybe using a greedy algorithm where you place labels randomly wherever the curvature is below some threshold, as long as the spot is not too close to an existing label. (Edit: it seems not like a hard threshold but maybe just a bias toward flat sections, with a hard cutoff above which the label is omitted.)
- Dissecting the curvature issue a bit more, it seems like the small circle has flat-ish sections but the label is still omitted. This is the part that seems like it might still require an arc-length-based check. If it were curvature and tangent alone, the small circle would still have a label. It doesn't, presumably because it doesn't 'fit'.
- Really short loops don't get a label, but maybe at least one label should be required for each level. Or maybe not. I feel like you could spend a lot of time developing the right rules to make the largest set of people happy.
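
The greedy idea guessed at in the list above could be sketched roughly as follows. This is only an illustration, not how matplotlib, Mapbox, or plotly.js actually place labels. The names `placeLabelsGreedy`, `maxTurn`, and `minGap` are hypothetical, contours are assumed to be plain polylines rather than splines, and candidates are scanned in order instead of randomly so the result is repeatable.

```javascript
// Hypothetical sketch: greedy label placement along polyline contours.
// Each contour is an array of [x, y] points.

// Cumulative arc length at each vertex.
function arcLengths(pts) {
  const s = [0];
  for (let i = 1; i < pts.length; i++) {
    s.push(s[i - 1] + Math.hypot(pts[i][0] - pts[i - 1][0], pts[i][1] - pts[i - 1][1]));
  }
  return s;
}

// Point at arc length t along the polyline, by linear interpolation.
function pointAtLength(pts, s, t) {
  let i = 1;
  while (i < s.length - 1 && s[i] < t) i++;
  const seg = s[i] - s[i - 1];
  const f = seg === 0 ? 0 : (t - s[i - 1]) / seg;
  return [
    pts[i - 1][0] + f * (pts[i][0] - pts[i - 1][0]),
    pts[i - 1][1] + f * (pts[i][1] - pts[i - 1][1])
  ];
}

// Signed difference between two angles, wrapped into (-PI, PI].
function angleDiff(u, v) {
  let d = v - u;
  while (d > Math.PI) d -= 2 * Math.PI;
  while (d <= -Math.PI) d += 2 * Math.PI;
  return d;
}

function placeLabelsGreedy(contours, labelWidth, maxTurn, minGap) {
  const placed = [];
  for (const pts of contours) {
    const s = arcLengths(pts);
    const total = s[s.length - 1];
    const half = labelWidth / 2;
    // The loop bounds are the "fit" check: a contour shorter than the
    // label never produces a candidate.
    for (let t = half; t <= total - half; t += labelWidth / 4) {
      const a = pointAtLength(pts, s, t - half);
      const m = pointAtLength(pts, s, t);
      const b = pointAtLength(pts, s, t + half);
      // How much the path bends under the label (a crude curvature proxy).
      const turn = Math.abs(angleDiff(
        Math.atan2(m[1] - a[1], m[0] - a[0]),
        Math.atan2(b[1] - m[1], b[0] - m[0])
      ));
      const clear = placed.every(p => Math.hypot(p.x - m[0], p.y - m[1]) >= minGap);
      if (turn <= maxTurn && clear) {
        placed.push({ x: m[0], y: m[1], angle: Math.atan2(b[1] - a[1], b[0] - a[0]) });
      }
    }
  }
  return placed;
}
```

Because the label must fit between the two ends of the path, a tiny closed loop gets no label even if each of its segments is perfectly straight, which matches the "small circle" observation.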

Of course, I could dig up the matplotlib code and analyze how they're doing it, but I think most of the challenge is in the details of the particular implementation rather than in figuring out the logic/heuristics, which can always be incrementally improved.

geocosmite commented 7 years ago

Thanks very much @rreusser for sharing your ideas about approaches for labeling. From what you have written I can see that it is difficult to come up with an implementation that will satisfy all situations. Nonetheless what you've laid out sounds like a great place to start.

Over the past couple of weeks, as I have been learning to use Plotly, I have grown increasingly impressed by its capabilities, stability, and API design. The careful thought that shines through your comments and those of your colleagues in the forums shows me why the system is as good as it is.

Something to consider in terms of future plot types for your labeling scheme is that it would be extremely useful to have a 'contour3d' plot type that supports labels.

We are interested in this plot type given that it will allow us to view a third data attribute on a 3D surface when used in conjunction with the 'z' and 'surfacecolor' attributes supported by the 'surface' plot type. I hope to get a mockup of such a plot together in the next day or two and would be glad to share it with you if you like.

alexcjohnson commented 7 years ago

Some more observations as I start to work on this:

- I'm not a big fan of trying to make the text follow the curve. I think it hinders readability, and I could imagine that in extremely jagged cases it would make a total mess.
- I like @rreusser's clip-path idea. It would also handle the case of multiple nearby contours: we wouldn't want to cut a segment out of the contour we're labeling only to have a different contour obscure the label. So to be clear, all labels would contribute to a clip-path exclusion against all contours. (Side note: getPointAtLength is OK to use; we use it elsewhere. The issue flagged in the docs just amounts to changing what class it's defined on, i.e. to generalize it, not to remove support.)
- Looking around online as well as at my collection of paper topo maps, it doesn't seem like there's a consensus even among professional mapmakers on where to place labels relative to each other.
  - Some line up a whole series of labels near each other - not overlapping of course, but almost looking like the labels on a color bar.
  - Others seem to want to disperse them as much as possible.
  - It seems to me the latter approach is generally the best, based on what you're trying to accomplish with labels: typically you'll be focusing on one particular region of the plot and then you ask yourself "which contour is this one?" If that very contour is labeled near enough that you can follow along and clearly see it, that's great, but if not it would still be helpful if one of the neighboring contours had a label nearby that you can increment from.
- I've seen a few examples where the text can be placed completely upside down. Presumably the rationale is to show which way is up (higher z), by ensuring it's always at the top of the text. I suppose there's something to be said for that, if you could get viewers sufficiently trained up to make use of it, but I don't intend to do that, and may even try to bias the algorithm toward placing the labels closer to horizontal than purely on the straightest section of the curve.
- At least to start, I'm only going to include the numbers, no units or other prefix / suffix. We can add that later if people want it, but in the cases where I've seen it (even if it's just ' for feet) it seems horribly redundant, and particularly so in our case since we have hover labels, colorbar, plot title... all of which give you that info with less clutter.
- Sometimes you'll see multiple labels on one curve. Intuitively, it seems like when it gets hard to follow from the point you're looking at to a label (potentially for a nearby contour as mentioned above) we should add another label, but not so many that it gets cluttered or you lose a significant fraction of the line length... I'm not quite sure how to quantify that; perhaps the distance between labels (along the contour) should be some multiple of the average distance to the next contour?
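
Two of the points above, never drawing text upside down and preferring near-horizontal sections, are easy to express in code. The sketch below is only an illustration: `uprightAngle`, `placementCost`, and the weight `horizontalBias` are hypothetical names, not plotly.js internals.

```javascript
// Fold a tangent angle (radians) into (-PI/2, PI/2] so that text drawn
// at the returned angle is never upside down.
function uprightAngle(theta) {
  let a = theta % Math.PI;            // now strictly inside (-PI, PI)
  if (a > Math.PI / 2) a -= Math.PI;
  if (a <= -Math.PI / 2) a += Math.PI;
  return a;
}

// Score a candidate spot; lower is better. `turn` is how much the contour
// bends under the label and `theta` is the tangent angle at the spot.
// horizontalBias = 0 reduces this to "straightest section wins".
function placementCost(turn, theta, horizontalBias) {
  return Math.abs(turn) + horizontalBias * Math.abs(uprightAngle(theta));
}
```

With a nonzero `horizontalBias`, a slightly curved but level section can beat a perfectly straight but steep one, which is the bias described above.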

rreusser commented 7 years ago

There's the possibility of using an optimizer. You could write a simple cost function based on things like proximity to other labels and curvature of the path, then optimize. It's not too difficult to dig up a nonlinear unconstrained minimizer (I made a super simple one here. It has been working well, but it's pretty swappable for a better one).

(Undocumented, but the syntax for mine is just minimize(function (a) { return cost; }, a0) where a is some state vector.)
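
To make the optimizer idea concrete, here is a toy sketch. The `minimize` below only mimics the call shape quoted above; it is a naive compass search written for this example, not the minimizer referred to in the comment. The cost function, `makeCost`, and the sample data are likewise assumptions.

```javascript
// Stand-in with the call shape minimize(fn, a0). Naive compass search:
// try +/- step on each coordinate, halve the step when nothing improves.
function minimize(fn, a0, step = 1, tol = 1e-4) {
  let a = a0.slice();
  let best = fn(a);
  while (step > tol) {
    let improved = false;
    for (let i = 0; i < a.length; i++) {
      for (const d of [step, -step]) {
        const trial = a.slice();
        trial[i] += d;
        const c = fn(trial);
        if (c < best) { a = trial; best = c; improved = true; }
      }
    }
    if (!improved) step /= 2;
  }
  return a;
}

// Toy cost for one label at arc length a[0] on a contour of given length.
// `others` are arc-length positions of labels already placed nearby
// (hypothetical data); `curvature` is a caller-supplied function of t.
function makeCost(others, curvature, length) {
  return function (a) {
    const t = a[0];
    if (t < 0 || t > length) return Infinity;   // stay on the contour
    let proximity = 0;
    for (const o of others) proximity += 1 / (1 + (t - o) * (t - o));
    return proximity + curvature(t);
  };
}

const cost = makeCost([20, 80], () => 0, 100);  // flat contour: zero curvature
const best = minimize(cost, [30]);              // settles near t = 50
```

On this flat toy contour the label ends up midway between its two neighbors. A real cost would need a curvature term measured from the contour and probably one state entry per label.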

geocosmite commented 7 years ago

I like your way of thinking Alex! I agree that having the text along a straight line path where the angle is tangent to the contour line should lead to better readability for cases where there is significant curvature over the text's length. And I'm on board with your assessment that readability will be improved by keeping text close to horizontal while also dispersing the labels over a broader area rather than clustering them. I'm also with you on the lack of a need to include units in the labels provided that there is a legend entry where the units can be shown. @rreusser's clip idea sounds great as well.

Regarding the label frequency for a given contour, I wonder if it would make sense to base this on the ratio of the contour's line length to the x-axis or y-axis length (whichever is longer)? Here are some examples of poorly thought through rules that you might use for inspiration:

- Contour line length < 3 * text length: no label
- Contour / axis length ratio <= 1.0: 1 label
- Contour / axis length ratio > 1.0: number of labels = 1 + floor(cntr_ax_ratio / 2)
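
Written out as code, the three rules above look like this. The function name `labelCount` and its parameter names are made up for the example; all lengths are assumed to be in the same units (e.g. pixels).

```javascript
// Number of labels for one contour, following the three rules as stated.
function labelCount(contourLength, textLength, xAxisLength, yAxisLength) {
  if (contourLength < 3 * textLength) return 0;          // too short to label
  const axisLength = Math.max(xAxisLength, yAxisLength); // whichever is longer
  const ratio = contourLength / axisLength;              // cntr_ax_ratio
  if (ratio <= 1.0) return 1;
  return 1 + Math.floor(ratio / 2);
}
```

Note that with these rules a second label only appears once the contour is at least twice as long as the longer axis.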

Using an optimizer to determine the label locations, as suggested by @rreusser, would be terrific. If this is too labor intensive to implement, a simpler approach to consider might be to (1) find reference points by determining the positions where the labels would be evenly spaced along the contour's length, and (2) for each reference point, find the position that comes closest to a horizontal text orientation within some tolerance of the reference point location (say 0.1 times the contour line length). You'll also want to check that the labels don't intersect an axis boundary.
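
A minimal sketch of that two-step approach, assuming the contour is a polyline and that candidate positions are segment midpoints. `horizontalLabelSpots` and `tolFrac` are hypothetical names, the choice of reference points at (k + 0.5) / n of the length is one of several reasonable ones, and the axis-boundary check is left out.

```javascript
// pts: polyline as [x, y] pairs; n: number of labels wanted;
// tolFrac: search window around each reference point, as a fraction of
// the contour length.
function horizontalLabelSpots(pts, n, tolFrac = 0.1) {
  // For each segment: arc length at its midpoint, midpoint position,
  // and tilt away from horizontal (0 = level, PI/2 = vertical).
  const segs = [];
  let s = 0;
  for (let i = 1; i < pts.length; i++) {
    const dx = pts[i][0] - pts[i - 1][0];
    const dy = pts[i][1] - pts[i - 1][1];
    const len = Math.hypot(dx, dy);
    segs.push({
      s: s + len / 2,
      x: (pts[i][0] + pts[i - 1][0]) / 2,
      y: (pts[i][1] + pts[i - 1][1]) / 2,
      tilt: Math.abs(Math.atan2(dy, Math.abs(dx)))
    });
    s += len;
  }
  const total = s;

  const spots = [];
  for (let k = 0; k < n; k++) {
    const ref = total * (k + 0.5) / n;   // step 1: evenly spaced reference
    const near = segs.filter(g => Math.abs(g.s - ref) <= tolFrac * total);
    if (near.length === 0) continue;     // nothing inside the window
    // Step 2: the most nearly horizontal candidate inside the window.
    spots.push(near.reduce((a, b) => (b.tilt < a.tilt ? b : a)));
  }
  return spots;
}
```

On a staircase-shaped contour, for instance, this skips the vertical risers and lands on the nearest level tread.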

alexcjohnson commented 7 years ago

@geocosmite interesting idea to bring the axis length (or perhaps the diagonal) into the calculation, I like it. I still think the distance to the next contour will need to play a role - imagine doubling the number of contours you show: that will make it harder to follow the contours that were already present, even though those contours didn't change. But there may well be a place for both, something like the max or quadrature combination of the two results.
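
One possible reading of "the max or quadrature combination of the two results" is to compute a label count each way and then combine the counts. The sketch below assumes that reading; `combinedLabelCount`, `gapMultiple`, and the sample numbers are hypothetical.

```javascript
// countFromAxis: label count from the contour / axis length rules.
// avgGap: average distance from this contour to the next one.
// gapMultiple: desired label spacing as a multiple of avgGap.
function combinedLabelCount(countFromAxis, contourLength, avgGap, gapMultiple, mode = 'max') {
  // Closer-packed contours (smaller avgGap) ask for more labels.
  const countFromGap = contourLength / (gapMultiple * avgGap);
  const combined = mode === 'max'
    ? Math.max(countFromAxis, countFromGap)
    : Math.hypot(countFromAxis, countFromGap);   // quadrature: sqrt(a^2 + b^2)
  return Math.round(combined);
}
```

Halving `avgGap`, as happens when the number of contours is doubled, doubles `countFromGap` while leaving `countFromAxis` unchanged, which is the effect described in the comment above.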

geocosmite commented 7 years ago

Glad you like the contour / length ratio idea Alex. Here's another couple of options to consider that might help with clutter and readability: (1) define a label interval or (2) introduce the concept of major and minor contours. For the former idea a value of 2 would mean that labels would be defined for every other contour value. For the latter idea you could place labels only on the major contours. This would also make it possible to define different line properties and intervals for the major vs. minor contours.

Here's an example I made that shows major and minor contours draped over a Plotly 3D surface plot. The labels (white text) aren't oriented very well but I hope it at least gives an idea of the concept.

alexcjohnson commented 7 years ago

Thanks @geocosmite - I've thought about major/minor contours, I agree it would be a nice feature, and coupled somewhat to the use of labels. But to keep the feature focused I'm planning to leave that out of the first stage.

Then as long as major/minor contours are in the roadmap, I'd rather not include a label interval. It would be fairly easy to add, but major/minor is more flexible, doesn't miss any capabilities of interval (that I can think of anyway - can you?) and there's at least one complication to the interval approach that major/minor avoids: choosing the right starting point. For example, if your contour interval is 10, your first contour is at 30, and you set a label interval of 5: naturally you're asking to label contours 50, 100, and 150, not 30, 80, and 130. Major/minor would automatically default to round numbers for the major values while still allowing unusual starting points, without adding any additional attributes.
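
The starting-point complication can be seen in a few lines of code, using the numbers from the example above. The helper names are made up, and the major-contour rule shown (label levels that are whole multiples of the major size) is just one simple way to get round numbers, not a description of any planned implementation.

```javascript
// Contour levels: start, start + size, start + 2 * size, ...
function contourLevels(start, size, n) {
  return Array.from({ length: n }, (_, i) => start + i * size);
}

// Label interval: every `interval`-th contour, counted from the first one.
function labelsByInterval(levels, interval) {
  return levels.filter((_, i) => i % interval === 0);
}

// Major contours: levels that are whole multiples of majorSize.
// (Integer levels only; non-integer sizes would need a tolerance.)
function labelsByMajor(levels, majorSize) {
  return levels.filter(v => v % majorSize === 0);
}

const levels = contourLevels(30, 10, 13);        // 30, 40, ..., 150
const byInterval = labelsByInterval(levels, 5);  // [30, 80, 130]
const byMajor = labelsByMajor(levels, 50);       // [50, 100, 150]
```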

geocosmite commented 7 years ago

Sounds like a good plan, Alex. The easy workaround in the interim is to create separate traces for the major and minor contours (that's what I did in the example above) and only use the new labeling option on the major contours. Regarding the labeling issue when using an interval, perhaps you could get around the problem using something analogous to the tick0 attribute for defining the first tick mark?
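
The two-trace workaround might look something like the sketch below. The thread does not name the labeling option; `contours.showlabels` is the attribute name used in later plotly.js releases and is assumed here. The `z` values, contour ranges, line widths, and the `'graph'` element id are placeholders.

```javascript
// Placeholder data; any 2D array of z values works.
const z = [
  [10, 40, 80],
  [30, 90, 130],
  [60, 120, 160]
];

// Minor contours: every 10 units, thin lines, no labels.
const minor = {
  type: 'contour',
  z: z,
  autocontour: false,
  contours: { start: 10, end: 150, size: 10, coloring: 'none' },
  line: { width: 0.5 },
  showscale: false
};

// Major contours: every 50 units, thicker lines, labeled.
const major = {
  type: 'contour',
  z: z,
  autocontour: false,
  contours: { start: 50, end: 150, size: 50, coloring: 'none', showlabels: true },
  line: { width: 2 },
  showscale: false
};

const data = [minor, major];
// In a page with plotly.js loaded: Plotly.newPlot('graph', data);
```

Since each trace has its own `contours.start`, the major trace can begin at a round number regardless of where the minor contours start, which sidesteps the starting-point problem without a tick0-like attribute.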

plotly / plotly.js

contour labels #1395