Closed kanitw closed 7 years ago
Maybe we shouldn't bother for the default but for voyager/polestar use scientific notation to avoid super long labels.
OK, I just did some basic testing with Vega 3's default settings for linear, log and sqrt/pow scales. All have pretty reasonable defaults as-is (with no special formatting directive applied), but could be further refined in how "extreme" values are handled.
- `linear`, `sqrt`, `pow` - These all use the same default setting, which generates a linear set of ticks with precision automatically configured by the scale domain. Problems arise when the numbers get very big or very small: the labels grow exceedingly long with leading or trailing zeros. These labels switch to exponential notation once values are greater than or equal to `1e+21`.
- `log` - This uses a special formatter that determines precision in a non-linear fashion. It automatically switches to using exponential labels once the digit precision exceeds a certain threshold: `1e-7` for small numbers, `1e+12` for large numbers.
Questions:
- What cutoffs should we use for switching notation? For example, the log formatter currently uses a default precision of 12 digits.
- For `linear`, etc. should we institute similar policies as we use for `log` (or vice versa) and switch to exponential (scientific) notation for large/small numbers at the same threshold values?
- One other issue is that the formatters currently interleave formats: for example both `1,000,000` and `1e+12` might be visible simultaneously. Would we want the formatters to switch such that all labels use the same style? I could see arguments in both directions (stability vs. consistency), so would like to get other folks' opinions here.
cc @kanitw @domoritz @arvind
Also, I should note that the formatting for `linear`, etc. is currently done using d3's number formatting. So these defaults are inherited. Changes/fixes of a general nature might require edits / pull requests to the d3-format repo, or additional wrapper code in Vega (as we currently do for log scales).
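To make the threshold idea concrete, here is a minimal sketch of the kind of wrapper discussed above. It is not Vega's or d3-format's actual code: `formatTick` and its parameters are hypothetical names, the default cutoffs are simply the log-scale values quoted in this thread, and treating a value equal to the small cutoff as "extreme" is an assumption.

```javascript
// Hypothetical helper, not part of Vega or d3-format.
// Default cutoffs are the log-scale thresholds mentioned in the thread.
function formatTick(value, small = 1e-7, large = 1e12) {
  const abs = Math.abs(value);
  // Zero is never written in exponential form.
  const extreme = abs !== 0 && (abs >= large || abs <= small);
  if (extreme) {
    return value.toExponential(); // built-in exponential notation
  }
  // Plain notation with thousands separators. Capping at 12 significant
  // digits is an assumption chosen to match the default `large` cutoff.
  return value.toLocaleString("en-US", { maximumSignificantDigits: 12 });
}

console.log(formatTick(1000000)); // "1,000,000"
console.log(formatTick(1e12));    // "1e+12"
console.log(formatTick(1e-7));    // "1e-7"
```

Passing different `small` and `large` values is one way the cutoff question could be left to callers such as Vega-Lite.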
Here are my quick opinions from thinking without experimenting:
> What cutoffs should we use for switching notation? For example, the log formatter currently uses a default precision of 12 digits.
I think the cutoff should be at least 9 digits, as a million or a hundred million (e.g., `100,000,000`) is still readable. A billion or a hundred billion (e.g., `100,000,000,000`) may also still be readable, so I guess a reasonable cutoff would be around 9-12 digits. (21 is definitely too high.)
> For linear, etc should we institute similar policies as we use for log (or vice versa) and switch to exponential (scientific) notation for large/small numbers at the same threshold values?
This sounds reasonable. (Is there anything that concerns you about doing this?)
> One other issue is that the formatters currently interleave formats: for example both 1,000,000 and 1e+12 might be visible simultaneously. Would we want the formatters to switch such that all labels use the same style? I could see arguments in both directions (stability vs. consistency), so would like to get other folks' opinions here.
Assuming stability means consistency among different plots that contain numbers with different levels of magnitude, I think consistency within the same plot is more important. For a plot with a non-linear scale, having both `1,000,000` and `1e+12` in the same plot would be a bit weird.
If users want a consistent format among different plots, they can still explicitly set the `format` anyway.
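One possible reading of "consistency within the same plot" is sketched below: if any tick on an axis needs exponential notation, every tick on that axis uses it. `formatTicksConsistently` is a hypothetical name, and the cutoffs are assumptions borrowed from the log-scale thresholds mentioned earlier in the thread; this is an illustration, not how Vega behaves.

```javascript
// Hypothetical sketch: choose ONE notation for all labels on an axis.
function formatTicksConsistently(values, small = 1e-7, large = 1e12) {
  const isExtreme = (v) => {
    const abs = Math.abs(v);
    return abs !== 0 && (abs >= large || abs <= small);
  };
  // If a single tick is extreme, switch the whole axis to exponential.
  const useExponential = values.some(isExtreme);
  return values.map((v) =>
    useExponential
      ? v.toExponential()
      : v.toLocaleString("en-US", { maximumSignificantDigits: 12 })
  );
}

console.log(formatTicksConsistently([1e6, 1e9, 1e12])); // "1e+6", "1e+9", "1e+12"
console.log(formatTicksConsistently([1000, 1e6]));      // "1,000", "1,000,000"
```

The trade-off raised in the thread still applies: two plots of similar data can end up with different notations if only one of them contains an extreme tick.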
> Also, I should note that the formatting for linear, etc is currently done using d3's number formatting. So these defaults are inherited. Changes/fixes of a general nature might require edits / pull requests to the d3-format repo, or additional wrapper code in Vega (as we currently do for log scales).
A related question for `vega-tooltip`: if we have to wrap `d3-format` anyway, I wonder if we can have a single `format` method that Vega-Tooltip can call. (Currently, it calls d3-format directly, so the log format is already inconsistent with Vega.)
cc: @sirahd
I agree that smarter formatting in Vega is the right approach going forward.
Two questions:
1) How do you specify "no formatting" in vega formatting expressions?
2) @kanitw suggested starting with SI units at 10e9 or higher. This can be problematic with x-axis labels:
Should we have special cases for different axes? Should the cutoff be customizable?
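As an illustration of a customizable cutoff, the sketch below uses plain grouped digits below the cutoff and an SI prefix at or above it. Everything here is hypothetical: `formatWithSICutoff` is not a Vega or d3-format function, the default of `10e9` is just the value mentioned in this comment, and rounding to three significant digits is an arbitrary choice.

```javascript
// Hypothetical sketch of SI-prefix labels with a caller-supplied cutoff.
const SI_PREFIXES = ["", "k", "M", "G", "T", "P", "E"];

function formatWithSICutoff(value, cutoff = 10e9) {
  const abs = Math.abs(value);
  if (abs < cutoff) {
    // Below the cutoff: ordinary digits with thousands separators.
    return value.toLocaleString("en-US", { maximumSignificantDigits: 12 });
  }
  // At or above the cutoff: divide by 1000 until a prefix fits.
  let scaled = abs;
  let index = 0;
  while (scaled >= 1000 && index < SI_PREFIXES.length - 1) {
    scaled /= 1000;
    index += 1;
  }
  const sign = value < 0 ? "-" : "";
  // Keep three significant digits and drop trailing zeros.
  return sign + Number(scaled.toPrecision(3)) + SI_PREFIXES[index];
}

console.log(formatWithSICutoff(1500));       // "1,500"
console.log(formatWithSICutoff(2.5e10));     // "25G"
console.log(formatWithSICutoff(1500, 1000)); // "1.5k"
```

A per-axis policy could then amount to passing a lower cutoff for the x-axis, where long labels collide sooner.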
@domoritz Great points. I believe this x-axis issue partly motivated why we used SI format in the past.
Thanks all for the input. Responding to @domoritz:
`format` parameter is included in the axis definition (`"format": null` should work similarly).
Agree with all of the above. I guess instead of outputting `"format": null`, we can just not specify a format at all. For expressions, we can generate `expr: 'format(datum["foo"], null)'`.
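The "just not specify a format at all" idea can be sketched as a small compiler-side helper. `buildAxis` is a hypothetical name used only for illustration, not a Vega-Lite function; `scale`, `orient` and `format` are the axis properties under discussion.

```javascript
// Hypothetical helper: emit an axis definition and include "format"
// only when the user actually supplied one.
function buildAxis(scale, orient, format) {
  const axis = { scale, orient };
  if (format !== undefined && format !== null) {
    axis.format = format;
  }
  return axis;
}

console.log(JSON.stringify(buildAxis("x", "bottom")));
// {"scale":"x","orient":"bottom"}
console.log(JSON.stringify(buildAxis("x", "bottom", ".2s")));
// {"scale":"x","orient":"bottom","format":".2s"}
```

With the property omitted, whatever default formatting Vega applies for that scale is left in charge.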
Hmm, interesting point regarding expressions. Right now that uses a separate mechanism that directly calls d3-format and d3-time-format. Only the `AxisTicks` and `LegendEntries` operators in vega-encode involve specialized processing paths. Another item for me to look into...
Also, using the `format` expression function lies outside the context of a particular scale, and so cannot support any scale-specific formatting actions. So I'm not sure these should be included here.
Do we plan to ship a smarter formatter in Vega 3 and remove the default format from Vega-Lite 2?
I think we should. The defaults in Vega 3 are already better defaults, even though there remain additional potential improvements.
I agree. However, I'm a bit worried about cases like https://github.com/vega/vega-lite/issues/1539#issuecomment-303500863. Should we try to have a smarter default here or just rely on the users to provide a better format (e.g. `s`)?
Ping @jheer -- I think we definitely should handle https://github.com/vega/vega-lite/issues/1539#issuecomment-303500863 somehow before changing the default format to `undefined`.
I have added a new `labelOverlap` option for axes that supports multiple strategies for automatic overlap removal. The default setting does nothing: no overlap removal is performed.
Other options:
- `true` or `"parity"`: Reduce overlap by attempting to remove every other label, repeating this process until overlap is removed or only two labels remain. If only two labels remain, the first and last label will be shown. This setting is appropriate for linear, sqrt, and pow scales.
- `"greedy"`: Reduce overlap by sequentially scanning the labels, removing any items whose bounds overlap the last visible label. This strategy works better than `"parity"` for log scales, though it is not "ideal" in that it does not actually reason about which ticks are the most useful for interpreting a log scale.
The Vega-Lite compiler can thus set the axis `labelOverlap` property to enable automatic overlap removal if desired. These modifications have been published in Vega sub-modules and will be included (as an undocumented soft launch) in the next Vega beta release.
Note that the default `"parity"` strategy could also be applied to band/point scales. However, if you switch to horizontally-oriented labels for ordinal scales I recommend using the `limit` encoding channel to limit the text according to the range step size.
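The two strategies described above can be sketched as follows. This is an illustration based only on the descriptions in this comment, not Vega's implementation: the label shape `{ text, x0, x1 }` (a horizontal extent in pixels) and the function names are assumptions, and labels are assumed to be sorted along the axis.

```javascript
// Hypothetical label shape: { text, x0, x1 }, sorted by position.
function overlaps(a, b) {
  return a.x0 < b.x1 && b.x0 < a.x1;
}

// True if any pair of neighbouring labels overlaps.
function hasOverlap(labels) {
  for (let i = 1; i < labels.length; i++) {
    if (overlaps(labels[i - 1], labels[i])) return true;
  }
  return false;
}

// "parity": drop every other label until nothing overlaps or fewer
// than three labels remain; then fall back to the first and last label.
function reduceParity(labels) {
  let visible = labels.slice();
  while (visible.length >= 3 && hasOverlap(visible)) {
    visible = visible.filter((_, i) => i % 2 === 0);
  }
  if (visible.length < 3 && labels.length >= 2) {
    visible = [labels[0], labels[labels.length - 1]];
  }
  return visible;
}

// "greedy": scan in order, keeping a label only if it does not
// overlap the most recently kept label.
function reduceGreedy(labels) {
  const visible = [];
  for (const label of labels) {
    const last = visible[visible.length - 1];
    if (!last || !overlaps(last, label)) visible.push(label);
  }
  return visible;
}

// Unevenly spaced labels, as on a log scale (made-up extents).
const labels = [
  { text: "a", x0: 0, x1: 30 },
  { text: "b", x0: 35, x1: 65 },
  { text: "c", x0: 60, x1: 90 },
  { text: "d", x0: 70, x1: 100 },
  { text: "e", x0: 75, x1: 105 },
];
console.log(reduceParity(labels).map((l) => l.text).join(",")); // "a,e"
console.log(reduceGreedy(labels).map((l) => l.text).join(",")); // "a,b,d"
```

On this uneven spacing the greedy scan keeps three labels where parity falls back to two, which is consistent with the remark that greedy works better for log scales.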
SI seems to confuse users a lot!