mtennekes closed this issue 4 years ago.
Hi @mtennekes, I have been struggling with this same issue recently as well. For mapview, I think I have it under acceptable control now, acceptable meaning that in the scope of mapview I don't care too much about whether the legend maps [0, 1) or [0, 1]. Currently, and mostly for convenience, mapview treats all integer values as numeric and all character values as factors.
Tangentially, there is infrastructure in classInt to handle interval closure (intervalClosure=). On occasion, I've found it useful to run classInt twice, first with dataPrecision=NULL (the default) and then with style="fixed" and a non-default dataPrecision=, or just to use dataPrecision= directly. tmap::tm_fill() has the equivalent interval.closure= argument, but I don't see dataPrecision=.
In addition, @dieghernan has contributed a new style, "headtails", with a vignette. I'm planning to submit to CRAN soon to make this available.
Thanks @tim-salabim and @rsbivand.
Currently, tmap also treats integers as numeric and character as factors, but since there were a few use cases in which the data values are clearly integers, it would be good to adjust the breaks (or at least the labels) accordingly.
The interval closure is not my main concern; it is under control: the argument legend.format contains a parameter called digits, which is similar to dataPrecision in classInt. It probably would have been easier for me to use dataPrecision in the implementation. Looking forward to testing this new style, headtails, in tmap.
Hi @mtennekes, those are nice improvements for tmap!
1) For my use cases the new legend labels for integers are really helpful. I would prefer an additional option "as.integer" with a default value determined by the class of the variable (integer or numeric).
2) I think a named color vector would be fine for factors and numeric (or integer) variables as well. A unified approach to define a palette would be more user friendly, but I don't know if this would be too complicated for floating point numbers.
Hi @mtennekes, about the integer legend: 10 years ago I would have thought "great!", now I think it is over-engineering. Does ggplot2 have this feature?
For the color ramps: stars now adopts a vector of colors mapping one-to-one with an integer variable, starting at 1 (like the levels of a factor); see https://github.com/r-spatial/stars/issues/128
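The one-to-one convention can be illustrated with a minimal base R sketch, assuming the color vector is ordered like the factor levels (color i belongs to integer code i). The object names landcover and cols are hypothetical, and this is not the stars implementation itself.

```r
# Hypothetical land cover classes stored as a factor
landcover <- factor(c("water", "forest", "urban", "forest"),
                    levels = c("water", "forest", "urban"))
# One color per level, in level order: color i belongs to integer code i
cols <- c("blue", "darkgreen", "grey40")
stopifnot(length(cols) == nlevels(landcover))
# The integer codes of a factor start at 1, so they can index the colors directly
pixel_cols <- cols[as.integer(landcover)]
print(pixel_cols)
```

Because the lookup is positional, the order of cols matters: reordering the levels without reordering the colors silently changes the map.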
Color assignment is working now. Also the colors from stars are used (I check whether there are duplicated levels and if so, apply droplevels).
library(tmap)
library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 2.4.2, PROJ 5.2.0
data(World)
# palette of named colors for a character/factor variable
tm_shape(World) + tm_polygons("income_grp",
palette = c("2. High income: nonOECD" = "red",
"3. Upper middle income" = "green",
"4. Lower middle income" = "pink",
"1. High income: OECD" = "blue",
"5. Low income" = "purple"))
# palette of named colors for a numeric variable
World$income_grp_int <- as.integer(World$income_grp)
tm_shape(World) + tm_polygons("income_grp_int", style = "cat",
palette = c("2" = "red",
"3" = "green",
"4" = "pink",
"1" = "blue",
"5" = "purple"))
# use the colors of a stars object
# file downloaded (and unzipped) from https://s3-us-west-2.amazonaws.com/mrlc/PR_landcover_wimperv_10-28-08_se5.zip
r = read_stars("pr_landcover_wimperv_10-28-08_se5.img",
               RAT = "Land Cover Class", proxy = TRUE)
qtm(r) + tm_legend(outside = TRUE)
@mtennekes, thank you for opening this discussion.
1. Integer variables
I think it would be a nice addition to tmap, but it is not crucial. It depends on how much effort it would take to add this feature. An as.integer argument sounds fine.
2. Specific value to color mapping
This is, in my opinion, a way more interesting and important feature. I already started this discussion at https://github.com/mtennekes/tmap/issues/276 and at https://github.com/mtennekes/tmap/issues/388.
It would be also great to make it possible to extend the color mapping to external symbologies (see https://github.com/mtennekes/tmap/issues/65 and https://github.com/r-spatial/discuss/issues/36).
Update: The above examples look great! I have some questions about the last example: does it drop empty levels by default? Is it possible to not drop them? How can someone edit the legend there (one category does not have a name)?
Good point @Nowosad !
Hmm, why isn't there an argument to specify whether unused levels are dropped (@mtennekes?)
That specific file is crappy: I think it doesn't contain unused levels, but duplicated levels. Also, the black-colored category has level "". It is not easy to change the legend afterwards. It is much easier to replace all the "" values with NA and set colorNA = "black".
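A minimal base R sketch of that workaround, assuming the categories are held in a factor (cls is a hypothetical name): the "" category becomes NA and its now-unused level is dropped, after which the missing values can be drawn with colorNA = "black" as suggested.

```r
# Hypothetical categorical values, including an unnamed ("") category
cls <- factor(c("Forest", "", "Urban", "", "Water"))
# Replace the empty-string category with a missing value ...
cls[cls == ""] <- NA
# ... and remove the "" level, which is now unused
cls <- droplevels(cls)
print(levels(cls))
print(sum(is.na(cls)))
```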
You can find some examples with unused levels at https://github.com/r-spatial/stars/issues/245#issuecomment-601609490.
droplevels drops unused factor levels. I wouldn't do that automatically: if you plot time series of factor maps, at some times certain levels may not be present but you'd still want them in the legend.
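The time series argument can be made concrete with a small base R example (the objects are hypothetical): two time steps share the same set of levels, and only keeping the unused level gives both maps the same legend.

```r
# Two hypothetical time steps of a categorical map with a shared set of classes
classes <- c("crop", "forest", "water")
t1 <- factor(c("crop", "forest", "water"), levels = classes)
t2 <- factor(c("crop", "crop", "forest"), levels = classes)  # no water at t2
# Without droplevels(), t2 keeps "water" as a level with zero counts
print(table(t2))
# droplevels() removes "water", so the legends of t1 and t2 would no longer match
print(levels(droplevels(t2)))
print(identical(levels(t1), levels(t2)))
```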
I agree @edzer, but I think there should be an argument in tmap invoking droplevels. It could be FALSE by default.
Exactly what I'm working on: an argument drop.levels, which is FALSE by default.
And I'll add an argument as.integer, which formats the labels as integers (so 0 to 9, 10 to 19, etc.). For now, I'll only do this for style = "pretty" and "log10_pretty", which should be sufficient.
Thanks for your input!
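A minimal sketch of the label formatting described above, assuming integer-valued breaks and left-closed intervals [b1, b2), [b2, b3), ... with the last interval closed on both sides. The helper int_labels() is hypothetical and not part of tmap.

```r
# Hypothetical helper: legend labels for integer data that are classified with
# intervals [b1, b2), [b2, b3), ..., [bk, bk+1] (the last one closed on the right)
int_labels <- function(brks) {
  k <- length(brks) - 1
  lower <- brks[-(k + 1)]
  # Every class except the last ends one below the next break
  upper <- c(brks[-c(1, k + 1)] - 1, brks[k + 1])
  ifelse(lower == upper, as.character(lower), paste(lower, "to", upper))
}
print(int_labels(c(0, 10, 20, 30)))
print(int_labels(c(1, 2, 5)))
```

A class that contains a single integer gets a single number as its label instead of a range such as "1 to 1".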
This is totally great! I provided a bit of code for reference.
I'm going to disagree with Edzer about the over-engineering. I actually think the legend-integer issue is very important. As it stands, the tmap legend for integers literally does not make sense, since you can't tell whether an integer on a class boundary falls into one category or the other. Really important, and I like your solution.
I don't have an opinion on the 2nd issue beyond what has already been supplied.
library(sf)
library(tmap)
library(dplyr)
counties <- read_sf("https://cdn.jsdelivr.net/npm/us-atlas@3/counties-10m.json") %>%
filter(stringr::str_sub(id,1,2) == "36")
n <- nrow(counties)
set.seed(100)
counties <- counties %>%
mutate(
vals_int = sample(1:10, n, replace = TRUE),
vals_cont = rnorm(n)
)
tm_shape(counties) +
tm_polygons("vals_int", style = "pretty")
tm_shape(counties) +
tm_polygons("vals_cont")
That's a very nice example @zross. It illustrates another problem:
pretty(runif(100, min = 0, max = 10))
#> [1] 0 2 4 6 8 10
pretty(1L:10L)
#> [1] 0 2 4 6 8 10
When I opened this issue, I thought that changing the labels at the right-hand side of the intervals would be enough (e.g. from 0-10, 10-20 to 0-9, 10-19, etc.). However, in this case it would make more sense to have 1-2, 3-4, 5-6, 7-8, 9-10 (given n=5). So pretty is not very useful here.
Any ideas how to tackle this problem? @rsbivand, does classInt offer a method for this?
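One possible approach for the 1:10 case, sketched in base R under the assumption that all values are integers: split the integer range into at most n classes of equal width, counted in integers rather than on a continuous scale. The helper int_classes() is hypothetical; it is neither a pretty() nor a classInt style.

```r
# Hypothetical helper: split the integers min(x)..max(x) into at most n classes
# of equal width, each labelled by its first and last integer
int_classes <- function(x, n = 5) {
  lo <- min(x)
  hi <- max(x)
  width <- ceiling((hi - lo + 1) / n)  # number of integers per class
  lower <- seq(lo, hi, by = width)
  upper <- pmin(lower + width - 1, hi)
  labels <- ifelse(lower == upper, as.character(lower),
                   paste(lower, upper, sep = "-"))
  # findInterval() puts each value in [lower[i], lower[i + 1]), which for
  # integers is exactly the class lower[i]..upper[i]
  cls <- factor(findInterval(x, lower), levels = seq_along(lower),
                labels = labels)
  list(labels = labels, class = cls)
}
res <- int_classes(1:10, n = 5)
print(res$labels)
print(table(res$class))
```

For 1:20 with n = 5, the same helper gives classes of four integers each (1-4, 5-8, and so on).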
No, pretty() expects that x= is a continuous variable. classIntervals(x, n=5, style="pretty", intervalClosure="right") gives the classes, but not the break labels.
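Following up on the intervalClosure="right" remark, a small base R sketch (using cut(), not classInt itself) shows that for integer data the right-closed reading of the pretty() breaks already yields the 1-2, 3-4, ... classes asked for above: an interval (a, b] contains exactly the integers a + 1, ..., b, so the labels can be built from the breaks. It assumes all values are integers strictly greater than the first break.

```r
# Integer data and the breaks that pretty() returns for it
x <- 1:10
brks <- pretty(x)  # 0 2 4 6 8 10
# Right-closed intervals (0, 2], (2, 4], ..., as with intervalClosure = "right"
cls <- cut(x, brks, right = TRUE)
# For integers, (a, b] contains exactly a + 1, ..., b
levels(cls) <- paste(head(brks, -1) + 1, "to", brks[-1])
print(table(cls))
```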
data(World)
# as.count is TRUE for integers if style = pretty, fixed, or log10_pretty
# N (natural numbers, with 0)
World$x <- sample(0:20, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x")
# N+ (natural numbers, positive)
World$x <- sample(1:20, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x")
# Z (integers)
World$x <- sample(-10:10, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x")
#> Variable(s) "x" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
# show as continuous (old way)
World$x <- sample(1:20, size = 177, replace = TRUE)
tm_shape(World) + tm_polygons("x", as.count = FALSE)
# style: fixed
tm_shape(World) + tm_polygons("x", breaks = c(1, 5, 10, 20))
# scientific notation (decided to use the set notation)
tm_shape(World) + tm_polygons("x", breaks = c(0, 1, 3, 5, 10, 20),
legend.format = list(scientific = TRUE))
# style: log10_pretty (continuous)
tm_shape(World) + tm_polygons("pop_est", style = "log10_pretty")
# style: log10_pretty (count)
tm_shape(World) + tm_polygons("pop_est", as.count = TRUE, style = "log10_pretty")
Created on 2020-04-07 by the reprex package (v0.3.0.9001)
Thank you Martijn, both these enhancements are very helpful for me, exactly as you are implementing them!
On Sun, Apr 5, 2020 at 1:45 AM mtennekes wrote:
tmap 3.0 will be released in a few days. For this version, I want to improve the variable mapping, so any feedback or tips are welcome.
There is a need for two features:
1. Integer variables
Treat a numeric variable as integer. This is needed because currently the legend labels will be 0 to 10, 10 to 20, 20 to 30, where the presumed intervals are [0, 10), [10, 20) and [20, 30] (i.e. open on the right-hand side, except the last). When the variable is an integer, the legend labels should be 0 to 9, 10 to 19, 20 to 29 (or 30).
I'm thinking about style = "integer" or an additional argument as.integer. The latter probably makes more sense since many break styles (current options are c("cat", "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust", "bclust", "fisher", "jenks", and "log10_pretty")) should handle integers slightly differently. For instance, "log10_pretty" will return 0 to 1, 1 to 10, 10 to 100 when the variable is continuous and should return 0, 1 to 9, 10 to 99 when it is an integer.
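The integer variant of "log10_pretty" described above can be sketched in base R for non-negative integer counts with a maximum of at least 1. The helper log10_int_labels() is hypothetical, not tmap code; it only builds the labels 0, 1 to 9, 10 to 99, ... up to the class that contains the maximum.

```r
# Hypothetical helper: log10-style class labels for non-negative integer counts
log10_int_labels <- function(x) {
  stopifnot(all(x >= 0), max(x) >= 1)
  top <- floor(log10(max(x)))       # e.g. 57 -> 1, 350 -> 2
  lower <- 10^(0:top)               # 1, 10, 100, ...
  upper <- 10^(1:(top + 1)) - 1     # 9, 99, 999, ...
  fmt <- function(v) format(v, scientific = FALSE, trim = TRUE)
  c("0", paste(fmt(lower), "to", fmt(upper)))
}
print(log10_int_labels(c(0, 3, 57)))
```

The continuous version would instead show the shared boundaries 0 to 1, 1 to 10, 10 to 100.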
What do you think? If we go for the second option, what would be a good name for the argument? as.integer, as.continuous, as.discrete, ....?
Next question: should tmap set the default value of this argument to continuous, or should the default value be determined by whether all variable values are integers?
(see also #258 https://github.com/mtennekes/tmap/issues/258 and #399 https://github.com/mtennekes/tmap/issues/399)
2. Specific value to color mapping
Sometimes all a user (including myself) wants is to map specific data values to specific colors. How should this be done? Keep in mind that it should work for integer and categorical data.
For categorical data, we could let the user assign a named color vector to the argument palette, where the names correspond to the levels.
How do we do this for numeric data? A color table? If so, it makes sense to add the labels in this color table as well, rather than via the labels argument. Any ideas?
(see also r-spatial/mapview#208 https://github.com/r-spatial/mapview/issues/208)
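One possible shape for such a color table, sketched in base R: a data frame with one row per data value, holding its color and legend label, and a match() lookup from the data values. The structure and the example codes are hypothetical and not an existing tmap interface; unlike a vector indexed by integer codes starting at 1, the values here need not be consecutive.

```r
# Hypothetical color table: one row per data value, with its color and legend label
color_table <- data.frame(
  value = c(11L, 21L, 41L),
  color = c("blue", "grey", "darkgreen"),
  label = c("water", "urban", "forest"),
  stringsAsFactors = FALSE
)
# Hypothetical integer values; 99 has no entry in the table
x <- c(41L, 11L, 21L, 99L, 41L)
# match() gives the table row of each value, NA when the value is not listed
fill <- color_table$color[match(x, color_table$value)]
print(fill)
# The same table provides the legend: labels paired with their colors
legend_entries <- setNames(color_table$color, color_table$label)
print(legend_entries)
```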
@Nowosad @Robinlovelace @sjewo @jannes-m @tim-salabim @edzer @rsbivand @mcSamuelDataSci @zev @zross
Wonderful!!
Re: https://github.com/mtennekes/tmap/issues/406#issuecomment-609428252: classInt 0.4-3 with the headtails style is now on CRAN.
... and already supported by tmap
data(World)
tm_shape(World) + tm_symbols(col = "pop_est_dens",
style = "headtails", style.args = list(thr = 1))
tmap 3.0 on its way to CRAN