Closed teunbrand closed 4 years ago
Essentially do this, but as a stat:
library(ggplot2)
centroid <- aggregate(iris[, 1:4], list(Species = iris$Species), mean)
ggplot(iris, aes(x=Sepal.Width, y=Sepal.Length)) +
geom_point(aes(color=Species)) +
geom_text(data = centroid, aes(label = Species))
Hi! That sounds cool. Use case for example: https://stackoverflow.com/questions/9441436/ggplot-centered-names-on-a-map
R U working on that stat already? Shall I give it a try?
Btw, may I be nosy and ask why the package is called ggh4x?
Hey Tjebo,
Yeah I have something that kind of works already, but it still needs some more testing + documentation before I'm adding (and I've been busy with other things).
If you're interested to try this out here is my working script:
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.2
stat_funxy <-
function(mapping = NULL, data = NULL, geom = "point",
position = "identity", ..., funx = mean, funy = funx,
argx = list(), argy = list(), na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE) {
if (!is.function(funx)) {
stop("The `funx` argument must be a function.", call. = FALSE)
}
funx <- force(funx)
if (!is.function(funy)) {
stop("The `funy` argument must be a function.", call. = FALSE)
}
funy <- force(funy)
layer(
data = data, mapping = mapping, stat = StatFunxy, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(
funx = funx, funy = funy,
argx = argx, argy = argy,
...
)
)
}
StatFunxy <- ggproto(
"StatFunxy", Stat,
required_aes = c("x", "y"),
compute_group = function(data, scales, funx, funy, argx, argy) {
data <- as.list(data)
x <- do.call(funx, c(unname(data["x"]), argx))
y <- do.call(funy, c(unname(data["y"]), argy))
other <- setdiff(names(data), c("x", "y"))
other <- lapply(data[other], head, max(length(x), length(y)))
data <- do.call(cbind.data.frame, c(other, x = x, y = y))
}
)
ggplot(iris, aes(Sepal.Width, Sepal.Length, group = Species)) +
geom_point(aes(colour = Species)) +
stat_funxy(aes(label = Species), geom = "text")
Created on 2020-07-03 by the reprex package (v0.3.0)
The package is called ggh4x because it is 1337 5P34k (leet speak) for 'grammar of graphics hacks'. Mostly because of the multiple colour scales, panel size forcing and panel-specific position axes; these are not strictly in line with core principles of grammar of graphics (but can be useful nonetheless). The term originated in the 80's and reminded me of this youtube video, so that is where the synthwavy hex logo comes from. Also this is were I dump other stuff that I sometimes find useful while plotting (but this has nothing to do with the name).
Love the video! I did not know about leet speak :) Will check the stat, maybe tomorrow or so
do.call(c(unname(...), list(args)) is very clever
I'd simplify this though, because data frames are lists already. something along the lines of
compute_group = function(data, scales, funx, funy, argx, argy) {
x <- do.call(funx, c(unname(data["x"]), argx))
y <- do.call(funy, c(unname(data["y"]), argy))
if(length(x)!=length(y)){
stop("funx and funy must result in the same length", .call = FALSE)
}
other <- head(data[setdiff(names(data), c("x", "y"))], length(x))
data <- cbind(x,y, other)
}
I cannot really think of a case where x and y coordinates should have different lengths... ? Do you see this case? So that's how I simplified good deal just taking the length of x from the others.
for the previously mentioned thread this would result in
library(ggplot2)
library(maps)
stat_funxy <- etc...
StatFunxy <- ggproto(
"StatFunxy", Stat,
required_aes = c("x", "y"),
compute_group = function(data, scales, funx, funy, argx, argy) {
x <- do.call(funx, c(unname(data["x"]), argx))
y <- do.call(funy, c(unname(data["y"]), argy))
if(length(x)!=length(y)){
stop("funx and funy must result in the same length", .call = FALSE)
}
other <- head(data[setdiff(names(data), c("x", "y"))], length(x))
data <- cbind(x,y, other)
}
)
county_df <- map_data('county') #mappings of counties by state
ny <- subset(county_df, region=="new york") #subset just for NYS
ny$county <- ny$subregion
ggplot(ny, aes(x = long, y = lat, group = group)) +
geom_polygon(colour='black', fill=NA) +
stat_funxy(aes(label = subregion), geom = "text",size=2)+
coord_map()
ggplot(ny, aes(x = long, y = lat, group = group)) +
geom_polygon(colour='black', fill=NA) +
stat_funxy(aes(label = subregion),
funx = function(x) mean(range(x)), geom = "text",size=2) +
coord_map()
Created on 2020-07-04 by the reprex package (v0.3.0)
Because I like the idea of the centroid use case, and I think it would be much used, one could consider either mean or mean(range(x)) as main methods, maybe depending on if the center looked for are like points or polygons. Just some thoughts :)
Hey Tjebo,
Yeah I'll consider putting up a new answer once I finally finish up this stat. If you want to answer that question instead feel free to do so! Here are my thoughts on some of the comments.
I cannot really think of a case where x and y coordinates should have different lengths... ? Do you see this case? So that's how I simplified good deal just taking the length of x from the others.
There is the rare case that one of the coordinates has length 1 while the other has more. Imagine you would like to draw a line over some range, you can do fun.x = range
and fun.y = mean
. This results in length 2 x
but length 1 y
, which should automatically expand the length 1 coordinate to fit the longer coordinate.
Because I like the idea of the centroid use case, and I think it would be much used, one could consider either mean or mean(range(x)) as main methods, maybe depending on if the center looked for are like points or polygons. Just some thoughts :)
Yeah you make a good point that both these cases would be convenient. I guess I could make two extra constructors that both use the ggproto but give different fun.x
/fun.y
that are called stat_midpoint()
and stat_centroid()
for these common cases.
There is the rare case that one of the coordinates has length 1 while the other has more.
I see - in this case you still need to test for the length of y to x for when the functions result in unequal lengths not including 1, I guess :) , otherwise it will throw a non-specific error that the data frame cannot be created
as for the recycling, I think this would also work for the “others” data frame / list somehow rather than using “head”, but not sure.
Should now be possible to do the following, e.g. make flower-like plots about flowers (@robinweide, go nuts!)
library(ggh4x)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.0.2
ggplot(iris, aes(Sepal.Width, Sepal.Length,
xend = Sepal.Width, yend = Sepal.Length,
colour = Species, group = Species)) +
geom_point() +
stat_centroid(crop_other = FALSE, geom = "curve") +
stat_centroid(crop_other = FALSE, geom = "curve", curvature = -0.5) +
stat_centroid(geom = "point", colour = "gold", size = 10)
Created on 2020-07-06 by the reprex package (v0.3.0)
How is this not a thing that already exists?
fun.x
andfun.y
argument,fun.x
to the x-positions, applyfun.y
to the y-positions.fun.x = mean
andfun.y = mean
.