jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

GIS map #558

Open Kibria2017 opened 4 years ago

Kibria2017 commented 4 years ago

Do you have any plans to integrate GIS maps (e.g., a world map) into JASP?

JorisGoosen commented 4 years ago

Why would we want to integrate that?

AlexanderLyNL commented 4 years ago

For geographical analyses. Currently, this is not at the top of our priorities, as most of our users are in the social sciences, but it could be something for the future.

Kibria2017 commented 4 years ago

I understand your point. It will be useful for those who work on infectious disease epidemiology.

tomtomme commented 8 months ago

Someone should probably be assigned to this, since this is also relevant in social sciences :)

Example (European map of implicit racial bias): https://orda.shef.ac.uk/articles/figure/European_map_of_Implicit_Racial_Bias/4750588?file=7790077

JorisGoosen commented 8 months ago

Are there any open-source backends we can use for this? Ideally integrated with Qt (but not the Esri ArcGIS stuff, because that is proprietary).

Otherwise it might be a bit hefty to introduce. Also, would we then need to include maps of the world as-is, and don't we get into trouble with, let's say, China if we use the wrong map?

tomtomme commented 8 months ago

Yeah, this will surely be a big ticket. I worked with a Postgres DB, PostGIS and QGIS during my PhD. But since that was 10 years ago, I asked Copilot for the current state of affairs - and the output is surely no hallucination, since all links work and I know all of those, except the R stuff at the end:

Here are the four open-source alternatives to ArcGIS ordered by popularity:

As for popular R-packages that support GIS or interface with these tools:

JorisGoosen commented 8 months ago

I had a quick look at QGIS and saw that it is based on Qt and even has QML GUI components built in.

This for instance looks interesting: https://github.com/qgis/QGIS/blob/master/src/quickgui/qgsquickmapcanvasmap.h

Not that I'm suggesting we work on this now, as it would indeed be a lot of work. But if we can just reuse parts of QGIS in JASP directly, we can have it be a completely interactive element in the normal GUI, and make it available in R somehow.

But it is probably good enough to do something with qgisprocess in a separate module called QGIS or something, no? That could be done entirely from R in that scenario and would probably require no changes to JASP. Although on Linux we probably still need to do some magic to get it to work.

tomtomme commented 8 months ago

Yes, using qgisprocess in JASP sounds about right. But one has to install QGIS first, or we have to prompt the user to install it from https://flathub.org/apps/org.qgis.qgis
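As a rough illustration of the qgisprocess route discussed above, the sketch below runs QGIS's `native:buffer` algorithm from R when a QGIS installation is detected. The helper name `buffer_layer`, the fallback to `sf::st_buffer()`, and the demo data (the North Carolina shapefile bundled with sf) are assumptions made for this example, not part of any JASP design.

```r
library(sf)

# Hypothetical helper: buffer a layer through QGIS if it is available.
# Falls back to sf::st_buffer() so the sketch also runs without QGIS.
buffer_layer <- function(layer, distance) {
  use_qgis <- requireNamespace("qgisprocess", quietly = TRUE) &&
    qgisprocess::has_qgis()
  if (use_qgis) {
    result <- qgisprocess::qgis_run_algorithm(
      "native:buffer",
      INPUT = layer,
      DISTANCE = distance
    )
    sf::st_as_sf(result)  # read the algorithm output back as an sf object
  } else {
    sf::st_buffer(layer, dist = distance)
  }
}

# Demo data shipped with sf, projected so that distances are in metres
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_proj <- st_transform(nc, 32119)
buffered <- buffer_layer(nc_proj[1:3, ], distance = 1000)
```

The `has_qgis()` guard reflects the installation concern raised here: without QGIS on the machine, the qgisprocess branch cannot run at all.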

JorisGoosen commented 8 months ago

Well, if it's possible to use qgisprocess with QGIS from Flathub, then it might in fact not even be that much work on Linux.

tomtomme commented 7 months ago

I have contacted the QGIS people via: https://github.com/qgis/QGIS/issues/56342 Maybe they have more ideas on this.

tomtomme commented 6 months ago

@jansim I just saw that you are working on mapping functionality here: https://github.com/jansim/rmaps. Could this be a simpler way to integrate maps into JASP than via qgis_process?

jansim commented 6 months ago

Hi @tomtomme, cool that you stumbled upon that project, but I think it's probably not the best starting point for a GIS integration. I could recommend the tmap package as a potential starting point for plotting maps. For something like the choropleth example you posted above, it's a great choice, with both interactive (HTML + JS based) and non-interactive (normal plots) maps.

Although, if you already have the spatial bounding boxes etc. from the file format you plan to load (not sure how this stuff would look in QGIS), it's also very straightforward to plot spatial data via ggplot.

tomtomme commented 6 months ago

@jansim Do you think it is possible for JASP to import typical GIS file formats? I would not have thought that this would be (easily) implementable. Then using ggplot seems off the table, right?

I was thinking more in the direction of loading / streaming some OpenStreetMap (OSM) layers available online and then choosing some typical stats data columns (from SAV / CSV etc.) to display, e.g., points on top of that via columns that contain coordinates. So something simple to start with.
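A minimal sketch of that first step, assuming the stats data has longitude and latitude columns: the data frame `cases` and its column names are invented for illustration, and the plot is a static ggplot without a tile background.

```r
library(sf)
library(ggplot2)

# Hypothetical stats data with coordinate columns, as it might come from a CSV
cases <- data.frame(
  city = c("Amsterdam", "Berlin", "Paris"),
  lon  = c(4.90, 13.40, 2.35),
  lat  = c(52.37, 52.52, 48.86),
  n    = c(12, 30, 21)
)

# Turn the coordinate columns into point geometries (WGS 84, EPSG:4326)
cases_sf <- st_as_sf(cases, coords = c("lon", "lat"), crs = 4326)

# Static plot; point size encodes the count column
p <- ggplot(cases_sf) +
  geom_sf(aes(size = n), colour = "firebrick") +
  theme_minimal()
```

To put the same points on an interactive tile basemap, tmap's view mode (`tmap_mode("view")` followed by `tm_shape(cases_sf) + tm_dots()`) should work, though the exact arguments differ between tmap versions.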

Filling bounding boxes with different colors would be the next step. I imagine the bounding boxes streamed from OSM would be addressable, e.g., via zip or country code or a standardized centroid coordinate. The stats data would then also need a matching zip code column and a column with a color code. So can one stream the bounding boxes, e.g., from OSM via tmap commands? I guess tmap, or any R package really, includes spatial data.

Yeah, really long ago that I used GIS stuff, it seems like someone deleted that info from my brain and replaced it with stats stuff...

jansim commented 6 months ago

I think for certain GIS formats such as shapefiles there are good R packages out there to load them, but I actually agree with you here that loading / streaming in OSM data or tiles is probably more convenient and better matches the typical use cases people would have.
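For the shapefile case, a short sketch using the sf package and the North Carolina shapefile that ships with it; a file chosen by a JASP user would take the place of `shp_path`.

```r
library(sf)
library(ggplot2)

# Example shapefile bundled with sf; a user-selected path would go here
shp_path <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(shp_path, quiet = TRUE)

# Geometry and attribute columns arrive together in one data frame,
# so an attribute such as BIR74 can be mapped to fill directly
p <- ggplot(nc) +
  geom_sf(aes(fill = BIR74)) +
  labs(fill = "Births, 1974")
```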

I think the main benefit tmap would have over ggplot is that it can load rendered tiles and then plot on an actual map, either interactively via HTML or as a rendered image. To generate a choropleth with filled polygons, I'm not sure what the most convenient solution would be, since I don't yet know a good source where one could download the appropriate polygons for filling them in (I think bounding boxes are often only rectangles). JASP could already include, e.g., a low-res world map with at least countries on there, but supporting zip codes or something similar would be hard. You can download / stream some of this information via the Overpass API, but that can be quite slow, has variable formats and could put quite a bit of load on their free infrastructure 🤔
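A sketch of the bundled low-res world map idea, using country polygons from the rnaturalearth package (which needs the rnaturalearthdata package installed) as a stand-in for whatever JASP might ship. The `scores` data frame and its values are made up; the join key `adm0_a3` is Natural Earth's three-letter country code, which matches ISO alpha-3 for most countries.

```r
library(sf)
library(ggplot2)
library(rnaturalearth)

# Low-resolution country polygons
world <- ne_countries(scale = "small", returnclass = "sf")

# Hypothetical stats data keyed by three-letter country code
scores <- data.frame(
  adm0_a3 = c("NLD", "DEU", "FRA", "ESP"),
  score   = c(0.31, 0.35, 0.38, 0.40)
)

# Keep every country; those without data get NA and are drawn in grey
world_scores <- merge(world, scores, by = "adm0_a3", all.x = TRUE)

p <- ggplot(world_scores) +
  geom_sf(aes(fill = score)) +
  scale_fill_viridis_c(na.value = "grey90")
```

Joining on a country code avoids streaming polygons at plot time, which sidesteps the Overpass load concern, but it only works at the level of detail the bundled map provides.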

tomtomme commented 6 months ago

Thanks to this discussion, it seems that integrating "qgis_process" might be overkill. Probably easier options:

Example with ggplot + cartogram from https://kieranhealy.org/blog/archives/2023/12/06/dorling-cartograms/

Code

## Dorling Cartogram example with US Census data
## Requires you sign up for a free Census API key
## https://api.census.gov/data/key_signup.html
##

## Required packages
library(tidyverse)
library(tidycensus)
library(sf)
library(cartogram)
library(colorspace)

## Setup
options(tigris_use_cache = TRUE)

## Do this
census_api_key("YOUR API KEY HERE")
## or, to install in your .Rprofile follow the instructions at
## https://walker-data.com/tidycensus/reference/census_api_key.html

pop_names <- tribble(
  ~varname, ~clean,
  "B01003_001", "pop",
  "B01001B_001", "black",
  "B01001A_001", "white",
  "B01001H_001", "nh_white",
  "B01001I_001", "hispanic",
  "B01001D_001", "asian"
)

## Get the data
fips_pop <- get_acs(geography = "county",
                    variables = pop_names$varname,
                    cache_table = TRUE) |>
  left_join(pop_names, join_by(variable == varname)) |> 
  mutate(variable = clean) |> 
  select(-clean, -moe) |>
  pivot_wider(names_from = variable, values_from = estimate) |>
  rename(fips = GEOID, name = NAME) |>
  mutate(prop_pop = pop/sum(pop),
         prop_black = black/pop,
         prop_hisp = hispanic/pop,
         prop_white = white/pop,
         prop_nhwhite = nh_white/pop,
         prop_asian = asian/pop)

fips_map <- get_acs(geography = "county",
                    variables = "B01001_001",
                    geometry = TRUE,
                    shift_geo = FALSE,
                    cache_table = TRUE) |>
  select(GEOID, NAME, geometry) |>
  rename(fips = GEOID, name = NAME)

pop_cat_labels <- c("<5", as.character(seq(10, 95, 5)), "100")

counties_sf <- fips_map |>
  left_join(fips_pop, by = c("fips", "name")) |>
  mutate(black_disc = cut(prop_black*100,
                          breaks = seq(0, 100, 5),
                          labels = pop_cat_labels,
                          ordered_result = TRUE),
         hisp_disc = cut(prop_hisp*100,
                         breaks = seq(0, 100, 5),
                         labels = pop_cat_labels,
                         ordered_result = TRUE),
         nhwhite_disc = cut(prop_nhwhite*100,
                            breaks = seq(0, 100, 5),
                            labels = pop_cat_labels,
                            ordered_result = TRUE),
         asian_disc = cut(prop_asian*100,
                          breaks = seq(0, 100, 5),
                          labels = pop_cat_labels,
                          ordered_result = TRUE)) |>
  sf::st_transform(crs = 2163)

## Now we have
counties_sf

## Create the circle-packed version
## Be patient
county_dorling <- cartogram_dorling(x = counties_sf,
                                    weight = "prop_pop",
                                    k = 0.2, itermax = 100)

## Now draw the maps

## Black
out_black <- county_dorling |>
  filter(!str_detect(name, "Alaska|Hawaii|Puerto|Guam")) |>
  ggplot(aes(fill = black_disc)) +
  geom_sf(color = "grey30", size = 0.1) +
  coord_sf(crs = 2163, datum = NA) +
  scale_fill_discrete_sequential(palette = "YlOrBr",
                                 na.translate=FALSE) +
  guides(fill = guide_legend(title.position = "top",
                             label.position = "bottom",
                             nrow = 1)) +
  labs(
    subtitle = "Bubble size corresponds to County Population",
    caption = "Graph: @kjhealy. Source: Census Bureau / American Community Survey",
    fill = "Percent Black by County") +
  theme(legend.position = "top",
        legend.spacing.x = unit(0, "cm"),
        legend.title = element_text(size = rel(1.5), face = "bold"),
        legend.text = element_text(size = rel(0.7)),
        plot.title = element_text(size = rel(1.4), hjust = 0.15))

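## Create the output directory if it does not exist yet
dir.create("figures", showWarnings = FALSE)
ggsave("figures/dorling-bl.png", out_black, height = 10, width = 12)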

## Hispanic
out_hispanic <- county_dorling |>
  filter(!str_detect(name, "Alaska|Hawaii|Puerto|Guam")) |>
  ggplot(aes(fill = hisp_disc)) +
  geom_sf(color = "grey30", size = 0.1) +
  coord_sf(crs = 2163, datum = NA) +
  scale_fill_discrete_sequential(palette = "SunsetDark", na.translate=FALSE) +
  guides(fill = guide_legend(title.position = "top",
                             label.position = "bottom",
                             nrow = 1)) +
  labs(fill = "Percent Hispanic by County",
       subtitle = "Bubble size corresponds to County Population",
       caption = "Graph: @kjhealy. Source: Census Bureau / American Community Survey") +
  theme(legend.position = "top",
        legend.spacing.x = unit(0, "cm"),
        legend.title = element_text(size = rel(1.5), face = "bold"),
        legend.text = element_text(size = rel(0.7)),
        plot.title = element_text(size = rel(1.4), hjust = 0.15))

ggsave("figures/dorling-hs.png", out_hispanic, height = 10, width = 12)

## NH White
out_white <- county_dorling |>
  filter(!str_detect(name, "Alaska|Hawaii|Puerto|Guam")) |>
  ggplot(aes(fill = nhwhite_disc)) +
  geom_sf(color = "grey30", size = 0.1) +
  coord_sf(crs = 2163, datum = NA) +
  scale_fill_discrete_sequential(palette = "BluYl", na.translate=FALSE) +
  guides(fill = guide_legend(title.position = "top",
                             label.position = "bottom",
                             nrow = 1)) +
  labs(fill = "Percent Non-Hispanic White by County",
       subtitle = "Bubble size corresponds to County Population",
       caption = "Graph: @kjhealy. Source: Census Bureau / American Community Survey") +
  theme(legend.position = "top",
        legend.spacing.x = unit(0, "cm"),
        legend.title = element_text(size = rel(1.5), face = "bold"),
        legend.text = element_text(size = rel(0.7)),
        plot.title = element_text(size = rel(1.4), hjust = 0.15))

ggsave("figures/dorling-nhw.png", out_white, height = 10, width = 12)

## Asian
out_asian <- county_dorling |>
  filter(!str_detect(name, "Alaska|Hawaii|Puerto|Guam")) |>
  ggplot(aes(fill = asian_disc)) +
  geom_sf(color = "grey30", size = 0.1) +
  coord_sf(crs = 2163, datum = NA) +
  scale_fill_discrete_sequential(palette = "Purple-Ora", na.translate=FALSE) +
  guides(fill = guide_legend(title.position = "top",
                             label.position = "bottom",
                             nrow = 1)) +
  labs(fill = "Percent Asian by County",
       subtitle = "Bubble size corresponds to County Population",
       caption = "Graph: @kjhealy. Source: Census Bureau / American Community Survey") +
  theme(legend.position = "top",
        legend.spacing.x = unit(0, "cm"),
        legend.title = element_text(size = rel(1.5), face = "bold"),
        legend.text = element_text(size = rel(0.7)),
        plot.title = element_text(size = rel(1.4), hjust = 0.15))

ggsave("figures/dorling-asian.png", out_asian, height = 10, width = 12)

Map image