Nowosad opened this issue 4 years ago
Yes. We have to reduce the number of datasets in a smart way, since 3x5=15 is too much in my opinion.
I think we should aim for 3 topics/applications, one for each scale. Each topic is then covered with as few datasets as possible (i.e., just enough to cover our needs).
- Global
- Regional / country: I have to look for suitable data. The only option I currently have is Dutch commuting data, which contains the numbers of commuters between municipalities (400 in total), by mode of transport.
- Local: We can analyze a satellite image of air pollution and use OSM vector data as reference, for instance plotting main roads, railways, and important buildings like schools. Satellite images from different moments in time would also be awesome (e.g. pre-, during, and post-COVID).
Although it is not the focus of the book, I think it's nice to have three different hot topics, e.g. health (global), transport (country), and climate (local).
@mtennekes 15 datasets sounds like a lot, but counting from memory, we used more than 20 datasets in the first eight chapters of geocompr. However, I also think that adding datasets and modifying them (e.g. adding/removing variables, changing projections, etc.) is an incremental process: we will see what is missing while writing the book and then add it. We just need a starting point for now.
I like the idea of three different topics a lot. It is great!
A few remarks:
@zross what do you think?
A couple of thoughts:
In my experience, coming up with an "analysis" to do makes things a bit more interesting and real-world. Simply putting bubble points on a global map won't, I think, be as compelling.
I think starting with a topic is the way I would prefer to do it, but practically speaking we may need to pick at least one dataset by location: choosing a place where pretty much any kind of dataset we can envision is available. That way, if we decide we need to include a land use layer, a tree layer, a hospital layer, whatever, we can be confident the data is there. NYC, London, etc.
I wonder if we could come up with an unexpected place/topic. For example, if we did something with Africa, instead of looking at climate or poverty we could pick UNESCO heritage sites, beautiful parks, or early archaeological finds. I don't know. For the workshop I did at the RStudio conference, I partly used data on burrito restaurants in San Francisco from {yelpr}: road density near the restaurants, number of restaurants per neighborhood, that kind of thing, and people enjoyed it.
Most of my own work and experience is with the US, and we absolutely need to pick a less-covered area as well, but in terms of what I know:
My own expertise is air quality. I could easily come up with air quality-related data for any place, at any resolution. I'm currently working on a project on the global burden of air quality and have tons of useful global data from the Institute for Health Metrics and Evaluation.
My own expertise is also NYC. As you might guess, NYC has a ton of amazing and interesting data. For my DataCamp course I used a census of trees, which is a nice dataset.
My wife works at a famous bird laboratory, and they have amazing data. I know someone at that lab who could probably help us get interesting data for anywhere in the world.
Hi @zross, great points. How about we split the work here?
What do you think about that?
Agree with both of you.
I imagine the bird datasets that Zev mentioned will be very interesting, and also something completely different (for most people at least), while still being relevant. (I mean, the burrito dataset would be fun for sure, but I prefer topics that have impact.)
I will prepare the Dutch commuting data. Not sure if it will work though, since it needs a lot of processing to turn the raw data into a useful map. For this purpose, I've started a new (small) package to handle this kind of OD data. Maybe I can use an already processed version of the data.
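The kind of processing involved can be sketched with {sf} alone. This is a minimal, hypothetical example that turns a toy OD table into flow lines between municipality centroids; the names, coordinates, and counts are invented, not the real Dutch commuting data, and this is not the API of the new OD package:

```r
library(sf)

# toy centroids for three hypothetical municipalities
centroids <- data.frame(
  muni = c("A", "B", "C"),
  x = c(4.9, 5.1, 5.3),
  y = c(52.4, 52.0, 51.8)
)

# toy OD table: commuters between pairs of municipalities
od <- data.frame(
  origin = c("A", "A", "B"),
  dest   = c("B", "C", "C"),
  commuters = c(1200, 300, 450)
)

# build one LINESTRING per OD pair, from origin centroid to destination centroid
lines <- lapply(seq_len(nrow(od)), function(i) {
  o <- centroids[centroids$muni == od$origin[i], c("x", "y")]
  d <- centroids[centroids$muni == od$dest[i],   c("x", "y")]
  st_linestring(rbind(as.matrix(o), as.matrix(d)))
})

od_sf <- st_sf(od, geometry = st_sfc(lines, crs = 4326))
nrow(od_sf)  # 3 flow lines, one per OD pair
```

The resulting `od_sf` object could then be mapped with `tm_shape(od_sf) + tm_lines()`, scaling line width by the commuter count.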
Air quality data is good to have. @zross, I don't have a preference for a location at the local scale: NYC is fine with me!
Hi @mtennekes and @zross,
I have started preparing global data using world borders from Natural Earth and additional attributes from Gapminder. You can see it at https://github.com/r-tmap/tmap-data.
Please take a look at the code at https://github.com/r-tmap/tmap-data/blob/master/R/01-prepare-world.R.
My comments and questions:
Overall, I also think that we can (and will) modify and improve datasets while writing the book, but it will be nice to have an agreed alpha version.
Best, J.
Great work!
- `# Puerto Rico -> USA` etc.? Good idea. We can fine-tune it later.
- Regarding `tmap::World` and `spdata::world`: for energy use and CO2 emissions, I wouldn't use country borders, but a more detailed spatial resolution that also shows metropolitan areas.
- Regarding the projection (set via `st_transform`): I noticed that there is little difference with my old favourite, Eckert IV, which I used for `tmap::World`.
I played around with this dataset, and created a composite indicator:
```r
library(dplyr) # for mutate()
library(sf)
library(tmap)

world_all2 = world_all %>%
  sf::st_transform(crs = "+proj=eck4") %>%
  sf::st_make_valid() %>%
  mutate(demo_corr = democracy_score * 2.5 + 25 + corruption_perception_index / 2,
         demo_corr_rank = rank(-demo_corr, ties.method = "min"))

tmap_options(projection = 0, basemaps = NULL) # GitHub version of tmap needed

tm_shape(world_all2) +
  tm_polygons("demo_corr", style = "cont",
              popup.vars = c("democracy_score", "corruption_perception_index",
                             "demo_corr", "demo_corr_rank"),
              id = "name")
```
Great. I updated the code a little bit yesterday. I think it is a good starting point for the world data.
- At least three different scales:
- Each level should have a complete set of possible spatial object types with interesting attributes:
- At least one of the scales should also have some temporal variables to showcase tmap's animation capabilities.