sjewo / cartogram

R package for cartogram creation
https://sjewo.github.io/cartogram/

No scaling, or incorrect scaling using cartogram #16

Closed: flashton2003 closed this issue 6 years ago

flashton2003 commented 6 years ago

Thanks for making the cartogram package available. I'm trying to make a cartogram of TB burden in Africa (the whole world is my goal; Africa is just a stepping stone), but the output does not scale with the burden. TB burden ranges from 0 to about 438,000. Below is the output after 500 iterations. The problem is that country areas are not scaled by TB burden at all:

[Figure: cartogram of Africa after 500 iterations, with no visible scaling by TB burden]

Here is the code I used to generate this:

library(readr)
library(cartogram)
library(tmap)
library(maptools)  # provides wrld_simpl; also loads sp (spTransform, CRS)

data(wrld_simpl)

# Read TB burden per country and join it to the world polygons by country name
tb_burden <- read_delim("~/Dropbox/mtb/tb_burden_stats/tb_cartogram/tb_burden.tsv", "\t", escape_double = FALSE, trim_ws = TRUE)
wrld_tb <- merge(wrld_simpl, tb_burden, by.x = "NAME", by.y = "Rcountry")
# Keep Africa (REGION code 2) and project to World Mercator (EPSG:3395)
afr_tb <- wrld_tb[wrld_tb$REGION == 2, ]
afr_tb <- spTransform(afr_tb, CRS("+init=epsg:3395"))
# Continuous-area cartogram weighted by TB_cases
afr_tb_cont <- cartogram_cont(afr_tb, "TB_cases", itermax = 500)
#afr_tb_cart <- cartogram_cont(afr_tb, "POP2005", itermax = 1)
tm_shape(afr_tb_cont) + tm_polygons("TB_cases", style = "jenks") + tm_layout(frame = FALSE)

And here is a link to tb_burden.tsv for replication.

I have also tried dividing the TB burden by 1000, with the same result. I then log-transformed the TB burden, which gave the result below. It doesn't make sense, because South Africa and Nigeria (the highest burdens) should be the largest:

[Figure: cartogram of Africa weighted by log-transformed TB burden]

Any help would be greatly appreciated.

Nowosad commented 6 years ago

@sjewo It looks like the issue is caused by zeros in the dataset. It works well when all values are greater than 0:

# Shift every count by 1 so that no country has a zero weight
afr_tb$TB_cases <- afr_tb$TB_cases + 1
afr_tb_cont <- cartogram_cont(afr_tb, "TB_cases", itermax = 10)
tm_shape(afr_tb_cont) + tm_polygons("TB_cases", style = "cont") + tm_layout(frame = FALSE)
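Why would zeros stop the map from changing at all? The following is a minimal sketch, assuming cartogram_cont follows the rubber-sheet scheme of Dougenik, Chrisman and Niemeyer (1985). In that scheme, every displacement is scaled by a force reduction factor 1 / (1 + mean size error), and a region's size error is the larger of its current and desired area divided by the smaller. The helper `reduction_factor()` and the toy areas and weights are hypothetical, not part of the package:

```r
# Hypothetical helper, assuming a Dougenik-style rubber-sheet step in which
# every vertex displacement is multiplied by 1 / (1 + mean size error).
reduction_factor <- function(area, value) {
  # desired area: total area split in proportion to each region's weight
  desired <- sum(area) * value / sum(value)
  # size error: larger of (current, desired) divided by the smaller
  size_error <- pmax(area, desired) / pmin(area, desired)
  1 / (1 + mean(size_error))
}

area <- c(10, 20, 30)                 # toy areas, not real data
reduction_factor(area, c(5, 0, 40))   # zero weight: desired = 0, error = Inf, factor = 0
reduction_factor(area, c(5, 1, 40))   # all weights > 0: factor is positive
```

Under this assumption, a single zero weight makes the mean size error infinite and the factor 0, so no vertex moves. That would match the unscaled map in the original post. Adding 1 to every count, as in the code above, keeps all desired areas positive.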

flashton2003 commented 6 years ago

Thank you!

sjewo commented 6 years ago

It is difficult to calculate the distortion for very small or very large values. The default strategy is to raise the lowest values and shrink the largest ones. The threshold parameter defines a quantile; by default, all values below the 5th percentile are adjusted.

Your data has a lot of zeros, so the 5th percentile is zero too. Just raise the threshold to get a better adjustment:

afr_tb_cont <- cartogram_cont(afr_tb, "TB_cases", itermax = 10, threshold=0.1)
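The adjustment described above can be sketched as clamping the weights to the quantiles at threshold and 1 - threshold. This simplified sketch assumes R's default quantile type. `adjust_weights()` and the toy vector are hypothetical and are not the package's internal code:

```r
# Hypothetical sketch of the default "adjust" preparation: values below the
# threshold quantile are raised to it, and values above the (1 - threshold)
# quantile are lowered to it.
adjust_weights <- function(value, threshold = 0.05) {
  lo <- quantile(value, probs = threshold, na.rm = TRUE, names = FALSE)
  hi <- quantile(value, probs = 1 - threshold, na.rm = TRUE, names = FALSE)
  pmin(pmax(value, lo), hi)
}

# Toy data: 2 zeros among 20 values
v <- c(0, 0, seq(10, 180, by = 10))
min(adjust_weights(v, threshold = 0.05))  # still 0: the 5th percentile is 0
min(adjust_weights(v, threshold = 0.1))   # about 9: the zeros are raised
```

With the default threshold, the lower bound is itself zero, so the zeros survive. With threshold = 0.1, the lower quantile falls between the last zero and the first positive value, so every weight ends up positive.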

In the next release, I'll add a warning that prints a message if the adjusted values are still zero.
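Until such a warning is available, a quick check before calling cartogram_cont can show how high the threshold needs to go. This check assumes two things: the weights are non-negative, and the lower bound is R's default (type 7) quantile. Under those assumptions, the quantile at probability p stays at zero while 1 + (n - 1) * p does not exceed k, the number of zeros among the n non-missing values. The threshold must therefore be greater than (k - 1) / (n - 1). `min_threshold()` is a hypothetical helper:

```r
# Hypothetical helper: value that the threshold must exceed for the lower
# quantile to be > 0, assuming non-negative weights and R's default
# quantile type 7 (index = 1 + (n - 1) * p).
min_threshold <- function(value) {
  value <- value[!is.na(value)]
  n <- length(value)
  k <- sum(value == 0)              # number of zero weights
  if (k == 0) return(0)             # no zeros: any threshold works
  if (k == n) return(1)             # all zeros: no threshold helps
  (k - 1) / (n - 1)                 # threshold must be strictly greater than this
}

v <- c(0, 0, seq(10, 180, by = 10)) # toy data: 2 zeros among 20 values
min_threshold(v)                    # 1/19, about 0.053: the default 0.05 is too low
quantile(v, probs = 0.05)           # 0
quantile(v, probs = 0.06)           # positive
```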

flashton2003 commented 6 years ago

This is the first time in my experience that countries having no TB was a bad thing :-)

Thanks for the explanation.

briatte commented 5 years ago

I have the same issue. I'm playing with your README example, but using the Americas instead of Africa:

library(cartogram)
library(tmap)
library(maptools)

data(wrld_simpl)
table(wrld_simpl$REGION)

x <- wrld_simpl[wrld_simpl$REGION == 19 & wrld_simpl$POP2005 > 0, ]
x <- spTransform(x, CRS("+init=epsg:3395"))

x_cont <- cartogram_cont(x, "POP2005", itermax = 5)

tm_shape(x_cont) + tm_polygons("POP2005", style = "jenks") +
  tm_layout(frame = FALSE)

Result:

[Figure: cartogram of the Americas after 5 iterations]

Could it be due to the bounding box?

sjewo commented 5 years ago

Your example is a tough problem for the algorithm: Canada and Greenland have the largest areas but rather small populations. The many fjords and islands don't help either...

After 100 iterations, the scaling looks a little better:

[Figure: cartogram of the Americas after 100 iterations]

Maybe you could try another distortion algorithm, like this ArcGIS plugin: https://www.arcgis.com/home/item.html?id=d348614c97264ae19b0311019a5f2276

briatte commented 5 years ago

@sjewo Thanks for the instructive explanation and for the suggestion to increase the iterations.

Your algorithm works well; I'll just be more patient next time and work with a large number of iterations. It's also helpful that you offer two stopping rules (error size and maximum number of iterations).
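The two stopping rules can be pictured as a loop that ends when the mean size error drops below a tolerance or when the iteration cap is reached. In cartogram_cont, these presumably correspond to the itermax and maxSizeError arguments. The toy `halve_excess()` step below is hypothetical: it halves the excess error on each pass and stands in for one real distortion pass.

```r
# Hypothetical sketch of the two stopping rules: stop when the mean size
# error is small enough, or when the iteration budget is used up.
run_until_converged <- function(step, error, itermax = 15, max_size_error = 1.0001) {
  iter <- 0
  while (iter < itermax && error > max_size_error) {
    error <- step(error)   # one distortion pass (toy stand-in)
    iter <- iter + 1
  }
  list(iterations = iter, mean_size_error = error)
}

# Toy pass: each iteration halves the excess error (error - 1)
halve_excess <- function(error) 1 + (error - 1) / 2

run_until_converged(halve_excess, error = 3, itermax = 5)    # stops on itermax after 5 passes
run_until_converged(halve_excess, error = 3, itermax = 100)  # stops on the error rule after 15
```

A slowly converging map, such as the Americas example, hits the iteration cap first. Raising the cap lets the error rule decide when to stop.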