worldbank / ESG_gaps_research

See draft publication here: https://worldbank.github.io/ESG_gaps_research/

Alternate CV charts #23

Open tgherzog opened 4 years ago

tgherzog commented 4 years ago

@randrescastaneda @tonyfujs I've made some progress implementing the charts we discussed by email and got as far as I could. There is Python code in the repository that creates the necessary data frame (although this could probably be done just as easily in R), and I've written some R/plotly code to create the charts. The R code is very rough, since I'm not very experienced with either R or plotly, but here it is.

This code implements the first chart, which ranks the indicators by country coverage and plots the baseline (original) and imputed coverage as a line/area chart:

```r
library(plotly)
library(reticulate)

# Load esg_mrv() from the repository's Python module
source_python("python/esg_mrv.py")

year <- 2018
m <- esg_mrv('data/', year=year)

title <- sprintf('ESG Ranked Country Coverage for %d', year)
# AUC = mean coverage across indicators, i.e. sum(coverage)/nrow(m)
name1 <- sprintf('Baseline<br>(AUC=%.2f)', mean(m$baseline))

x <- seq_len(nrow(m))  # indicator rank 1..n, shared by both traces
fig <- plot_ly()
if ('imputed' %in% colnames(m)) {
    name2 <- sprintf('Imputed<br>(AUC=%.2f)', mean(m$imputed))
    # Rank indicators by imputed coverage, highest first
    m <- m[order(m$imputed, decreasing=TRUE),]
    fig <- fig %>% add_trace(x = x, y = m$imputed, type='scatter', mode='lines', fill='tozeroy', name=name2)
}

# Rank by baseline coverage; pass the same x so the two curves line up
m <- m[order(m$baseline, decreasing=TRUE),]
fig <- fig %>% add_trace(x = x, y = m$baseline, type='scatter', mode='lines', name=name1)

fig <- fig %>% layout(title=title,
                      yaxis=list(title='Country Coverage', tickformat=',.0%'),
                      xaxis=list(title='Indicators', showticklabels=TRUE))
fig
```

(Screenshot of the resulting chart: ESG Ranked Country Coverage for 2018, line/area chart)
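The AUC in the legend is just sum(coverage)/nrow(m), i.e. the mean coverage: with each indicator treated as a unit-width step on the ranked x axis, it is the area under the curve divided by the number of indicators. A minimal Python sketch of that calculation (the function names and the coverage values are made up for illustration, not taken from esg_mrv):

```python
def ranked_coverage(coverage):
    """Sort per-indicator coverage shares (0..1) from highest to lowest,
    as the line/area chart does for each series independently."""
    return sorted(coverage, reverse=True)


def normalized_auc(coverage):
    """Area under the ranked curve, treating each indicator as a unit-width
    step, divided by the number of indicators (i.e. the mean coverage)."""
    if not coverage:
        raise ValueError("need at least one indicator")
    ranked = ranked_coverage(coverage)
    return sum(ranked) / len(ranked)


def legend_label(name, coverage):
    """Legend text in the same format as the R sprintf() calls."""
    return f"{name}<br>(AUC={normalized_auc(coverage):.2f})"


# Illustrative (made-up) coverage shares for three indicators
baseline = [0.5, 0.25, 0.75]
print(ranked_coverage(baseline))            # [0.75, 0.5, 0.25]
print(legend_label("Baseline", baseline))   # Baseline<br>(AUC=0.50)
```

Because sorting does not change the sum, the AUC is the same with or without ranking; ranking only changes the shape of each curve, which is why the baseline and imputed series can be sorted independently.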

And this code holds the indicator order fixed on the x axis and plots the effect of imputation as a stacked column chart (baseline coverage plus the gain from imputation):

```r
library(plotly)
library(reticulate)

source_python("python/esg_mrv.py")

year <- 2018
m <- esg_mrv('data/', year=year) # use the default parameters
# Fix the x-axis order to the row order returned by esg_mrv()
m$indicator <- factor(m$indicator, levels = m$indicator)

title <- sprintf('ESG Ranked Country Coverage for %d', year)
name1 <- sprintf('Baseline<br>(AUC=%.2f)', mean(m$baseline))
name2 <- sprintf('Imputed<br>(AUC=%.2f)', mean(m$imputed))

# Baseline bar with the imputation gain stacked on top
# (assuming gain = imputed - baseline, the stack height is the imputed coverage)
fig <- plot_ly(m, x = ~indicator, y = ~baseline, type='bar', name=name1)
fig <- fig %>% add_trace(x = ~indicator, y = ~gain, name=name2)
fig <- fig %>% layout(title=title, barmode='stack',
                      yaxis=list(title='Country Coverage', tickformat=',.0%'),
                      xaxis=list(title='Indicators', showticklabels=FALSE),
                      hovermode='x')
fig
```

(Screenshot of the resulting chart: ESG Ranked Country Coverage for 2018, stacked column chart)
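The stacked chart only reads correctly if each column's two segments add up to the imputed coverage. A minimal Python sketch of that bookkeeping, assuming (this is not confirmed in the thread) that esg_mrv's `gain` column is imputed minus baseline; the `IndicatorCoverage` type, `stacked_segments` function, and indicator names are hypothetical:

```python
from typing import NamedTuple


class IndicatorCoverage(NamedTuple):
    name: str
    baseline: float  # share of countries with reported data (0..1)
    imputed: float   # share after imputation (0..1)


def stacked_segments(rows):
    """Return (name, baseline, gain) per indicator, keeping the input order.

    Assumes gain = imputed - baseline, so baseline + gain equals the
    imputed coverage, which is the full height of each stacked column.
    """
    segments = []
    for row in rows:
        gain = row.imputed - row.baseline
        if gain < 0:
            raise ValueError(f"{row.name}: imputed coverage is below baseline")
        segments.append((row.name, row.baseline, gain))
    return segments


# Illustrative (made-up) indicators; the x-axis order follows this list
rows = [IndicatorCoverage("A", 0.5, 0.75), IndicatorCoverage("B", 0.25, 0.25)]
for name, base, gain in stacked_segments(rows):
    print(name, base, gain)
```

Keeping the input order, rather than re-sorting, mirrors what the `factor(..., levels = m$indicator)` call does in the R code: the x positions stay fixed, so only the bar heights change when the imputation parameters change.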

I've tried these in the R console and they both work fine, although they could probably be spruced up a bit visually. I figured one of you is better positioned to add them to the report (in place of figure 4.4) using styles and colors consistent with the report. I find both charts interesting and wonder whether we should include both. What do you think?

tgherzog commented 4 years ago

I've done a little more work on this, and the second chart makes a great interactive tool if you add a user interface to control 1) the study year, 2) the minimum CV threshold for imputation, and 3) the number of years to impute. Could we do something similar in R?
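Before wiring up controls, it helps to pin down what each one would change. The sketch below is hypothetical: esg_mrv()'s actual imputation rule is not shown in this thread, so the data layout (`{country: {year: value}}`), the use of each country series' coefficient of variation (CV), the direction of the comparison (impute only when the CV is at or below the threshold), and the name `coverage_for` are all assumptions.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation over the absolute mean; None if undefined."""
    if len(values) < 2:
        return None
    mean = statistics.fmean(values)
    if mean == 0:
        return None
    return statistics.pstdev(values) / abs(mean)


def coverage_for(series_by_country, year, cv_threshold, max_years):
    """Return (baseline, imputed) coverage shares for ONE indicator.

    Hypothetical rule (an assumption, not esg_mrv's confirmed logic):
    a country is covered at baseline if it reports `year`. After imputation
    it is also covered if it reported in one of the previous `max_years`
    years AND the CV of its earlier values is at or below `cv_threshold`.
    """
    if not series_by_country:
        raise ValueError("no countries")
    baseline = imputed = 0
    for series in series_by_country.values():
        if year in series:
            baseline += 1
            imputed += 1
            continue
        # Control 3: how far back a value may be carried forward
        recent = any(year - k in series for k in range(1, max_years + 1))
        # Control 2: only stable (low-CV) series qualify for imputation
        cv = coefficient_of_variation([v for y, v in series.items() if y < year])
        if recent and cv is not None and cv <= cv_threshold:
            imputed += 1
    n = len(series_by_country)
    return baseline / n, imputed / n


# Illustrative (made-up) series for one indicator; country codes are placeholders
data = {
    "AAA": {2016: 10.0, 2017: 10.0, 2018: 10.0},  # reports the study year
    "BBB": {2016: 4.0, 2017: 4.0},                # stable, last value 2017
    "CCC": {2016: 1.0, 2017: 9.0},                # volatile (CV = 0.8)
    "DDD": {},                                     # no data at all
}
print(coverage_for(data, 2018, cv_threshold=0.1, max_years=2))
```

Calling a function like this once per indicator and re-plotting is all the interactive version would need to do when a control changes. In R, one common route is a Shiny app that binds the three arguments to inputs such as `sliderInput()` or `numericInput()` and re-renders the plotly chart with `renderPlotly()`.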