Closed LuMesserschmidt closed 2 years ago
To give a brief follow-up on my case:
I looped the PanelMatch function by country (as described above) and calculated the treatment effect for every country. I then calculated the pooled mean and variance (https://www.ncbi.nlm.nih.gov/books/NBK56512/). This partially inflates the standard errors, but the effect estimates are nearly the same. Do you have an opinion on whether this looping violates any of your model assumptions?
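To make the pooling step concrete, here is a minimal sketch of one common way to combine per-country estimates: fixed-effect inverse-variance weighting. This is an assumption for illustration and may differ from the formula in the linked reference. The numbers and the helper name `pool_fixed_effect` are made up.

```r
# Hypothetical per-country effect estimates and standard errors (made-up numbers).
estimates  <- c(AFG = 0.42, ALB = 0.35, Country = 0.50)
std_errors <- c(AFG = 0.10, ALB = 0.20, Country = 0.25)

# Fixed-effect (inverse-variance) pooling; hypothetical helper, base R only.
pool_fixed_effect <- function(est, se) {
  w <- 1 / se^2                      # more precise estimates get more weight
  pooled_est <- sum(w * est) / sum(w)  # weighted mean of the estimates
  pooled_se  <- sqrt(1 / sum(w))       # standard error of the weighted mean
  c(estimate = pooled_est, se = pooled_se)
}

pooled <- pool_fixed_effect(estimates, std_errors)
print(pooled)
```

This kind of pooling treats the country-level estimates as independent, which is itself an assumption worth checking when matched sets could otherwise cross country borders.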
Thanks!
Dear colleagues,
thank you for providing such an innovative public good to the community. I am researching how FDI projects affect local nighttime development. I have seen your answers on issues #53 and #46. Working with >15 million rows, I am struggling with memory issues and system abort errors (even though I work on a 500GB RAM cloud with 20 nodes) and I hope that you can help me to overcome those:
Let me provide a bit more background on the data: I have divided the world into raster cells (~900k), and for each cell I have 17 years of observations (2002-2018): how light pollution developed ("lights"), whether the cell was treated in that year ("treatment"), and how much FDI it received ("fdi_volume"). I also control for the population size ("hyde") of the cell. Many cells have never been treated, and the distribution of FDI projects is extremely uneven.
Here is a small reproducible example:
`library(tidyverse)
set.seed(1000)  # set the seed before any random draw so the example is reproducible
year <- as.numeric(2002:2018)
country <- c("AFG", "ALB", "Country")
project_num <- 1:5
# 17 years x 3 countries x 5 projects = 255 rows
treatment <- sample(c(0, 1), 255, replace = TRUE)
lights <- runif(255, 1, 63)
hyde <- runif(255, 1000, 200000)
fdi_volume <- runif(255, 1, 200)
# merge() on objects without common columns yields a cross join with columns x and y
dt <- merge(year, country) %>% dplyr::rename(year = x, country = y)
dt <- merge(dt, project_num) %>% dplyr::rename(project_num = y) %>% mutate(id = paste(country, project_num, sep = "-"))
dt <- cbind(dt, treatment)
dt <- cbind(dt, lights)
dt <- cbind(dt, hyde)
dt <- cbind(dt, fdi_volume)`
What solutions have you discovered for working with large datasets? I found that Mahalanobis distance matching worked under specific circumstances, while propensity score matching and weighting always failed. I tried to find workarounds by splitting the sample or writing a loop, but I haven't yet come up with a sufficient solution (I read your wiki on Matched Set Objects).
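To illustrate the kind of split-and-loop workaround I mean, here is a minimal base R sketch on simulated data. The function `estimate_country_effect` is a hypothetical stand-in and not a PanelMatch call: it only computes a naive treated-minus-control difference in mean lights so that the loop runs end to end. In practice the matching and estimation step would go in its place.

```r
# Toy panel in base R: 3 countries x 5 units x 17 years (all values simulated).
set.seed(1000)
dt <- expand.grid(year = 2002:2018, project_num = 1:5,
                  country = c("AFG", "ALB", "Country"),
                  stringsAsFactors = FALSE)
dt$id        <- paste(dt$country, dt$project_num, sep = "-")
dt$treatment <- sample(c(0, 1), nrow(dt), replace = TRUE)
dt$lights    <- runif(nrow(dt), 1, 63)

# Hypothetical stand-in for the per-country matching and estimation step.
# It returns one row per country with a point estimate and a standard error.
estimate_country_effect <- function(d) {
  treated <- d$lights[d$treatment == 1]
  control <- d$lights[d$treatment == 0]
  data.frame(
    country  = d$country[1],
    estimate = mean(treated) - mean(control),
    se       = sqrt(var(treated) / length(treated) +
                    var(control) / length(control))
  )
}

# Split by country so that only one country's rows are processed at a time.
by_country <- split(dt, dt$country)
results <- do.call(rbind, lapply(by_country, estimate_country_effect))
print(results)
```

Processing one subset at a time keeps peak memory tied to the largest country instead of the full panel, at the cost of never matching units across countries.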
Alternatives
If looping is not an option, there might be another workaround: so far, I am including the country as a covariate. One idea could be to divide the dataset by country and run PanelMatch on each subset individually. But here I am having doubts:
If you allow me, let me post a few more questions here instead of starting new issues: