insongkim / PanelMatch


Error Messages "please convert time id to consecutive integers" and "please convert unit id column to integer or numeric" #130

Closed tsquall121 closed 1 year ago

tsquall121 commented 1 year ago

Hello, I have been having a lot of trouble using the DisplayTreatment and PanelMatch functions on my own data. They keep failing with the error messages "please convert time id to consecutive integers" and "please convert unit id column to integer or numeric", even though my time id column contains consecutive integers and my unit id column is also integer. Below is code that generates simulated data with the same structure.

library(tidyverse)
library(PanelMatch)
set.seed(123)
simulated_data <- tibble(id = rep(1:746, 37) %>% sort(),
       year = rep(1985:2021, 746),
       d = rbinom(27602, 1, 0.5),
       y = runif(27602, min = 0, max = 81345),
       cov = runif(27602, min = 39, max = 2355606)) 

Both id and year in the tibble are integers and consecutive. id goes from 1 to 746 while year ranges from 1985 to 2021.

However, when I ran the following code, I encountered the error messages mentioned above.

DisplayTreatment(unit.id = "id",
                 time.id = "year",
                 treatment = "d",
                 data = simulated_data)
PanelMatch(lag = 5,
           time.id = "year",
           unit.id = "id",
           treatment = "d",
           refinement.method = "CBPS.weight",
           data = simulated_data,
           covs.formula = ~cov,
           qoi = "att",
           outcome.var = "y",
           lead = 0:5)

Looking at the example data in the PanelMatch package, the wbcode2 column (i.e., the unit id) in the dem data set is not consecutive, even though the column type is integer. The following code shows that the wbcode2 column is NOT consecutive.

dem %>% 
  distinct(wbcode2) %>% 
  pull(wbcode2) %>% 
  diff(lag = 1, differences = 1) 

However, the id column in the simulated_data is both integer and consecutive. The following code shows the proof.

simulated_data %>% 
  distinct(id) %>% 
  pull(id) %>% 
  diff(lag = 1, differences = 1) 

Similarly, when we look at the year column, both the dem and simulated_data are integer and consecutive.

dem %>% 
  select(wbcode2, year) %>% 
  group_by(wbcode2) %>% 
  nest() %>% 
  mutate(year = map(data, pull)) %>% 
  mutate(results_year = map(year, ~rle(diff(.)))) %>% 
  mutate(consecutive_year = map(results_year, ~all(.x$lengths==50 & .x$values ==1))) %>% 
  unnest(consecutive_year) %>% 
  filter(consecutive_year == FALSE)

simulated_data %>% 
  select(id, year) %>% 
  group_by(id) %>% 
  nest() %>% 
  mutate(year = map(data, pull)) %>% 
  mutate(results_year = map(year, ~rle(diff(.)))) %>% 
  mutate(consecutive_year = map(results_year, ~all(.x$lengths==36 & .x$values ==1))) %>% 
  unnest(consecutive_year) %>% 
  filter(consecutive_year == FALSE)
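
The same per-unit check can be written more compactly in base R. This is only a minimal sketch: `is_consecutive` and the `toy` data frame are hypothetical names made up for illustration, not part of PanelMatch. It flags any unit whose sorted time values do not increase in steps of exactly 1.

```r
# Hypothetical helper (not part of PanelMatch): TRUE when the sorted values
# of x increase in steps of exactly 1
is_consecutive <- function(x) {
  x <- sort(x)
  length(x) < 2 || all(diff(x) == 1)
}

# Toy panel for illustration: unit 2 is missing the year 2001
toy <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L),
  year = c(2000L, 2001L, 2002L, 2000L, 2002L)
)

# Apply the check to each unit's years; the result is named by unit id
res <- tapply(toy$year, toy$id, is_consecutive)
res
```

On the simulated data, the equivalent call would be `tapply(simulated_data$year, simulated_data$id, is_consecutive)`.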

Nonetheless, I still get the error messages. Could this be a bug? Other people have reported similar issues on Stack Overflow: https://stackoverflow.com/questions/74815244/panelmatch-r-errors-please-convert-time-id-to-consecutive-integers-and-pleas

Thank you all! @adamrauh @insongkim @erikhw @kosukeimai @ahalterman

tsquall121 commented 1 year ago

This can be solved by converting the tibble to a data.frame. I did not find the error messages very helpful. @adamrauh, you may want to remind users to convert their data to a data.frame, given that tibbles are now the default for many datasets. Alternatively, would it be possible to add one line to the source code that coerces the input data to a data.frame? The following if statements only work when the data is a data.frame. With a tibble, data[, unit.id] returns a one-column tibble rather than a vector, so its class is neither integer nor numeric even when unit.id and time.id are formatted properly.

  data <- as.data.frame(data)
  if(!class(data[, unit.id]) %in% c("integer", "numeric")) stop("please convert unit id column to integer or numeric")
  if(class(data[, time.id]) != "integer") stop("please convert time id to consecutive integers")
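
To illustrate why the class checks behave differently, here is a minimal sketch. It assumes the tibble package is installed, and the objects `tb` and `df` are hypothetical names used only for this example. Single-bracket column indexing on a tibble keeps the tibble structure, whereas on a data.frame it drops to a plain vector.

```r
library(tibble)

# Hypothetical toy panel; tb and df are illustrative names only
tb <- tibble(id = rep(1:3, each = 2), year = rep(2000:2001, 3))
df <- as.data.frame(tb)

# On a tibble, `[` with a single column still returns a tibble
class(tb[, "year"])   # "tbl_df" "tbl" "data.frame"

# On a data.frame, the same call drops to an integer vector
class(df[, "year"])   # "integer"

# `[[` extracts the column as a vector in both cases
class(tb[["year"]])   # "integer"
```

In practice, calling `simulated_data <- as.data.frame(simulated_data)` before DisplayTreatment or PanelMatch is the workaround described above.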

adamrauh commented 1 year ago

Hi @tsquall121, thanks for raising this. I believe this should be fixed in the most recent version of the se_comparison branch. It just hasn't made its way onto the CRAN version yet.