xuyiqing / gsynth

Generalized Synthetic Control Method

Error when controls are included in the formula rather than in X #23

Open onvovi opened 5 years ago

onvovi commented 5 years ago

The dataset used in this example is a panel dataset, very much like the sample "simdata."

Issue: Including a control variable X1 in the formula "Y ~ D + X1" leads to the following error:

Error in if (sum(unlist(tapply(data[, Xname[i]], data[, id], var))) == : missing value where TRUE/FALSE needed

When the same X1 variable is included as "X = X1", the algorithm runs without any issues.

I have looked in the source and found the following lines:

    if (p > 0) {
        for (i in 1:p) {
            if (sum(is.na(data[, Xname[i]])) > 0) {
                stop(paste("Missing values in variable \"", Xname[i], "\".", sep = ""))
            }
            if (sum(unlist(tapply(data[, Xname[i]], data[, id], var)), na.rm = TRUE) == 0) {
                stop(paste("Variable \"", Xname[i], "\" is unit-invariant. Try to remove it.", sep = ""))
            }
            if (sum(unlist(tapply(data[, Xname[i]], data[, time], var)), na.rm = TRUE) == 0) {
                stop(paste("Variable \"", Xname[i], "\" is time-invariant. Try to remove it.", sep = ""))
            }
        }
    }

It looks like the error occurs because the middle if condition shown above evaluated to NA, whereas an if condition must evaluate to either TRUE or FALSE. However, I don't understand why the X1 of the example "simdata" does not lead to an error, while the X1 of my model does (I am new to R).
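For readers new to R, the mechanism behind this message can be reproduced without gsynth. The following is a minimal sketch in base R, not the package's code; the toy numbers and variable names are made up for illustration. `var()` of a single value is `NA`, `sum()` propagates that `NA` unless `na.rm = TRUE` is set, and `if` refuses an `NA` condition.

```r
# Toy per-group variances: the second group has a single observation,
# so its sample variance is undefined and var() returns NA.
group_vars <- c(var(c(1.2, 0.7, 1.9)), var(3.4))

total_strict <- sum(group_vars)                 # NA propagates
total_lenient <- sum(group_vars, na.rm = TRUE)  # NA is dropped

# if () needs a single TRUE or FALSE; NA == 0 is NA, which raises an error.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    if (total_strict == 0) "invariant" else "varies"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# With na.rm = TRUE the comparison is a proper TRUE/FALSE.
print(if (total_lenient == 0) "invariant" else "varies")
```

Whether this is what happens with a particular dataset depends on the data and on the gsynth version in use, so treat it as one possible explanation rather than a diagnosis.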

X1 of "simdata" is a time and unit-varying variable with the following descriptive statistics:

       vars    n mean   sd median trimmed  mad   min max range skew kurtosis   se
    X1    1 1500 1.05 1.29   0.97    1.02 1.23 -2.78 5.7  8.48  0.2     0.08 0.03

X1 in the model leading to the error is also a time and unit-varying variable with the following descriptives:

       vars     n     mean       sd median  trimmed      mad min     max   range  skew kurtosis     se
    X1    1 14167 103980.4 342887.5  28098 43736.25 30083.44  40 9848011 9847971 14.03    312.2 2880.8

Related question: When I include the X1 of "simdata" in the "formula," I get different results from including it in "X = ". This is probably a conceptual side question, but why is that? The variable X1 in "simdata" is both time- and unit-varying, so the results should be the same in both specifications, no? Am I missing something here?

Thanks much!

xuyiqing commented 5 years ago

Thank you for letting us know. Would it be possible to share a sample dataset with us so that we can replicate the error? Licheng (CCed) will look into it.

Licheng, could you also look into the different results? They should be the same.

Best, Yiqing

-- Yiqing Xu

Assistant Professor Department of Political Science University of California, San Diego http://yiqingxu.org/

liulch commented 5 years ago

Can you send me a script that replicates the different results caused by the position of the covariates in "simdata"? I have not been able to replicate this problem.

liulch commented 5 years ago

I checked the source code of gsynth on CRAN and found that the error does happen in some cases (e.g. when some units have only 1 observation). In the GitHub version, we modified part of the source code to try to fix this problem. So maybe you can try the GitHub version?
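To check whether a dataset falls into the case described above, one can count observations per unit before calling gsynth. This is a sketch in base R, assuming the panel is a data frame with a unit identifier column; the helper name `single_obs_units` and the toy data are hypothetical and not part of gsynth.

```r
# Hypothetical helper (not part of gsynth): list units with fewer than
# two non-missing observations of a covariate, for which var() is NA.
single_obs_units <- function(data, id, x) {
  n_obs <- tapply(data[[x]], data[[id]], function(v) sum(!is.na(v)))
  names(n_obs)[n_obs < 2]
}

# Toy panel: unit "c" appears only once.
toy <- data.frame(
  id   = c("a", "a", "a", "b", "b", "b", "c"),
  year = c(2001, 2002, 2003, 2001, 2002, 2003, 2001),
  X1   = c(1.0, 1.5, 0.8, 2.1, 2.4, 1.9, 3.3)
)

flagged <- single_obs_units(toy, id = "id", x = "X1")
print(flagged)

# One possible workaround: drop the flagged units before estimation.
toy_clean <- toy[!(toy$id %in% flagged), ]
```

Dropping units changes the sample, so whether that workaround is appropriate depends on the analysis.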

ShunyuanZ commented 5 years ago

I got the same problem when trying to add control variables. @onvovi did you figure out a way to solve the problem? Thank you!

liulch commented 5 years ago

Hello @ShunyuanZ! Have you tried to install the updated version of "gsynth" on GitHub?

ShunyuanZ commented 5 years ago

Hi, thank you for the prompt reply! Actually I just tried, but there was an error in the installation (saying it's not an R package). It would be great if you could let me know whether something is missing from my steps. Thank you very much!

I'm using R on Windows 10, and I did the following:

  1. install.packages("devtools")
  2. library(devtools)
  3. install_github("xuyiqing/gsynth")

Then it appeared to start downloading... However, then I got the following error: "Error: Does not appear to be an R package (no DESCRIPTION)"

There is an additional warning message: In addition: Warning message: In utils::untar(tarfile, ...) : ‘tar.exe -xf "C:\Users\YUAN_W~1\AppData\Local\Temp\RtmpSg51RJ\file372c34822127.tar.gz" -C "C:/Users/YUAN_W~1/AppData/Local/Temp/RtmpSg51RJ/remotes372c59bf4069"’ returned error code 1

Best, Shunyuan

briatte commented 5 years ago

@ShunyuanZ Your issue looks like a download error. Perhaps try downloading again? It could also be a permissions error if you are not an admin on your computer.

I have just tested the repo, and it installs fine.

ShunyuanZ commented 5 years ago

Thank you! I think the problem is not on the package side, it was probably on my side. I tested downloading another GitHub R package, and it popped up the same error: Error: Does not appear to be an R package (no DESCRIPTION)

I will test downloading on another computer to see if it solves the problem. If I'm lucky enough to solve this issue, I will report back :) Thanks again! Btw, great package. Greatly appreciate your work!

Best, Shunyuan

ShunyuanZ commented 5 years ago

Update: I downloaded the package from GitHub and it solved the problem. Now, with the updated version of 'gsynth', I was able to include control variables X1 X2 ... in the model. Thanks all! (Using R version 3.6.0 on another desktop finally got the download to work...)
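Since the behaviour reported in this thread differs between releases, it can help to state which version of gsynth is actually installed when reporting the error. A small sketch using base R utilities; the helper name `installed_version` is made up for illustration.

```r
# Hypothetical helper: return the installed version of a package as a
# string, or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

print(installed_version("gsynth"))
```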

andresgf91 commented 5 years ago

Hi. I am still having this problem with a dataset. Has anyone figured out how to fix it? Thanks a lot. Andres

xuyiqing commented 5 years ago

Could you send a sample dataset (and an R script file) to us? yiqingxu@stanford.edu

aeggers commented 4 years ago

In case it's still relevant, I get this error in gsynth version 1.0.9. (When I update to the github version I get other errors.)

Here is a dataset: https://www.dropbox.com/s/l542utxhlmq1fw4/df.csv?dl=0

If I load this as df and load library(gsynth),

No covariates: this works

    out1 <- gsynth(income.norm ~ D, data = df, index = c("regno", "year"), se = F)

Covariate included via X argument: also works

    out2.a <- gsynth(income.norm ~ D, X = expend.norm, data = df, index = c("regno", "year"), se = F) # this works

Covariate included via formula: does not work

    out2.b <- gsynth(income.norm ~ D + expend.norm, data = df, index = c("regno", "year"), se = F) # this does not work: "missing value where TRUE/FALSE needed"

xuyiqing commented 4 years ago

Hi, could you let us know the errors you're getting with the new version?

-- Yiqing Xu

Assistant Professor Department of Political Science Stanford University http://yiqingxu.org/

aeggers commented 4 years ago

The error (with this dataset) was that there were too few pre-treatment periods. I think there are 7, so this error shouldn't have come up (and didn't with earlier versions I had used).
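The number of pre-treatment periods per treated unit can be checked directly from the treatment indicator. The sketch below uses base R on a made-up panel; the column names mirror the `regno`, `year`, and `D` used earlier in the thread, and the helper name `pre_treatment_periods` is hypothetical. As far as I know, gsynth's documentation describes a `min.T0` argument for the minimum number of pre-treatment periods, so comparing these counts against its value may help, though that is an assumption about the cause here.

```r
# Hypothetical helper: for each unit that is ever treated, count the
# periods observed before its first treated period (D == 1).
pre_treatment_periods <- function(data, id, time, d) {
  by_unit <- split(data, data[[id]])
  counts <- vapply(by_unit, function(u) {
    treated_times <- u[[time]][u[[d]] == 1]
    if (length(treated_times) == 0) {
      return(NA_integer_)  # never-treated (control) unit
    }
    sum(u[[time]] < min(treated_times))
  }, integer(1))
  counts[!is.na(counts)]
}

# Toy panel: unit 1 is treated from 2008 on, unit 2 is never treated.
toy <- data.frame(
  regno = rep(c(1, 2), each = 10),
  year  = rep(2001:2010, times = 2),
  D     = c(rep(0, 7), rep(1, 3), rep(0, 10))
)

pre_counts <- pre_treatment_periods(toy, id = "regno", time = "year", d = "D")
print(pre_counts)
```

The sketch assumes the treatment indicator has no missing values; rows with a missing `D` would need to be handled first.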

xuyiqing commented 4 years ago

Could you send us a sample dataset and code file for us to debug? It would also be great if you could attach a treatment status plot using PanelView().

Thanks! Yiqing

sailmichael commented 4 years ago

I am experiencing the same issue when including a covariate via the formula. Is there any fix for this?

xuyiqing commented 4 years ago

Could you send us a sample dataset and code file for us to debug via email? To:

yiqingxu@stanford.edu and liulch@mit.edu

sailmichael commented 3 years ago

I switched from version 1.0.9 to 1.1.7 to be able to use the cluster variable cl. However, I am now experiencing the same issue reported above again: I cannot include covariates via the formula. Are there any new insights as to what I could do to avoid this issue (I sent my data and script in July)?

Thank you for all the effort put into the gsynth package!