insongkim / PanelMatch

Error in if (mirn && nrows[i] > 0L) { : missing value where TRUE/FALSE needed} #98

Closed LuMesserschmidt closed 2 years ago

LuMesserschmidt commented 2 years ago

I have been able to run most of my models described in #89 through the 6TB cluster. While running the following, I received an error message that I was not able to solve on my own:


PM.match <- PanelMatch(lag = 3, time.id = "year", unit.id = "id",
                       treatment = "fdi_treat", refinement.method = "ps.weight",
                       data = dt_matching, match.missing = TRUE,
                       covs.formula = ~ cname + I(lag(hyde_mean_full, 1:2)) + I(lag(lights_mean, 1:2)),
                       size.match = 5, qoi = "att",
                       outcome.var = "lights_mean", lead = 0:10, forbid.treatment.reversal = FALSE)

PE.results <- PanelEstimate(sets = PM.match, data = dt_matching)

Different matching and weighting methods (Mahalanobis, ps.match) worked without any problems, but ps.weight leads to this error message:

Error in if (mirn && nrows[i] > 0L) { : missing value where TRUE/FALSE needed
Calls: PanelEstimate ... panel_estimate -> prepareData -> getWits -> pcs -> data.frame
In addition: Warning message:
In attributes(.Data) <- c(attributes(.Data), attrib) :
  NAs introduced by coercion to integer range
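For readers unfamiliar with these two messages, here is a minimal base R sketch of how each one can arise in general. It is not a trace of the PanelMatch internals: the value `3e9` and the variable names `big`, `n`, and `msg` are made up for illustration, and whether an oversized row count is what actually happened in the run above is only an assumption.

```r
# Values above .Machine$integer.max (2147483647) cannot be stored as R integers.
big <- 3e9  # hypothetical size, chosen only because it exceeds the integer range

# as.integer() returns NA here and warns
# "NAs introduced by coercion to integer range"; the warning is silenced for the demo.
n <- suppressWarnings(as.integer(big))

# TRUE && NA evaluates to NA, and if() cannot branch on NA,
# which raises "missing value where TRUE/FALSE needed".
msg <- tryCatch(
  if (TRUE && n > 0L) "ok",
  error = function(e) conditionMessage(e)
)

print(is.na(n))
print(msg)
```

If this is indeed the mechanism, the NA would come from a size overflow rather than from missing values in the data, which might explain why only some refinement methods hit it.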

adamrauh commented 2 years ago

hi @LuMesserschmidt thanks again for sharing all this, and sorry for the delay in getting to your other ticket. I would be interested to hear more about how you were able to get things to run or any other improvements you've made. I know that there are many things in the code that need to be improved and optimized -- any insight you can share about your findings would be extremely helpful! I'd love to make some progress on that front.

Regarding this particular error, it's a bit odd that ps.matching works but ps.weighting does not. Do you happen to have a smaller reproducible example I could take a look at?

LuMesserschmidt commented 2 years ago

Hi @adamrauh, thanks for getting back to me. I looked into the error above and found that it often occurs when na.rm = TRUE is not included. Any idea where this could be? Unfortunately, I was not able to trace the error back in the source code of your PanelEstimate function. I have now removed all NAs and coerced all variables to numeric before running the command. My job on the 6TB cluster is now queued but will take some days to run. I'll keep you updated!
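The coercion step mentioned above could look roughly like the following sketch. The toy data frame `dt` and its values are invented for illustration; only the column names `id`, `year`, and `lights_mean` come from the call in this issue. Counting NAs after the conversion helps catch values that silently fail to convert.

```r
# Hypothetical toy data: columns stored as character, one value not convertible.
dt <- data.frame(
  id = c("1", "1", "2", "2"),
  year = c("2000", "2001", "2000", "2001"),
  lights_mean = c("3.5", "n/a", "4.1", "5.0"),
  stringsAsFactors = FALSE
)

num_cols <- c("id", "year", "lights_mean")

# Convert each column to numeric; entries that cannot be parsed become NA.
dt[num_cols] <- lapply(dt[num_cols], function(x) suppressWarnings(as.numeric(x)))

# Count NAs per column, including those created by failed conversions.
na_counts <- colSums(is.na(dt[num_cols]))
print(na_counts)
```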

Unfortunately, I do not really have a reproducible example, as these errors occurred only with the big dataset; everything worked fine on the small one. In general, I observed that the RAM required varies extremely during the commands (i.e., most of the time it runs fine without much memory). I think one potential avenue is to allow for parallelization. As described in the previous issue, I ran the code by individual countries and calculated pooled variances, which worked extremely smoothly even on my 8GB of RAM, but @insongkim raised the concern that the groups are not fully independent, which would bias the estimates (I do agree with his assessment). But maybe one could parallelize the parts where a) matching pairs are calculated and b) the ATT is calculated for each observation pair?

Thanks for your amazing work on this package and providing this public good!

LuMesserschmidt commented 2 years ago

@adamrauh good news: I was able to run the models by just deleting all rows containing NAs (complete.cases(df)). Nonetheless, I think it's really a matter of adding na.rm = TRUE to the function (if this does not bias the results?) ;)

adamrauh commented 2 years ago

@LuMesserschmidt Interesting, and thanks for sharing! It's a bit weird that this issue only shows up for some of the matching/weighting methods. Based on what I can remember, I would think that ps.weight and ps.match should generate a similar error for this kind of thing. Where exactly are you adding the na.rm?

Thanks again for the updates on this.

LuMesserschmidt commented 2 years ago

@adamrauh I have not added na.rm, but this was what overleaf suggested as a typical source of errors for this case. As a workaround, I was just removing all rows containing NAs for the treatment variable and covariates and it worked. But I agree that it is odd that the error occurs with ps.weight and not ps.match.
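A minimal sketch of that workaround, restricted to the treatment variable and covariates rather than every column. The toy `dt_matching` below is a made-up stand-in (the real data is not available), and `model_vars`, `keep`, and `dt_clean` are hypothetical names.

```r
# Toy stand-in for dt_matching; values are invented for illustration.
dt_matching <- data.frame(
  id = rep(1:3, each = 4),
  year = rep(2000:2003, times = 3),
  fdi_treat = c(0, 0, 1, 1, 0, NA, 0, 1, 0, 0, 0, 0),
  hyde_mean_full = c(1.2, 1.3, NA, 1.5, 2.0, 2.1, 2.2, 2.3, 0.5, 0.6, 0.7, 0.8),
  lights_mean = c(3, 4, 5, 6, 1, 2, 3, 4, NA, 8, 9, 10)
)

# Only the columns used as treatment, outcome, or covariates are checked.
model_vars <- c("fdi_treat", "hyde_mean_full", "lights_mean")

# complete.cases() is TRUE for rows with no NA in the selected columns.
keep <- complete.cases(dt_matching[, model_vars])
dt_clean <- dt_matching[keep, ]

print(nrow(dt_matching) - nrow(dt_clean))  # number of rows dropped
```

Note that dropping rows leaves gaps in the panel for the affected units, so it may be worth checking how many unit-years are lost before passing `dt_clean` on.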