ngreifer / cobalt

Covariate Balance Tables and Plots - An R package for assessing covariate balance
https://ngreifer.github.io/cobalt/

problem with bal.tab(): "All weights are zero when treat = TRUE" #15

martinzuba closed this issue 6 years ago.

martinzuba commented 6 years ago

Hi. I am trying to get a balance table from bal.tab() using preprocessed output from weightit().

I receive the error message: "All weights are zero when treat = TRUE".

However, this is not the case: all of the weights are above 1, and none are NA or NULL.

I have traced the problem to some odd behaviour of apply() in combination with check_if_zero(): the check of whether all weights are zero returns FALSE when called directly, but (incorrectly) TRUE when called via apply().

I am using the latest versions of WeightIt, cobalt (installed today), and R.

The output below is from the debugger, which stops inside check_if_zero_weights().

Thanks for your help!

Martin

Browse[1]> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          5.1                         
year           2018                        
month          07                          
day            02                          
svn rev        74947                       
language       R                           
version.string R version 3.5.1 (2018-07-02)
nickname       Feather Spray      

Browse[1]> error
[1] "All weights are zero when treat = TRUE."
Browse[1]> problems
[1]  TRUE FALSE
Browse[1]> w.t.mat
     Var1  Var2
1 weights  TRUE
2 weights FALSE
Browse[1]> problems <- apply(w.t.mat, 1, function(x) all(check_if_zero(weights.df[treat == 
+     x[2], x[1]])))
Browse[1]> problems
[1]  TRUE FALSE
Browse[1]> check_if_zero(weights.df[treat == TRUE, "weights"])
  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[129] FALSE FALSE FALSE FALSE FALSE
Browse[1]> all(check_if_zero(weights.df[treat == TRUE, "weights"]))
[1] FALSE
Browse[1]> all(check_if_zero(weights.df[w.t.mat[1,2], w.t.mat[1,1]]))
[1] FALSE
Browse[1]> weights.df[treat == TRUE,]
  [1]  67.307627  79.826155   1.885578 158.026877  45.196684  56.164825  48.752937  30.879905  11.674147
 [10]  26.692058  13.295941  82.674590 196.897668 149.248587  52.762289  39.684289  33.170495  39.308733
 [19]  53.171570  41.240477 160.372561 194.005269  60.413108  43.963563  15.230456  42.829538  27.463982
 [28]  19.865331  50.847341 186.199997   9.753240  68.585247  36.408196  38.549354  38.755218  40.884345
 [37]  29.620660  52.601934 111.332990  56.297451  27.031934 101.012349  34.349574 107.525278  78.727894
 [46]  11.777890  70.914950  55.277485  51.883375  71.255899  32.319254  42.992511  72.273144  28.642228
 [55] 137.954277  34.807268  60.276977  63.115426  68.655834 133.662160  17.536231  94.708934  65.562180
 [64]  60.241740 109.878914  79.942162  28.324305  74.347703  66.622288  26.406760  20.897160  69.370021
 [73]  52.737898  46.644920  96.783245  47.111526  35.341429  77.041636  77.046557  30.057204  91.398045
 [82]  46.837280  94.873180  37.793427 104.106985  21.611831  18.633768 140.601745  21.072106  84.664917
 [91] 171.780325  23.068098  65.262950  45.945273  65.830478  13.585935  14.353937  36.560600  77.410477
[100]  42.240395  11.444596  67.281186  25.100079 117.032776  66.714564 190.680325  27.129495  69.194680
[109]  74.293695  28.874397  32.587939  95.918416  27.744732  94.771610  11.792023  83.279133  31.746677
[118]  36.733866  13.132560  66.008024  40.119701  78.070225  16.603842  35.215006  57.132454  44.612056
[127]  20.949391  81.514315  47.458328  15.125913  70.443032  33.938332  54.767617
ngreifer commented 6 years ago

Thank you so much for letting me know about this. I'll have it fixed tonight and let you know. By "latest version", do you mean the version on CRAN or the development version on GitHub?

ngreifer commented 6 years ago

It looks like I already addressed this bug on the development version of cobalt. Rather than check whether all of the weights are 0, I instead check whether the first weight is 0 and all the weights are the same. I was unable to reproduce your bug using the old code, though. Also I'm surprised WeightIt was giving you weights that large. Try using the development version of cobalt and let me know if the problem persists.
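The revised check described above (test the first weight against zero, then test that every weight equals the first) can be sketched in R as follows. This is a hypothetical illustration, not cobalt's actual code: the function name `all_weights_zero`, the tolerance `tol`, and the guard that treats an empty vector as "not all zero" are all assumptions added for this sketch.

```r
# Hypothetical sketch (not cobalt's code) of the check described above.
# Assumptions: the name all_weights_zero, the numeric tolerance, and the
# empty-vector guard are illustrative additions.
all_weights_zero <- function(w, tol = sqrt(.Machine$double.eps)) {
  # Assumed guard: an empty subset is not reported as "all zero".
  if (length(w) == 0L) return(FALSE)
  # First weight is (numerically) zero AND every weight equals the first one.
  abs(w[1]) < tol && all(abs(w - w[1]) < tol)
}

all_weights_zero(c(0, 0, 0))            # TRUE
all_weights_zero(c(0, 1.885578, 67.3))  # FALSE
all_weights_zero(numeric(0))            # FALSE (because of the guard)
```

The explicit length check matters because, without it, an empty subset of weights is exactly the case that can make a naive "all zero" test misfire.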

martinzuba commented 6 years ago

Thank you for your reply, Noah! I used the version on CRAN. I will test the development version once I'm back at work :-)

I am also somewhat puzzled by these large weights. My guess is that they are so large because the treatment group is only 1/10th of the total sample, and the weighting method makes sure that the treatment and control groups have the same sum of weights.

ngreifer commented 6 years ago

Which method in WeightIt were you using? To my knowledge, none of them require the treated weights to sum to the control weights. It doesn't actually matter, because weights multiplied by a positive constant have the same properties (for example, the same weighted means), but I want to make sure that's not a bug too!
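The point that a constant rescaling of the weights doesn't matter can be checked with a small sketch. The data are made up, and rescaling the treated weights to match the control total is only a hypothetical stand-in for the behaviour guessed at above; it is not a claim about what any WeightIt method does.

```r
# Hypothetical covariate values, weights, and treatment (illustration only).
x     <- c(1.2, 3.4, 2.2, 5.0, 0.7, 4.1)
w     <- c(67.3, 1.9, 158.0, 0.4, 1.1, 0.9)
treat <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Rescale only the treated weights so they sum to the control weights' total.
scale_t <- sum(w[!treat]) / sum(w[treat])
w2 <- w
w2[treat] <- w[treat] * scale_t

# The group sums now match ...
all.equal(sum(w2[treat]), sum(w2[!treat]))      # TRUE
# ... but the treated group's weighted mean is unchanged, because the
# constant cancels in sum(w * x) / sum(w).
all.equal(weighted.mean(x[treat], w[treat]),
          weighted.mean(x[treat], w2[treat]))   # TRUE
```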

martinzuba commented 6 years ago

Hi Noah! Back to work, at last.

Attached is some code that reproduces the problem with check_if_zero_weights(), as well as a suggested function that works for me.

debug.zip

ngreifer commented 6 years ago

Thank you so much for your help. I discovered what the problem was.

It was indeed a problem with apply(), and it only occurred if the names of the treatment levels were of different lengths. But it came down to me failing to appreciate what apply() does. I fed it a data.frame (w.t.mat) that has a character column (for the names of the weights) and a column of the same type as the unique values of the treatment (in your case, logical). apply() first converts the data.frame to a matrix, then runs down it by row and operates on each row as a vector. The problem is that a matrix must hold a single type, and since one of the values (the name of the weights) was character, the other value (the treatment level) was coerced to character as well. When R does this, it pads the values to a common width, so TRUE became " TRUE" to match the width of "FALSE". Since no units had " TRUE" as a treatment level, the subset of weights was empty, and all() of an empty vector is TRUE, so the check reported that all of the weights were zero.
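The coercion described above can be reproduced outside cobalt with a few lines of base R. This is a minimal sketch with made-up data; `w.t.mat` only mimics the shape of cobalt's internal table, and `treat`/`w` are hypothetical.

```r
# Mimic cobalt's lookup table: a character column naming the weights
# and a logical column holding the treatment levels (hypothetical data).
w.t.mat <- expand.grid("weights", c(TRUE, FALSE), stringsAsFactors = FALSE)

# apply() converts the data.frame to a matrix first. Because one column is
# character, the logical column is formatted to character with a common
# width, so TRUE becomes " TRUE" (padded to the width of "FALSE").
levels_seen <- unname(apply(w.t.mat, 1, function(x) x[2]))
print(levels_seen)  # " TRUE" "FALSE"

# A logical treatment vector compared with " TRUE" matches nothing ...
treat <- c(TRUE, FALSE, TRUE)
w <- c(2.5, 1.0, 3.0)
subset_w <- w[treat == levels_seen[1]]
length(subset_w)     # 0

# ... and all() over an empty logical vector is TRUE, hence the false alarm.
all(subset_w == 0)   # TRUE
```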

What I did instead was to use vapply along the row indices and extract the name of the weights and the unique values of the treatment using the row and column indices. Now there is no coercing of types and the whole thing should work as expected. Thank you so much for telling me about this and doing the work to help me fix it.
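The vapply()-over-row-indices approach described above can be sketched like this. It is a hypothetical illustration with made-up data, and `check_if_zero()` here is a simplified stand-in for cobalt's internal helper (its tolerance is an assumption); the real fix in cobalt may differ in detail.

```r
# Hypothetical data mimicking cobalt's internal objects.
w.t.mat    <- expand.grid("weights", c(TRUE, FALSE), stringsAsFactors = FALSE)
weights.df <- data.frame(weights = c(2.5, 1.0, 3.0, 0.8))
treat      <- c(TRUE, FALSE, TRUE, FALSE)

# Simplified stand-in for cobalt's check_if_zero(); the tolerance is assumed.
check_if_zero <- function(x, tol = sqrt(.Machine$double.eps)) abs(x) < tol

# Iterate over row indices and pull each cell out of the data.frame directly,
# so the treatment level keeps its original (logical) type.
problems <- vapply(seq_len(nrow(w.t.mat)), function(i) {
  wname <- w.t.mat[i, 1]  # character: name of the weights column
  tlev  <- w.t.mat[i, 2]  # logical: no coercion to " TRUE"
  all(check_if_zero(weights.df[treat == tlev, wname]))
}, logical(1))

problems  # FALSE FALSE
```

Using vapply() with `logical(1)` also guarantees that each row yields exactly one logical value, so an unexpected result type fails loudly instead of silently.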

I have updated the GitHub version of cobalt, so you can go ahead and install it and give it a try. If it works correctly, I'll close this issue.