sinanpl / OaxacaBlinder

R implementation of Oaxaca-Blinder gap decomposition
MIT License

OS problems with testthat::expect_snapshot() #45

Closed sinanpl closed 4 months ago

sinanpl commented 4 months ago

In one of these expectations, I got an error on my laptop and assumed the expectation was not up to date. However, I think this is actually an OS problem.

Here you can see that all tests pass on macOS and fail on the other OSes.

This test is disabled for now and needs an update. The solution would probably be to round the coefficients and do an approximate comparison with the snapshot; see the sketch below.

Relevant commit: ca3ced79
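A minimal sketch of that rounding approach (the constructor call mirrors the formula and data that appear in the output later in this thread, but the test wording, bootstrap setup, and rounding precision are illustrative assumptions, not the package's actual test file):

```r
library(testthat)

test_that("twofold results with bootstraps haven't changed", {
  # Bootstrap options omitted/assumed here; enable them however the
  # package's real test does.
  twofold <- OaxacaBlinderDecomp(
    ln_real_wage ~ age + education | female,
    data = chicago_long,
    type = "twofold"
  )
  # Round well above cross-OS floating point noise (~1e-15 here)
  # before recording or comparing the snapshot.
  expect_snapshot(
    lapply(twofold$bootstraps$dists$overall, round, digits = 6)
  )
})
```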

davidskalinder commented 4 months ago

Hmh, that's odd. I don't know how to see which test(s) failed in the GitHub diagnostics (I only see which OSes it failed on?)...

Two other points though:

  1. The tests all pass for me at e8e9de50a4 when I run devtools::test() on my Ubuntu 20.04 laptop.
  2. The change at 31f6a5843f15 fails the "twofold results with bootstraps haven't changed" test for me -- I keep getting the old values.

Do we know whether this is the test that's failing on the other OSes?

As you say @sinanpl, if the test is only off by a tiny amount we could change the tolerance (though it looks like we might need to use testthat::expect_snapshot_value() for that, or just round the numbers ourselves). However, if the difference is larger, or if it's the problem I mention above about the difference between e8e9de50a4 and 31f6a5843f15, then I'm not sure how to debug this further.
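For reference, a hedged sketch of the tolerance route (the object and argument values are illustrative): testthat::expect_snapshot_value() records the value itself rather than its printed form and compares against the stored snapshot with a numeric tolerance.

```r
# style = "deparse" keeps the stored snapshot human-readable;
# tolerance absorbs cross-OS floating point differences.
expect_snapshot_value(
  twofold$bootstraps$dists$overall,
  style = "deparse",
  tolerance = 1e-6
)
```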

FWIW, here's the full bootstrap distribution I get when I run that test interactively (and, for reference, the summary, which is what fails the test):

Click to expand long results

```
> twofold$bootstraps$dists$gaps
$gap
    estimate
1  0.1970309
2  0.1885666
3  0.1881771
4  0.1932397
5  0.2206101
6  0.1911830
7  0.1966584
8  0.1512249
9  0.2399630
10 0.2123248

$pct_gap
     estimate
1  0.07290854
2  0.07089820
3  0.06964412
4  0.07205452
5  0.08146208
6  0.06936620
7  0.07221689
8  0.05679332
9  0.08833806
10 0.07876546

$EY_a
   estimate
1  2.702439
2  2.659682
3  2.701981
4  2.681854
5  2.708132
6  2.756141
7  2.723163
8  2.662723
9  2.716417
10 2.695659

$EY_b
   estimate
1  2.505408
2  2.471115
3  2.513804
4  2.488615
5  2.487522
6  2.564958
7  2.526505
8  2.511498
9  2.476454
10 2.483334

> summary(twofold)
Oaxaca Blinder Decomposition model
----------------------------------
Type: twofold
Formula: ln_real_wage ~ age + education | female
Data: chicago_long

Descriptives
            n    %n mean(ln_real_wage)
female==0 412 57.9%                2.7
female==1 300 42.1%                2.5

Gap: 0.21
% Diff: 7.62%

              coefficient % of gap           se          2.5%         97.5%
explained           -0.03   -14.1% 2.038143e-02 -7.192541e-02 -1.736619e-02
unexplained          0.23   114.1% 2.090135e-02  2.049169e-01  2.636727e-01
unexplained_a        0.00    -0.0% 8.519047e-16 -1.898993e-15  7.903563e-16
unexplained_b        0.23   114.1% 2.090135e-02  2.049169e-01  2.636727e-01
```

The analysis I'm working on now only uses threefold results, so this possible error in the twofold results isn't a super high priority for me at the moment... But of course @sinanpl let me know if I can help check anything for you on this?

davidskalinder commented 4 months ago

Whoops, wrong bootstraps. Here are (I think) the ones that are going wrong in that test:

Click to expand

```
> twofold$bootstraps$dists$overall
$explained
      estimate
1  -0.01685624
2  -0.02217687
3  -0.07595069
4  -0.05076081
5  -0.01912269
6  -0.05806056
7  -0.04147327
8  -0.05200049
9  -0.02214232
10 -0.02316519

$unexplained
    estimate
1  0.2138871
2  0.2107435
3  0.2641278
4  0.2440005
5  0.2397328
6  0.2492436
7  0.2381317
8  0.2032254
9  0.2621053
10 0.2354900

$unexplained_a
        estimate
1  -9.704694e-16
2  -1.996233e-15
3  -7.096103e-16
4  -7.945034e-16
5   1.100899e-15
6  -2.792905e-16
7  -1.351946e-15
8  -4.536302e-16
9  -1.564057e-15
10 -3.400058e-16

$unexplained_b
    estimate
1  0.2138871
2  0.2107435
3  0.2641278
4  0.2440005
5  0.2397328
6  0.2492436
7  0.2381317
8  0.2032254
9  0.2621053
10 0.2354900
```

davidskalinder commented 4 months ago

> I don't know how to see which test(s) failed in the GitHub diagnostics (I only see which OSes it failed on?)...

Ah, never mind, apparently I was blocking the JS that did this. So I can see the results now. I may have to revisit the rest of my comment in light of that then lol.

davidskalinder commented 4 months ago

Okay, that's really strange -- it looks like that one line of unexplained_a bootstraps changes considerably across all the OSes (including, it seems, mine and yours @sinanpl)... That makes me think that something's going wrong with that calculation beyond just the floating point tolerance. I'll have a look.

davidskalinder commented 4 months ago

Okay, looking at it more closely, I'm coming back around, @sinanpl, to your point of view that this is in fact a rounding/tolerance issue: I forgot that the estimates that all look very different are in fact extremely close to zero, so they only differ after the first nine decimal places or so. (In fact, I get different results when I run the test interactively compared to when I run it with devtools::test().)
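A quick interactive check makes the point (using the dump above, and assuming the list elements are data frames with an estimate column, as the printout suggests):

```r
# Largest unexplained_a draw by magnitude: pure floating point noise
# around zero, far below any meaningful coefficient scale.
max(abs(twofold$bootstraps$dists$overall$unexplained_a$estimate))
#> [1] 1.996233e-15  (per the dump above; exact value varies by run/OS)
```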

So I'm glad that it doesn't look like an error in the calculation code. But as I say above, it unfortunately doesn't look like testthat::expect_snapshot() has a tolerance option, so we'll have to get a little cleverer to get this test running again.

sinanpl commented 4 months ago

#48 solves this with rounding.