Closed HenrikBengtsson closed 3 years ago
I reproduced it, I found where the error was triggered, and I found the variable that differed between the two runs. I should be able to figure out what happens, but it is non-trivial and very hard to understand. I'm investigating.
I fixed it. There was a bug in my code that could only be reached with a combination of stop_early = FALSE, a failure at the last object to process, and unlucky timing when calling resolved(). I found the issue and I understand the issue, but I'm still wondering how we reached it by changing the timeout setting.
Thanks for reporting
> I fixed it.
Awesome.
> I found the issue, I understand the issue but I'm still wondering how we reached it by changing the timeout setting
Good. Yeah, the fact that different timeout lengths changed the behavior is odd, since resolved()
is guaranteed to return either TRUE or FALSE. It could indicate a race condition, but I don't see how that would be the case.
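For context, here is a minimal sketch of the resolved() semantics being discussed, using the public future API (the sleep duration and return value are illustrative, not from the thread):

```r
library(future)
plan(multisession)

# Launch a future that takes roughly a second to complete
f <- future({ Sys.sleep(1); 42 })

# resolved() is a non-blocking poll: it always returns TRUE or FALSE,
# so a still-running future simply yields FALSE rather than an error
while (!resolved(f)) Sys.sleep(0.05)

value(f)  # 42, once the future has resolved
```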
Either way, thanks for fixing it so quickly. Your package's tests, together with those of all the other ~180 packages, are invaluable when it comes to fixing bugs, making improvements, and introducing new features in the future framework.
> the fact that different timeout lengths changed the behavior is odd, since resolved() is guaranteed to return either TRUE or FALSE.
The idea is that I have a for loop that looks like this:

```r
for (i in x) {
  f[[i]] <- future(expr[i])
  for (j in 1:i) {
    states[j] <- test_state(f[[j]])
    update_state_monitoring(states)
  }
}
```
where test_state() checks whether the future is resolved and whether it was OK or triggered a warning or an error. update_state_monitoring() displays, in real time, a map showing the spatial regions that are processing, waiting, or that were processed OK or with a warning or error. In case of an error, there is a tryCatch block in test_state() to fail gracefully and allow skipping a region, which then won't be in the output.
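A hypothetical sketch of such a test_state(), assuming the shape described above; this is my reconstruction for illustration, not lidR's actual code, and the state constants other than PROCESSING are names I made up:

```r
library(future)

# Hypothetical state codes (names assumed for illustration)
PROCESSING <- 0L; OK <- 1L; WARN <- 2L; FAIL <- 3L

test_state <- function(f) {
  if (!resolved(f)) return(PROCESSING)  # still running: don't block
  # value() re-signals any condition captured in the future, so a
  # tryCatch here lets a single failed region be skipped gracefully
  tryCatch({
    value(f)
    OK
  }, warning = function(w) WARN,
     error   = function(e) FAIL)
}
```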
The problem is that, in parallel, this loop may have ended while some elements are still processing in the background (or remotely). Thus there is a second loop like this:
```r
while (any(states == PROCESSING)) {
  idx <- which(states == PROCESSING)
  for (j in idx) {
    states[j] <- test_state(f[[j]])
    update_state_monitoring(states)
  }
  sleep()
}
```
In my case, test_state() in this second loop was slightly different: the test for failing nicely was incomplete and failed to handle one case. The function test(), used in the unit tests, triggers an error by design at the second iteration to exercise some testing; it is so fast that the future was resolved in the first loop. But with the new timeout it was no longer resolved in the first loop and reached the second loop, where there was a bug.
The error was a combination of the resolved() timing, the use of stop_early = FALSE to skip errors (I never use this option), and the fact that it fails on the last object, because it is a unit test; in real life it is unlikely to fail on the last object if the 100, 500, or 1000 previous ones were OK. We were actually very lucky to catch it.
> Your package's tests, together with all the other 180 packages, are invaluable when it comes to bug fixing,
And the ability to run either local or remote parallel tasks in a single line of code without fully understanding how it works is even more valuable :wink:
In the next release of future, we're shortening how long future::resolve() will wait for results before giving up (and returning FALSE). Basically, we'll update the default from resolved(..., timeout = 0.2) to resolved(..., timeout = 0.01). This will improve performance, since we'll spend less time polling futures.

When running reverse package dependency checks, everything seemed to work at first, but when I forced all package checks to use 'multisession' by default (instead of 'sequential'), I spotted a problem with lidR:::cluster_apply(). Here's a minimal reproducible example adapted from testthat/test-catalog-engine-generic.R:
If I run with a timeout of 0.01 seconds, I get:
If I run with a timeout of 0.2 seconds, it works.
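For readers following along, here is a hedged sketch of the kind of polling loop where that per-call timeout matters; the loop is illustrative, not lidR's code. Each resolved() call on an unresolved multisession future can wait up to the timeout before giving up and returning FALSE, so a shorter default means less time blocked per poll:

```r
library(future)
plan(multisession)

# Four futures finishing at different times (durations are illustrative)
fs <- lapply(1:4, function(i) future({ Sys.sleep(i); i }))

# Poll until every future has completed; each resolved() call blocks
# for at most its (internal) timeout before answering FALSE
while (!all(vapply(fs, resolved, logical(1)))) {
  Sys.sleep(0.1)
}

vapply(fs, value, numeric(1))  # 1 2 3 4
```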
Could you please check and see if you can figure out why this happens?
PS. This is the only package out of ~180 that fails this way.
PS2. Although you should be able to trust all future packages to work the same, there are of course corner cases, etc. If you want to be extra certain that things work, I recommend that all package developers check their packages also with