Closed by JZL 2 years ago
Author of future here:
Something in your code ends up producing lots of R conditions (e.g. messages, warnings), or very large ones. All conditions are by default captured by futures (=on the parallel workers) and relayed as-is in the main R session.
I would try to identify what produces all those conditions. If they cannot be avoided (e.g. disable message output in a function via some argument, `suppressMessages()`, ...), then as a last resort, you can tell futures not to capture all types of conditions. For info on that, see the `conditions` argument to `future()`, cf. https://future.futureverse.org/reference/future.html.
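To make the options above concrete, here is a minimal sketch, assuming the `future` package is installed. The `noisy()` function is a made-up stand-in for whatever code emits the conditions, and `plan(sequential)` is used only to keep the example self-contained; a real setup would use a parallel backend.

```r
library(future)
plan(sequential)  # assumption: stand-in for the real (parallel) backend

# Hypothetical function that emits one message per step
noisy <- function(n) {
  for (i in seq_len(n)) message("step ", i)
  n
}

# Option 1: silence the messages at the source
f1 <- future(suppressMessages(noisy(3)))
v1 <- value(f1)

# Option 2 (last resort): capture all condition classes except messages
f2 <- future(noisy(3), conditions = structure("condition", exclude = "message"))
v2 <- value(f2)

# Option 3 (last resort): capture no conditions at all; errors are still relayed
f3 <- future(noisy(3), conditions = character(0L))
v3 <- value(f3)
```

Option 1 is usually preferable because the conditions are never created on the worker in the first place; options 2 and 3 only change what the future captures and relays.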
PS. You closed the issue again without comments. I think it would be helpful to future visitors to know how you solved your problem.
Hi,
Oh, thanks for responding, and sorry I didn't see it until now.
Those are all really good points, and I'll look into the `conditions` argument. My hacky solution (because it was just for personal use) is here. I just forcibly replaced the condition variable with the `paste0` string version of it, as a compromise that lets me debug with error messages without copying all the additional state.
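The idea of keeping only a string version of each condition can be sketched in base R. This is not the patch linked above, just an illustration under assumptions: `flatten_condition()` is a hypothetical helper, and the list of conditions is built by hand rather than taken from a real result object.

```r
# Hypothetical helper: reduce a condition object to a plain string,
# keeping its leading class and message but none of its attached state.
flatten_condition <- function(cond) {
  paste0(class(cond)[1L], ": ", conditionMessage(cond))
}

# Hand-built stand-ins for conditions captured on a worker
conds <- list(
  simpleWarning("x is deprecated"),
  simpleMessage("loading data"),
  simpleError("boom")
)

# Result is a character vector, so nothing but text gets saved
flat <- vapply(conds, flatten_condition, character(1L))
print(flat)
```

The trade-off is that the flattened conditions can no longer be re-signaled as real warnings or errors; they are only useful for reading afterwards.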
Yeah, closing it without comment is a bad habit. I know some package authors prefer not to have open issues, so for very open-ended issues I just close them immediately, but I can leave that up to the package author.
Hi,
EDIT: Looking closer, this does seem pretty squarely a `future` + `batchtools.future` problem; it's just that my quick-to-implement solution was to slightly modify the `batchtools` source code. I'll make a better proof of concept and bring it up over there.
I'm not quite sure what to make of this, since I use a weird combination of batchtools, batchtools.future, and a custom job runner.
But I'm running batches with 1k+ jobs and moderately sized global variables (< 200 MB) per job. I was trying to reduce the size of the `results/1.rds` files and noticed that the `object.size` value for the `1.rds` can be much smaller than the actual on-disk size.
After some digging, it turns out to be a known problem (e.g. here), and I narrowed it down to the `$conditions` part of the `FutureResult` object: if I clear that list's items, the saved RDS size goes from 80 MB to 780 KB.
There are definitely internal environments within those conditions, and I tried removing just a few of them, but could never get the space back. I think it could be the global variables that `future` stores, but I can't quite find where. I'm planning on bodging a fix where I replace the `$environment` with the `paste0` version of each element's value (so I can still see any errors while making sure it's just character vectors).
But I didn't know whether this was an interesting enough problem for the general package to be worth investigating further, or whether you had any advice on cleaner ways of dealing with it.
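The gap between `object.size` and the saved size can be reproduced in base R without any of the packages involved. This is a minimal sketch, assuming the extra bytes come from an environment reachable from the saved object; `make_payload()` and its fields are made-up names, not part of `future` or `batchtools`.

```r
# Hypothetical stand-in for an object that carries a reference to an
# environment holding a large variable.
make_payload <- function() {
  big <- runif(1e6)  # about 8 MB, lives in this call's environment
  list(message = "done", env = environment())
}

payload <- make_payload()

# object.size() does not follow environments, so `big` is not counted.
in_memory <- as.numeric(object.size(payload))

# serialize() writes the environment's contents as well; saveRDS() writes
# the same serialization format (optionally compressed).
serialized <- length(serialize(payload, connection = NULL))

# Replacing the environment with a character summary drops the hidden state.
payload$env <- paste0("<environment with: ",
                      paste(ls(payload$env), collapse = ", "), ">")
slim <- length(serialize(payload, connection = NULL))

cat("object.size:", in_memory, "serialized:", serialized, "slim:", slim, "\n")
```

Comparing `length(serialize(x, NULL))` before and after clearing a field is a quick way to find which part of an object is pulling in an environment, without writing files to disk.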
Thanks! Batchtools is a huge help for parallelizing my code