smilesun closed this issue 5 years ago.
You should never have the workers write to the OML cache directory if you're running jobs in parallel.
Download and cache all datasets in the master process when setting up batchtools, so that the workers only need to read from the OML cache directory.
I agree with @ja-thomas. You usually know which datasets you want to download, so you can use the `populateOMLCache` function beforehand to download everything you need.
Make sure that everything is stored on a shared file system.
Also, avoid having each worker access the internet/OpenML whenever possible.
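The workflow recommended above (populate the cache once in the master process, then let the workers only read from it) does not depend on the language. The sketch below illustrates it in Python rather than R. `fetch_dataset`, `populate_cache`, `worker_job`, the JSON file layout, and the dataset IDs are all hypothetical stand-ins, not the OpenML R API. In the R package, the master-side step would correspond to calling `populateOMLCache` before submitting the batchtools jobs.

```python
import json
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor


def fetch_dataset(data_id):
    """Hypothetical stand-in for an OpenML download (no network access here)."""
    return {"id": data_id, "rows": [[data_id, data_id * 2]]}


def populate_cache(cache_dir, data_ids):
    """Run once in the master process: download every dataset into the cache."""
    os.makedirs(cache_dir, exist_ok=True)
    for data_id in data_ids:
        path = os.path.join(cache_dir, f"{data_id}.json")
        if not os.path.exists(path):  # skip datasets that are already cached
            with open(path, "w") as f:
                json.dump(fetch_dataset(data_id), f)


def worker_job(args):
    """Runs in a worker: only reads from the cache, never downloads or writes."""
    cache_dir, data_id = args
    path = os.path.join(cache_dir, f"{data_id}.json")
    # Raises FileNotFoundError if the master forgot to cache this dataset,
    # instead of silently falling back to a (racy) download in the worker.
    with open(path) as f:
        data = json.load(f)
    return sum(sum(row) for row in data["rows"])


if __name__ == "__main__":
    # Stand-in for a cache directory on the shared file system (assumption).
    cache_dir = os.path.join(tempfile.mkdtemp(), "oml_cache")
    data_ids = [31, 37, 61]
    populate_cache(cache_dir, data_ids)  # master: the only process that writes
    with ProcessPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(worker_job, [(cache_dir, i) for i in data_ids]))
    print(results)  # [93, 111, 183]
```

Letting a worker fail loudly on a cache miss, rather than downloading on demand, keeps the "only the master writes" invariant easy to check.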
If two processes try to access the same directory at the same time, one of them fails with "could not write to directory". This happens with batchtools on LRZ. Is there a simple fix for that, or does it require a library enhancement?
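If the failure comes from two processes creating or writing the same cache entry at once (an assumption; the thread does not pin down the cause), one common library-side fix is to tolerate an already existing directory and to write each file under a unique temporary name before atomically renaming it into place. The Python sketch below shows that pattern. `write_cache_entry` is a hypothetical helper, not how the OpenML R package writes its cache, and `os.replace` is atomic on POSIX only when source and target are on the same file system; behavior on network file systems such as NFS can differ.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_cache_entry(cache_dir, name, payload):
    """Write payload to cache_dir/name so concurrent writers never see a partial file."""
    # exist_ok=True: another process creating the same directory is not an error.
    os.makedirs(cache_dir, exist_ok=True)
    # Write to a uniquely named temp file in the same directory first...
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=f".{name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        # ...then move it into place atomically; the last writer wins with a complete file.
        os.replace(tmp_path, os.path.join(cache_dir, name))
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    cache_dir = os.path.join(tempfile.mkdtemp(), "oml_cache")
    # Simulate several workers caching the same dataset at the same time.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(lambda i: write_cache_entry(cache_dir, "31.json", {"id": 31}), range(8)))
    with open(os.path.join(cache_dir, "31.json")) as f:
        print(json.load(f))  # {'id': 31}
    print(sorted(os.listdir(cache_dir)))  # ['31.json'] (no leftover temp files)
```

Even with such a change, pre-populating the cache in the master process avoids the concurrent writes altogether.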