openml / openml-r

R package to interface with OpenML
http://openml.github.io/openml-r/

concurrent access to OML Cache #428

Closed smilesun closed 5 years ago

smilesun commented 5 years ago

If two processes try to access the same directory at the same time, one of them fails with the error "could not write to directory". This happens with batchtools on LRZ. Is there a simple fix for that, or does it require a change to the library?

ja-thomas commented 5 years ago

You should never have the workers write to the OML cache directory when you're running things in parallel.

Download and cache all datasets in the master process when setting up batchtools, so that the workers only need to read from the OML cache directory.

giuseppec commented 5 years ago

I agree with @ja-thomas. You usually know in advance which datasets you need, so you can call the populateOMLCache function beforehand to download everything. Make sure that the cache is stored on a shared file system. Also, avoid having each worker access the internet/OpenML wherever possible.
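
A minimal sketch of that setup, in case it helps. The data IDs, the cache path, and the helper names `prefetch_oml` and `load_cached` are hypothetical and only for illustration. It also assumes that `getOMLDataSet` picks up an already cached copy instead of downloading again, which is worth verifying on your setup:

```r
# Sketch only: IDs, path, and helper names are hypothetical placeholders.
data.ids  <- c(61L, 37L)                  # hypothetical OpenML data IDs
cache.dir <- "/shared/project/oml-cache"  # hypothetical path on a shared file system

# Run once in the master process, before any jobs are submitted.
prefetch_oml <- function(data.ids, cache.dir) {
  dir.create(cache.dir, recursive = TRUE, showWarnings = FALSE)
  OpenML::setOMLConfig(cachedir = cache.dir)     # point OpenML at the shared cache
  OpenML::populateOMLCache(data.ids = data.ids)  # download everything up front
  invisible(cache.dir)
}

# Run inside each worker: use the same cache directory and load from it.
load_cached <- function(data.id, cache.dir) {
  OpenML::setOMLConfig(cachedir = cache.dir)
  OpenML::getOMLDataSet(data.id = data.id)       # assumed to be served from the cache
}
```

Call `prefetch_oml(data.ids, cache.dir)` once while setting up the batchtools registry, and have the job function call `load_cached()` with the same `cache.dir`. That way only the master process writes to the cache directory.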