plfs / plfs-core

LANL no longer develops PLFS. Feel free to fork and develop as you wish.

multithreaded mkdir can fail with multiple backends #354

Open johnbent opened 10 years ago

johnbent commented 10 years ago

If two separate threads call plfs_mkdir on the same path at the same time, it is possible that both fail with EEXIST and that the directory is not fully created. Each thread loops through the backends and makes the directory on each one. If a thread sees an error, it exits early with that error and does not finish looping through the backends. If the two threads loop in different orders (unusual but not impossible, for example if the order of the backends differs between plfsrc files, or if we use threads to speed up the mkdir like Grider probably wants us to), then each may exit early and the directory will not be created on every backend.
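To make the interleaving concrete, here is a small deterministic simulation of the race. It is only a sketch and does not use any PLFS code: `Caller`, `step`, `Outcome`, and `simulate_race` are hypothetical names, the backends are modeled as integers in a set, and the two "threads" are stepped by hand in alternation rather than run concurrently.

```cpp
#include <cerrno>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of one plfs_mkdir caller walking the backends.
struct Caller {
    std::vector<int> order;   // backend indices, in this caller's visit order
    std::size_t next = 0;     // position of the next backend to visit
    int result = 0;           // 0 on success, otherwise the errno that stopped it
    bool done = false;
};

// Advance one caller by one backend, exiting early on the first error
// (the behavior described in this issue).
void step(Caller &c, std::set<int> &created) {
    if (c.done) return;
    int backend = c.order[c.next++];
    if (!created.insert(backend).second) {  // directory already exists there
        c.result = EEXIST;
        c.done = true;
        return;
    }
    if (c.next == c.order.size()) c.done = true;
}

struct Outcome {
    int a_result;
    int b_result;
    std::size_t backends_created;
};

// Three backends; A visits 0,1,2 and B visits 1,0,2, one step each in turn.
Outcome simulate_race() {
    std::set<int> created;
    Caller a{{0, 1, 2}};
    Caller b{{1, 0, 2}};
    while (!a.done || !b.done) {
        step(a, created);
        step(b, created);
    }
    return {a.result, b.result, created.size()};
}
```

In this interleaving A creates the directory on backend 0, B creates it on backend 1, and then each hits the other's directory and exits with EEXIST, so backend 2 is never visited by either caller.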

One simple fix is to add EEXIST to FileOp.ignoreErrno() for the mkdir operation, but that would hide the EEXIST error in situations where it is appropriate to return it. So what we probably need to do is slightly modify FileOp so that it can return a list of errnos instead of a single one, and so that it can report whether any of the errnos in ignoreErrno() were seen. Then, when this happens, the mkdir FileOp will continue looping through all of the backends, but when it is done it will return and say, "Hey, I'm all done but by the way, I did encounter and ignore an EEXIST," and then we can return EEXIST to the caller of plfs_mkdir.
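A rough sketch of that idea follows. It is not the real FileOp interface: `BackendMkdir`, `OpResult`, `mkdir_all_backends`, and `mkdir_sketch` are hypothetical names, each backend is modeled as a callable that returns 0 or a positive errno, and the return convention (positive errno, 0 on success) is an assumption made only for this example.

```cpp
#include <cerrno>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one backend's mkdir: returns 0 or a positive errno.
using BackendMkdir = std::function<int(const std::string &path)>;

// Hypothetical result type: a fatal errno plus the list of ignored errnos.
struct OpResult {
    int fatal_errno = 0;        // first errno that aborted the loop, 0 if none
    std::vector<int> ignored;   // errnos that were seen and ignored, in order
};

// Visit every backend; stop early only on an errno that is not in `ignore`.
OpResult mkdir_all_backends(const std::vector<BackendMkdir> &backends,
                            const std::string &path,
                            const std::set<int> &ignore) {
    OpResult res;
    for (const auto &mk : backends) {
        int err = mk(path);
        if (err == 0) continue;
        if (ignore.count(err)) {        // remember it, but keep looping
            res.ignored.push_back(err);
            continue;
        }
        res.fatal_errno = err;          // anything else still exits early
        break;
    }
    return res;
}

// Caller-level wrapper: finish all backends, then surface the ignored EEXIST.
int mkdir_sketch(const std::vector<BackendMkdir> &backends,
                 const std::string &path) {
    OpResult res = mkdir_all_backends(backends, path, {EEXIST});
    if (res.fatal_errno != 0) return res.fatal_errno;
    if (!res.ignored.empty()) return EEXIST;
    return 0;
}
```

With this shape, a caller that loses the race on one backend still creates the directory on the remaining backends, and the caller of plfs_mkdir still sees EEXIST.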