ocaml / ocamlbuild

The legacy OCamlbuild build manager
Other
122 stars 81 forks source link

Race condition in Shell.mkdir_p #300

Closed mroch closed 1 month ago

mroch commented 5 years ago

https://github.com/ocaml/ocamlbuild/blob/4ef4b18223383f79819ae123f6dff2a7b969fa7a/src/shell.ml#L62-L64

If another process creates the directory after the sys_file_exists call but before the mkdir, then mkdir raises.

This has been happening in the wild for Flow (https://github.com/facebook/flow/issues/8027). I suspect it's because of make -j and we have multiple ocamlbuild targets so they run in parallel.

A safer approach could be to suppress the EEXIST: https://github.com/janestreet/core/blob/e8cf8a6d340c4af1e8177591782fb3e326eb0a0b/src/core_unix.ml#L1463-L1478

gasche commented 5 years ago

Thanks for the report.

A very safe thing to be doing would be to replace ; mkdir dir by ; try_mkdir dir -- check again for the file existence after the recursive parent creation. This is, of course, not removing the race entirely, but it is an easy change to reason about. The actually-correct approach you suggest sounds good, but harder to implement (we would have to cross the abstraction boundaries of ocamlbuild's system-utils abstractions, and I don't understand the portability consequences of such a change).

It's actually easy to reproduce this failure, using a makefile that looks as follows:

DIR=foo
FILE=code.ml

setup:
    mkdir -p $(DIR)
    echo "let x = 1" > $(DIR)/$(FILE)
    echo "run 'make -j all' to reproduce the failure'"

clean:
    ocamlbuild -clean

dist-clean: clean
    rm $(DIR)/$(FILE)
    rmdir $(DIR)

.PHONY: all test1 test2
all: test1 test2

test1:
    $(MAKE) _build/$(DIR)/$(FILE)

test2:
    $(MAKE) _build/$(DIR)/$(FILE)

_build/$(DIR)/$(FILE):
    ocamlbuild -no-log $(DIR)/$(FILE)

I will first test the safe-and-incomplete change; if it fixes the reproduction case, maybe that's a good first step and I can do a minor release. Then I will look at the harder-and-complete fix.

gasche commented 5 years ago

So:

dbuenzli commented 5 years ago
  • (Windows' mkdir does not support the -p flag), for now I kept the legacy implementation on Windows.

As far as I can tell (but not execute right now), you can simply use mkdir without -p. In Windows it create paths by default, see here.

gasche commented 5 years ago

Yes, but the most important behavior for the present issue is what happens in case the directory or path already exists (not to fail in that case), which I didn't find documented. I'm also not completely sure of how to know which mkdir command will be run by OCamlbuild, as it goes through bash (I hope it's the Windows utility, and only Cygwin, which has a different os_type, uses some coreutils imitation?).

In any case, I changed the implementation in the PR a bit from the previous message: now what I do is that I just ignore any failure of mkdir in the mkdir_p case, which is fine for two processes racing to create the same directory. When that happens, the user still sees an error message in the console/log, which is not nice, but (1) the implementation change is very simple and (2) maybe users shouldn't run ocamlbuild in parallel like that anyway, and it's fine to tolerate some less-nice behavior in this case.

gasche commented 5 years ago

I merged #302 (with fixes for the race-condition-related failures I could reproduce) in master.

@mroch, would it be possible to try Flow builds from the master branch, and see if the failures have gone away?

hhugo commented 1 month ago

302 has been merged, flow doesn't use ocamlbuild anymore, closing