Closed mroch closed 1 month ago
Thanks for the report.
A very safe thing to be doing would be to replace ; mkdir dir
by ; try_mkdir dir
-- check again for the file existence after the recursive parent creation. This is, of course, not removing the race entirely, but it is an easy change to reason about. The actually-correct approach you suggest sounds good, but harder to implement (we would have to cross the abstraction boundaries of ocamlbuild's system-utils abstractions, and I don't understand the portability consequences of such a change).
It's actually easy to reproduce this failure, using a makefile that looks as follows:
DIR=foo
FILE=code.ml
setup:
mkdir -p $(DIR)
echo "let x = 1" > $(DIR)/$(FILE)
echo "run 'make -j all' to reproduce the failure'"
clean:
ocamlbuild -clean
dist-clean: clean
rm $(DIR)/$(FILE)
rmdir $(DIR)
.PHONY: all test1 test2
all: test1 test2
test1:
$(MAKE) _build/$(DIR)/$(FILE)
test2:
$(MAKE) _build/$(DIR)/$(FILE)
_build/$(DIR)/$(FILE):
ocamlbuild -no-log $(DIR)/$(FILE)
I will first test the safe-and-incomplete change; if it fixes the reproduction case, maybe that's a good first step and I can do a minor release. Then I will look at the harder-and-complete fix.
So:
mkdir -p
on Unix systems, which does fix the race by delegating concurrency control to the filesystem (I like that approach); this requires Windows-specific logic (Windows' mkdir
does not support the -p
flag), for now I kept the legacy implementation on Windows.
- (Windows'
mkdir
does not support the-p
flag), for now I kept the legacy implementation on Windows.
As far as I can tell (but not execute right now), you can simply use mkdir
without -p
. In Windows it create paths by default, see here.
Yes, but the most important behavior for the present issue is what happens in case the directory or path already exists (not to fail in that case), which I didn't find documented. I'm also not completely sure of how to know which mkdir
command will be run by OCamlbuild, as it goes through bash
(I hope it's the Windows utility, and only Cygwin
, which has a different os_type, uses some coreutils imitation?).
In any case, I changed the implementation in the PR a bit from the previous message: now what I do is that I just ignore any failure of mkdir
in the mkdir_p
case, which is fine for two processes racing to create the same directory. When that happens, the user still sees an error message in the console/log, which is not nice, but (1) the implementation change is very simple and (2) maybe users shouldn't run ocamlbuild
in parallel like that anyway, and it's fine to tolerate some less-nice behavior in this case.
I merged #302 (with fixes for the race-condition-related failures I could reproduce) in master.
@mroch, would it be possible to try Flow builds from the master branch, and see if the failures have gone away?
https://github.com/ocaml/ocamlbuild/blob/4ef4b18223383f79819ae123f6dff2a7b969fa7a/src/shell.ml#L62-L64
If another process creates the directory after the
sys_file_exists
call but before themkdir
, thenmkdir
raises.This has been happening in the wild for Flow (https://github.com/facebook/flow/issues/8027). I suspect it's because of
make -j
and we have multipleocamlbuild
targets so they run in parallel.A safer approach could be to suppress the
EEXIST
: https://github.com/janestreet/core/blob/e8cf8a6d340c4af1e8177591782fb3e326eb0a0b/src/core_unix.ml#L1463-L1478