ssadler / hawk

Awk for Hoodlums
BSD 3-Clause "New" or "Revised" License

Multiple hawk calls go in race condition for configuration and compilation #45

Open melrief opened 11 years ago

melrief commented 11 years ago

I'm experiencing some problems when many hawk instances are created. You can recreate it by pipelining many hawk calls:

> hawk -e '[1..100]' | hawk -m 'P.replicate 10'
hawk: /tmp/3574191538758045375: removeLink: does not exist (No such file or directory)

Note that the error is not always raised, but I saw that if I lock the part in which we interpret the expression, the error disappears. Probably two instances of hint work on the same file (which uses a timestamp as its filename, I think) if the computer is very fast. If this is a hint bug, we should consider opening a bug report on hint.

I think there is a similar error in the configuration step. Only one process should compile Prelude.hs; the others should wait for the result.

The locking part is the tricky one. POSIX provides lockable files, but that is not a portable solution. We need a library that abstracts locks on files.
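As a minimal sketch of what such an abstraction could look like: recent versions of base (4.11 or later, which is an assumption about the toolchain) ship GHC.IO.Handle.Lock, which wraps the platform's file locking behind one interface. The helper name withFileLock and the lock file path are hypothetical, and hLock may throw on platforms without locking support.

```haskell
import Control.Exception (bracket_)
import GHC.IO.Handle.Lock (LockMode (ExclusiveLock), hLock, hUnlock)
import System.IO (IOMode (AppendMode), withFile)

-- | Run an action while holding an exclusive lock on the given lock file.
-- The file is created if it does not exist. hLock blocks until the lock
-- is available, so other processes wait here instead of racing.
-- (withFileLock is a hypothetical helper, not part of hawk.)
withFileLock :: FilePath -> IO a -> IO a
withFileLock lockFile action =
  withFile lockFile AppendMode $ \h ->
    bracket_ (hLock h ExclusiveLock) (hUnlock h) action
```

Something like `withFileLock "/tmp/hawk.lock" compileConfiguration` would then serialize the compilation across processes. Because the operating system owns the lock, it is dropped when the process exits, even if it crashes.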

This is related to #43: with a centralized daemon we could implement a queue system to avoid race conditions. Still, I think we should work with locks and avoid the daemon for now.

gelisam commented 11 years ago

Hmm, that makes me realize a potential problem with the daemon strategy. What if we have:

> hawk -e [1..] | hawk -m 'id'

Then the daemon cannot serve one request to completion before serving another; it must serve both requests in parallel. That kind of defeats the idea of having a daemon which is always ready to execute a piece of code in the pre-loaded environment.

melrief commented 11 years ago

Currently there are three phases of a hawk run:

Of these three steps, only the first one should suffer from race conditions. The other two should be, in my opinion, perfectly parallelizable, and this should be independent of whether we have a daemon or not. I can't understand why hint can't interpret two expressions in parallel. We should investigate this issue further; to me it seems to be a hint problem.

For now, a solution could be to lock the first step so that only one process at a time compiles the configuration and the runInterpreter call (the second step). The third step works in parallel.
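The split between locked and unlocked phases could be sketched as follows. This is only an illustration under assumptions: it uses a lock directory (createDirectory fails if the directory already exists, which is the property relied on here) instead of whatever the lock branch ends up using, and withDirLock, runPhases and the lock path are hypothetical names.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (bracket_, catch, throwIO)
import System.Directory (createDirectory, removeDirectory)
import System.IO.Error (isAlreadyExistsError)

-- | Hold a lock directory for the duration of an action.
-- If the directory already exists, another process holds the lock,
-- so wait 10 ms and try again.
withDirLock :: FilePath -> IO a -> IO a
withDirLock lockDir = bracket_ acquire (removeDirectory lockDir)
  where
    acquire = createDirectory lockDir `catch` retry
    retry e
      | isAlreadyExistsError e = threadDelay 10000 >> acquire
      | otherwise              = throwIO e

-- | Hypothetical shape of one run: the compile phases hold the lock,
-- while the evaluation phase runs outside it and so can overlap
-- with other instances.
runPhases :: FilePath -> IO env -> (env -> IO a) -> IO a
runPhases lockDir compile evaluate = do
  env <- withDirLock lockDir compile
  evaluate env
```

One known weakness of this approach: if a process is killed while holding the lock, the directory stays behind and has to be removed by hand.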

melrief commented 11 years ago

I added this to the first milestone; we can't afford to release a program that hits race conditions when many instances are running together.

melrief commented 11 years ago

I was able to isolate the problem: it is hint, as I suspected. I opened a bug report. For now I will try to lock the compilation part so that no race condition can happen, but in the future I hope we will be able to run the interpreter in parallel.

melrief commented 11 years ago

Added a new branch, lock, which locks both the configuration compilation and the expression interpretation. I haven't merged it into the merge branch yet because I don't know how to test it well. @gelisam I have neither Windows nor cygwin; could you please try the executable from that branch under cygwin when you have time and report whether many pipelined instances of hawk can still cause problems? For example:

> hawk -e '[1 .. 10000]' | hawk -m 'head' | hawk -d 'P.filter (P.not . B.null)' | hawk -m '"The first letter is: " `append`'

with prelude.hs:

import qualified Data.ByteString.Lazy as B

gelisam commented 11 years ago

I could reproduce the problem on cygwin earlier, but not since I re-compiled. Which is strange, given that it was not your lock branch that I re-compiled!

I will try harder to reproduce the issue.

gelisam commented 11 years ago

Ah! Of course, the issue only occurs if prelude.hs has just been changed. Which is why I could only reproduce it on my very first run. The re-compilation had nothing to do with it!
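To make the condition concrete, here is a rough sketch of a staleness check based on modification times. It is an assumption that hawk decides this way; needsRecompile and the file names are hypothetical.

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- | True when the compiled artifact is missing or older than its source.
-- Only in that case does compilation happen, so only then can several
-- instances race over it.
needsRecompile :: FilePath -> FilePath -> IO Bool
needsRecompile source artifact = do
  built <- doesFileExist artifact
  if not built
    then return True
    else do
      srcTime <- getModificationTime source
      objTime <- getModificationTime artifact
      return (srcTime > objTime)
```

If several instances start right after prelude.hs is edited, all of them see a stale artifact and all try to compile, which would explain why the error only shows up on the first run after a change.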

gelisam commented 11 years ago

Damn, I'm having trouble installing the new unix dependency because of this problem.

gelisam commented 11 years ago

Okay, I tracked down the error, but I don't know how to fix it yet.

cabal fetches unix-2.6.0.1.tar.gz, unpacks it and runs its ./configure script. This configure script runs cygwin's gcc, finds -lrt and -ldl, and tells cabal that everything is fine, it just needs to link with -lrt and -ldl. Cabal then hands over the information to ghc's mingw-based gcc, which cannot find either.

I'm not sure what those rt and dl dependencies are, but apparently I need to install them as mingw packages, even though I didn't explicitly install mingw myself.

gelisam commented 11 years ago

Maybe this solution might be more cross-platform? I'll need to test on my work machine, I don't use Windows at home either.

gelisam commented 11 years ago

Nah, that code has an obvious race condition.

gelisam commented 11 years ago

A bit lower on this page, however, are examples in other languages in which a socket number is used as a flag. Would this work?

gelisam commented 11 years ago

I have pushed a different solution to the lock branch, please check it out. So far I have only checked it on cygwin.

Since I use a socket as a lock indicator, I am concerned about the impact on performance: is the socket number released immediately after it is closed, or is there a delay?
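For reference, a minimal sketch of the socket-as-lock idea, not the code on the lock branch. It assumes that "socket number" means a fixed TCP port on localhost and that the network package is available; the port value and the helper names are hypothetical.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, bracket, try)
import Network.Socket

-- Hypothetical port; every instance has to agree on the same one.
lockPort :: PortNumber
lockPort = 48151

-- | Try to bind the lock port on localhost. Nothing means the bind
-- failed, most likely because another process already holds the port.
tryAcquire :: IO (Maybe Socket)
tryAcquire = do
  sock <- socket AF_INET Stream defaultProtocol
  let addr = SockAddrInet lockPort (tupleToHostAddress (127, 0, 0, 1))
  r <- try (bind sock addr) :: IO (Either IOException ())
  case r of
    Right () -> return (Just sock)
    Left _   -> close sock >> return Nothing

-- | Poll every 10 ms until the port can be bound.
acquire :: IO Socket
acquire = tryAcquire >>= maybe (threadDelay 10000 >> acquire) return

-- | Hold the port for the duration of the action; closing releases it.
withSocketLock :: IO a -> IO a
withSocketLock action = bracket acquire close (const action)
```

On the delay question: as far as I know, a socket that was only bound and never connected does not enter TIME_WAIT, so the port should be usable again as soon as it is closed. That is worth measuring on each platform rather than assuming.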