Closed ta0kira closed 2 years ago
This will also need to lock stderr
while printing each command and its output so that the overall output is actually usable if there's an error.
This can probably be done without threads, using forkProcess
and getProcessStatus
.
But, it's a little unclear how to control when a process gets executed.
Maybe this:
[IO ProcessID]
.(MonadIO m, ErrorContextM m) => [IO ProcessID] -> m [()]
.
map
getProcessStatus
with an error check.n
.
n
to start with.map
non-blocking getProcessStatus
on those n
and partition into done
/notDone
.done ++ recursive ...
, passing notDone
plus length done
more from the original list.It's unclear if this would have the intended effect, however. Probably worth prototyping. (One obvious problem is that done ++
isn't going to work as-is, since recursive
will return m [()]
and not [...]
.)
Separately, it might not be worth trying to keep stderr clean, since errors should be caught before C++ compilation even starts. Also, the final error summary would still be sequential. If there is an error, the user could just rerun without parallelization.
The algorithm above seems fine, based on the prototype below:
import Control.Concurrent
import Control.Monad
import Data.Either
import System.Random
data Process = Process { pValue :: Int, pThresh :: Int }
newProcess :: Int -> IO Process
newProcess x = do
value <- randomIO :: IO Int
let thresh = value `mod` 100000
putStrLn $ "starting " ++ show x ++ " (" ++ show thresh ++ ")"
return $ Process x thresh
checkProcess :: Process -> IO (Either Int Process)
checkProcess p@(Process x t) = do
value <- randomIO :: IO Int
let actual = value `mod` 100000
if actual < t
then return $ Right p
else do
putStrLn $ "finished " ++ show x ++ " (" ++ show actual ++ ")"
return $ Left x
parallelProcess :: Int -> [IO Process] -> IO [Int]
parallelProcess n xs = do
now <- sequence $ take n xs -- This starts execution!
let later = drop n xs
recursive now later where
recursive [] _ = return []
recursive now later = do
tried <- mapM checkProcess now
let (done,wait) = partitionEithers tried
let k = length done
when (k == 0) $ threadDelay 1000 -- Sleep 1ms to control rate.
new <- sequence $ take k later -- This starts execution!
following <- recursive (wait ++ new) (drop k later)
return $ done ++ following
main :: IO ()
main = do
let processes = map newProcess [1..200]
results <- parallelProcess 4 processes
return ()
This definitely wouldn't work for an unbounded number of processes, but the compiler already needs to hold the results of all of these processes in memory at once. (The same limitations would apply if we were using sequence
on the entire list, since that requires evaluation of all expressions in the list before yielding a result.)
We'll also need to limit parallelization to one module at a time, since compile-metadata
needs to be saved before starting on the next module.
One limitation is situations where the compiler creates a .a
from a .o
, since the former depends on the latter. In those cases, the .a
creation would need to happen at the end.
One possibility for handling .o
->.a
is to start+return the ar
process as the result of checkProcess
for the .o
creation. (parallelProcess
doesn't need to know that the new Process
returned by checkProcess
isn't the same one it passed.)
The bottleneck for compilation and integration tests is the C++ compiler. Taking advantage of multiple cores could greatly improve performance.