Parallelize calls to the C++ compiler.

ta0kira commented 3 years ago

The bottleneck for compilation and integration tests is the C++ compiler. Taking advantage of multiple cores could greatly improve performance.

ta0kira commented 3 years ago

This will also need to lock stderr while printing each command and its output so that the overall output is actually usable if there's an error.

ta0kira commented 2 years ago

This can probably be done without threads, using forkProcess and getProcessStatus.

But, it's a little unclear how to control when a process gets executed.

Maybe this:

Collect all commands that need execution, e.g., create an [IO ProcessID].
Have a function (MonadIO m, ErrorContextM m) => [IO ProcessID] -> m [()].
- Simple case would just map getProcessStatus with an error check.
- Parallel case would operate on a sliding window of size n.
  1. Split at n to start with.
  2. map non-blocking getProcessStatus on those n and partition into done/notDone.
  3. Recursive call such as done ++ recursive ..., passing notDone plus length done more from the original list.
  4. Maybe also include a short sleep before the recursive call, e.g., 10ms.

It's unclear if this would have the intended effect, however. Probably worth prototyping. (One obvious problem is that done ++ isn't going to work as-is, since recursive will return m [()] and not [...].)

Separately, it might not be worth trying to keep stderr clean, since errors should be caught before C++ compilation even starts. Also, the final error summary would still be sequential. If there is an error, the user could just rerun without parallelization.

ta0kira commented 2 years ago

The algorithm above seems fine, based on the prototype below:

import Control.Concurrent
import Control.Monad
import Data.Either
import System.Random

data Process = Process { pValue :: Int, pThresh :: Int }

newProcess :: Int -> IO Process
newProcess x = do
  value <- randomIO :: IO Int
  let thresh = value `mod` 100000
  putStrLn $ "starting " ++ show x ++ " (" ++ show thresh ++ ")"
  return $ Process x thresh

checkProcess :: Process -> IO (Either Int Process)
checkProcess p@(Process x t) = do
  value <- randomIO :: IO Int
  let actual = value `mod` 100000
  if actual < t
     then return $ Right p
     else do
       putStrLn $ "finished " ++ show x ++ " (" ++ show actual ++ ")"
       return $ Left  x

parallelProcess :: Int -> [IO Process] -> IO [Int]
parallelProcess n xs = do
  now <- sequence $ take n xs  -- This starts execution!
  let later = drop n xs
  recursive now later where
    recursive [] _ = return []
    recursive now later = do
      tried <- mapM checkProcess now
      let (done,wait) = partitionEithers tried
      let k = length done
      when (k == 0) $ threadDelay 1000  -- Sleep 1ms to control rate.
      new <- sequence $ take k later  -- This starts execution!
      following <- recursive (wait ++ new) (drop k later)
      return $ done ++ following

main :: IO ()
main = do
  let processes = map newProcess [1..200]
  results <- parallelProcess 4 processes
  return ()

This definitely wouldn't work for an unbounded number of processes, but the compiler already needs to hold the results of all of these processes in memory at once. (The same limitations would apply if we were using sequence on the entire list, since that requires evaluation of all expressions in the list before yielding a result.)

We'll also need to limit parallelization to one module at a time, since compile-metadata needs to be saved before starting on the next module.

One limitation is situations where the compiler creates a .a from a .o, since the former depends on the latter. In those cases, the .a creation would need to happen at the end.

ta0kira commented 2 years ago

One possibility for handling .o->.a is to start+return the ar process as the result of checkProcess for the .o creation. (parallelProcess doesn't need to know that the new Process returned by checkProcess isn't the same one it passed.)

ta0kira / zeolite

Parallelize calls to the C++ compiler. #107