jwiegley / async-pool

Other
21 stars 13 forks source link

mapTasks threw fgl exception on completion #3

Closed spl closed 9 years ago

spl commented 9 years ago

After switching from mapTasks_ to mapTasks (see #2), I ran a long database migration (3.44 hours). At the end, I found that the code threw an exception: "Match Exception, Node: 74575". This appears to be due to fgl in Data.Graph.Inductive.Graph.context, and fgl is in my cabal sandbox. Since async-pool depends on fgl, logically, the exception is probably coming via async-pool.

The code in mapTasks did complete, but the code after mapTasks did not complete. So, my guess is that it is part of the clean-up of mapTasks.

Looking at cleanupTask, I see two uses of fgl functions (suc and pre), and each of them has a reference to context, so those may be suspect.

The structure of my code is this:

run = do
  ...
  withTaskGroup 50 $ \tg -> void $ mapTasks tg $
    map f xs ++
    map g ys ++
    map (h tg . i) xs
  ...

h tg x = do
  ...
  mapTasks_ tg $ map j $ filter k zs
  ...

And, now that I actually look at my code again, I see mapTasks_ pop up unexpectedly. Looks like I forgot to change this one to mapTasks. I'm going to run the map (h tg . i) xs part again with mapTasks in h to see if I still get the exception.

spl commented 9 years ago

Okay, so I replaced mapTasks_ with mapTasks, and then it runs partially. But I realized that I shouldn't expect nested uses of the same TaskGroup to work because it appears that the outer mapTasks runs some number of tasks and waits for those tasks to finish, but those outer tasks can't run because they have inner mapTasks waiting for free slots in the TaskGroup, which happens to be full of the outer tasks. Consequence: deadlock.

spl commented 9 years ago

Everything works for me now. So, to summarize my issues for posterity:

  1. mapTasks_ is still broken. See #2.
  2. Using a TaskGroup in nested mapTasks will likely lead to deadlock. I'm not sure if this should be obvious or if it's worth mentioning somewhere.

Since there is not really a new issue here, I'll close this.