grosser / parallel

Ruby: parallel processing made simple and fast
MIT License

Parallel.map sometimes hangs for two minutes when collecting processes #350

Closed Minauras closed 1 month ago

Minauras commented 2 months ago

I have a script that periodically runs some heavy computation in 5 processes, and I'm measuring how long each run takes to complete.

The computation involves querying from a Postgres DB and a Little Table DB and some processing of that data, if that's important.

Sometimes, the run takes ~15s to complete, but sometimes it takes ~2min15s to complete.

When looking at the logs, I find that the processes don't do more computation during those runs, but after they're done computing and have exited their function, Parallel.map hangs for 2min, seemingly doing nothing, before letting the rest of the script run.

What could be happening here? Any idea as to what I should investigate? Unfortunately I cannot share a reproducible example. Thanks for any help!

grosser commented 2 months ago

I'd start by checking if it's actually parallel hanging or postgres, so

Parallel.map(items) do |item|
  puts 'a'
  x = stuff(item) # placeholder for the real work
  puts 'b'
  x
end

or something like that

then you can tell if it's the db work hanging or parallel itself

Minauras commented 2 months ago

Thanks for the answer, I did something like

Parallel.map(x, in_processes: 5) do |y|
  puts "a"
  ret = my_function(y)
  puts "b"
  ret
end
puts "c"

And the two-minute slowdown happens between b and c, so I assume it's parallel hanging? Is there a way to see what parallel might be doing? Parallel doesn't have a 2min timeout or something, right?

grosser commented 2 months ago

the only thing it does there is send the data through a pipe ... which might hang for some weird unix reason 😞
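To make the "send the data through a pipe" step concrete, here is a minimal stdlib-only sketch of the general technique: a forked child marshals its result into a pipe and the parent reads it back. The helper name run_in_child is hypothetical, and this is an illustration under those assumptions, not the parallel gem's actual implementation.

```ruby
# Hypothetical helper (not part of the parallel gem): runs the block in a
# forked child and returns its result, sent back over a pipe via Marshal.
def run_in_child
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close                  # the child only writes
    result = yield
    Marshal.dump(result, writer)  # serialize the result straight into the pipe
    writer.close
  end
  writer.close                    # the parent only reads; closing lets it see EOF
  result = Marshal.load(reader)   # blocks until the child has written its result
  reader.close
  Process.wait(pid)               # reap the child so it does not linger as a zombie
  result
end

squares = [1, 2, 3].map { |n| run_in_child { n * n } }
puts squares.inspect
```

If a hang sits in this kind of hand-off, it would show up either in the blocking read or in waiting for the child to exit, which is why those two spots are worth instrumenting separately.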

... I hope that will show whether it's the "sending through the pipe" part. Next you can do

puts "b"
puts Marshal.dump(ret).size
puts "c"

to see if the issue is serialization and if the data maybe is very big
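The same check can also time the serialization, which helps separate "the payload is huge" from "dumping it is slow". This is a small sketch using only the standard library; marshal_stats is a hypothetical helper name and the sample array stands in for the real return value.

```ruby
require "benchmark"

# Hypothetical helper: reports the marshalled size of a value and how long
# dumping it takes.
def marshal_stats(value)
  payload = nil
  seconds = Benchmark.realtime { payload = Marshal.dump(value) }
  { bytes: payload.bytesize, seconds: seconds }
end

sample = Array.new(100_000) { |i| i } # stand-in for the real `ret`
stats = marshal_stats(sample)
puts "size=#{stats[:bytes]} bytes, dump took #{stats[:seconds].round(4)}s"
```

A payload of a few kilobytes that dumps in well under a second would make serialization an unlikely cause of a two-minute stall.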

Minauras commented 2 months ago

Thanks a lot!

So it seems it wouldn't be an issue with the pipe or with the serialization?

grosser commented 2 months ago

yeah the .each still sends things over the pipe, so that could still be the issue

next thing I'd do is run bundle open parallel and start dropping some puts around the Worker.work (line 73) and the wait on line 91, to see if they show reading from the pipe being stuck. Maybe also line 601, inside the ensure, to see if closing the pipe hangs.

grosser commented 2 months ago

ideally find a way to make the hanging part shareable, but that might be hard 😞

Minauras commented 1 month ago

Hi, updating this, thanks for the help!

I wasn't able to investigate in the gem itself, as it's not possible in my environment. However, I found that the issue does not come from the parallel gem.

I replaced

Parallel.each(x, in_processes: 5) do |y|
  puts "a"
  ret = my_function(y)
  puts "b"
  ret
end
puts "c"

by

x.each do |y|
  Process.fork do
    puts "a"
    ret = my_function(y)
    puts "b"
  end
end
Process.waitall
puts "c"

And the issue was the same, 2 minutes of hanging between "b" and "c", so the issue is reproducible without parallel.
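One way to narrow down where the time goes in a fork-based setup like this is to timestamp both the child's last line and the moment the parent reaps it. The sketch below assumes a trivial stand-in for my_function and a hypothetical monotonic_now helper; it also assumes the monotonic clock is comparable across parent and child, which holds on Linux.

```ruby
# Hypothetical stand-in for the real workload.
def my_function(y)
  y * 2
end

# Hypothetical helper: monotonic clock reading in seconds.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

started = monotonic_now
pids = [1, 2, 3].map do |y|
  Process.fork do
    my_function(y)
    puts format("pid=%d b at +%.3fs", Process.pid, monotonic_now - started)
    $stdout.flush
  end
end

# Reap each child individually and log when the parent sees it exit.
reaped = pids.map do |pid|
  _, status = Process.wait2(pid)
  puts format("pid=%d reaped at +%.3fs (exit %d)", pid, monotonic_now - started, status.exitstatus)
  status
end
```

A large gap between a child's "b" timestamp and its "reaped" timestamp would point at something running in the child after the block finishes, rather than at the parent.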

Then, since seemingly nothing was happening while the program was hanging, I tried exiting the subprocesses early, after "b":

x.each do |y|
  Process.fork do
    puts "a"
    ret = my_function(y)
    puts "b"
    abort
  end
end
Process.waitall
puts "c"

This changes nothing. However, when using exit! instead of abort:

x.each do |y|
  Process.fork do
    puts "a"
    ret = my_function(y)
    puts "b"
    exit!
  end
end
Process.waitall
puts "c"

then the hanging disappears.

Apparently the difference between exit! and abort is that exit! skips at_exit callbacks, so it might have something to do with at_exit. However, I tried what was described in this thread and found that no at_exit callback was registered by any gem.
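The difference can be checked in isolation with a short sketch: a forked child registers an at_exit callback that writes to a pipe, then terminates with either abort or exit!. The helper name child_output is hypothetical. Ruby's documentation for exit! states that no exit handlers are run, so anything else hooked into normal interpreter shutdown would presumably be skipped as well, which may be why no registered at_exit callback was found.

```ruby
# Hypothetical helper: forks a child that registers an at_exit callback,
# terminates it the requested way, and returns what the callback wrote.
def child_output(how)
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    at_exit { writer.write("at_exit ran") }
    case how
    when :abort then abort    # exits with failure status, exit handlers run
    when :exit! then exit!(1) # exits immediately, exit handlers are skipped
    end
  end
  writer.close
  output = reader.read        # returns once the child has exited and closed the pipe
  reader.close
  Process.wait(pid)
  output
end

with_abort = child_output(:abort)
with_exit_bang = child_output(:exit!)
puts "abort: #{with_abort.inspect}"
puts "exit!: #{with_exit_bang.inspect}"
```

The abort child writes "at_exit ran" before terminating, while the exit! child writes nothing.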

This problem seems to go beyond what I'm capable of debugging, so I'm happy to settle with exit! as a workaround, though I still have no idea what the issue is.

Thanks a lot for the help!