grosser / parallel

Ruby: parallel processing made simple and fast
MIT License
4.16k stars 254 forks source link

Parallel::Deadworker error on a specific set of graph's nodes #154

Open huskebasi opened 8 years ago

huskebasi commented 8 years ago

Hi Everyone! I loved parallel since I discovered it and it allowed me to speedup my simulations in my academic research. I'm now using it to perform some optimization on a graph representing purchase journeys for customers buying a product. I have to perform a scoring process (based on some criteria e.g.entropy, frequency, etc) on all the subset of the nodes of the graph (the enumeration of its powerset) and then choose the best set based on this score. I've implemented with success a call to parallel.map method and 99% of the times the code flows with an awesome speedup but on a specific data set (read from a DB) it gives me this error:

I, [2016-01-22 10:46:06 UTC+0000#4308]  INFO -- : enumerator.rb ---------- : -----===== PARALLEL enumeration started! =====-----
/home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:59:in `rescue in work': Parallel::DeadWorker (Parallel::DeadWorker)| Time: 00:00:57
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:56:in `work'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:319:in `block (4 levels) in work_in_processes'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:419:in `with_instrumentation'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:318:in `block (3 levels) in work_in_processes'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:312:in `loop'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:312:in `block (2 levels) in work_in_processes'
    from /home/ubuntu/.rvm/gems/ruby-2.2.3/gems/parallel-1.6.1/lib/parallel.rb:183:in `block (2 levels) in in_threads'

I've looked a lot online to find solutions or description of this error but I've only found other examples of it in very different contexts. What exactly does it mean? How can I prevent this from happening?

Here's the core of the code that raise the error, if it could help:

  pr_nds = Parallel.map(cs_powerset, :progress => " Executing PARALLEL enumeration") do |c|
    parallel_enumeration(c, ttrg, bpm, dst_mtx, nrm, orig_entropy, entropy_norm_factor, fct_entrpy, fct_imp, fct_frq, fct_nnd, sum_jrnys)
  end
def parallel_enumeration(c, ttrg, bpm, dst_mtx, nrm, orig_entropy, entropy_norm_factor, fct_entrpy, fct_imp, fct_frq, fct_nnd, sum_jrnys)
# c is an element of the powerset of the nodes of the graph
    min_e = Float::INFINITY
    max_e = 0
    res_ary_3d = Array.new
    bs = get_bs(c, ttrg, bpm, dst_mtx)
    if bs and bs.length > 0
      bs.powerset.each do |b|
      # b is an element of the powerset of the milestones (of the purchase journey) of the set c
        trg, rtp, prc = enum_milestones(c, b, dst_mtx)
        ary = Array.new
        trg.each do |t|
          rtp.each do |r|
            prc.each do |p|
              nds = Array.new
            # I compute the score of the element calling other methods
              nds_ary = get_node_arry(nds, bpm)
              entropy = bpm_entropy2(nds_ary, attrs, nrm)
              ent_diff = 1.0 - ((orig_entropy - entropy).abs * entropy_norm_factor)
              min_e = [min_e, ent_diff].min
              max_e = [max_e, ent_diff].max
              ent_diff *= fct_entrpy
              frq = get_node_frequencys(nds_ary, sum_jrnys)
              frq *= fct_frq
              imp = get_importance(nds_ary, @importance)
              imp *= fct_imp
              nns = get_number_of_nodes_score(nds_ary, bpm)
              nns *= fct_nnd
            # The total score of the set is stored in tmp_res
              tmp_res = frq + imp + nns + ent_diff
            # This array contains the score and the nodes associated to it
              ary.push(tmp_res)
              ary.push(nds)
            end # prc
          end # rtp
        end # trg
      # The final result is a 3D array where each element is relative of a subset c
        res_ary_3d.push(ary)
      end # bs
      return res_ary_3d
    end # if
  end

Thanks infintely.

grosser commented 8 years ago

Parallel uses workers to work on each piece of input, for example you have 10 chunks of work and want to run them in 2 processes, then each of these processes gets 5 pieces of work if they are the same size. For unknown reasons one of these workers seems to die (process is killed / kills itself ...) and then when the main process is trying to read what the worker has calculated it raises this exception.

To debug this you could:

grosser commented 8 years ago

sorry for the late reply, but this issue is kind of intimidating and I needed a few calm minutes to wrap my had around what's going on :)

huskebasi commented 8 years ago

The only thing I can say is that your work is amazing! Thank you so much for the effort. I'll use the serial version of the code for the moment.

Thanks again. Regards Riccardo

Il giorno dom 24 gen 2016 alle ore 02:45 Michael Grosser < notifications@github.com> ha scritto:

sorry for the late reply, but this issue is kind of intimidating and I needed a few calm minutes to wrap my had around what's going on :)

— Reply to this email directly or view it on GitHub https://github.com/grosser/parallel/issues/154#issuecomment-174241466.

soobrosa commented 8 years ago

preview Just ordered the T-shirt, ping if you need one.