ruby-concurrency / concurrent-ruby

Modern concurrency tools including agents, futures, promises, thread pools, supervisors, and more. Inspired by Erlang, Clojure, Scala, Go, Java, JavaScript, and classic concurrency patterns.
https://ruby-concurrency.github.io/concurrent-ruby/

Any plans to support Ruby 3 ractor as a back-end for most common constructs in the gem? #899

Closed marianposaceanu closed 1 year ago

marianposaceanu commented 3 years ago

I'm really curious whether this is actually feasible, and what the plans are, if any, in this direction.

Thank you for all the great work on this gem (and yes, I'll try to answer the question myself by forking and playing a bit with the repo).

jdantonio commented 3 years ago

In principle I believe that many of these abstractions still have value in the Ruby 3 world, for the same reasons we have multiple data structures like map, array, stack, and queue: abstractions solve problems. Ractor should be a much better foundation than threads, as I believe you are suggesting with your question. As for concrete plans, I haven't worked in Ruby or on this gem in several years, so this isn't something I plan to work on. I'm 100% in favor of others giving it a try, and I'd be happy to support those efforts in any way that I can. @pitr-ch is the maintainer now, so any merges to this particular repo will need his input. I haven't spoken to him in a while, so I have no idea what his plans are.

pitr-ch commented 3 years ago

Hi, I'll be grateful for any information you share back from your experiments. I am planning to look at it in more detail in the summer.

stouset commented 3 years ago

This would be really useful, particularly with thread pools.

Unfortunately, the current design of thread pools allows you to pass a new proc of work to be done on each call to post, whereas a Ractor-style approach requires the proc to be set during initialization, with post only passing arguments that will be sent to the Ractor.
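
To illustrate the mismatch, here is a rough sketch (the variable names pool and worker and the doubling task are invented for the example, assuming a Ruby with Ractor support) comparing the two models:

require 'concurrent'

# concurrent-ruby thread pool: a brand-new block of work can be posted each time.
pool = Concurrent::FixedThreadPool.new(2)
pool.post { puts 1 + 1 }
pool.post { puts 2 * 3 }   # different work on every call
pool.shutdown
pool.wait_for_termination

# Ractor: the block is fixed when the Ractor is created; afterwards
# only arguments (messages) can be sent to it.
worker = Ractor.new do
  loop { Ractor.yield(Ractor.receive * 2) }   # the task is baked in here
end
worker.send(21)
worker.take  # => 42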

stouset commented 3 years ago

Upon further investigation, it seems that wrapping Ractor in any meaningful way is impossible with the current version of Ruby.

[1] pry(main)> def ract(&task)
[1] pry(main)*   Ractor.new(task) { |t| t.call }
[1] pry(main)* end  
=> :ract
[2] pry(main)> ract { }
TypeError: allocator undefined for Proc
from <internal:ractor>:267:in `new'

Essentially, Ractor.new { expr } is syntax that causes the block to be isolated. If you try to pass a proc in from a wrapper method, it won't be isolated. So there's no way to create a Ractor that does thread-pool housekeeping while invoking a user-provided callback internally.

pitr-ch commented 3 years ago

Thanks a lot for the investigation. This is a pity; additional syntax for creating isolated blocks on demand was originally planned, exactly for use cases like this, but it appears it did not get in. If you raise this with the Ruby core team, please send me a link to the issue.

jaesharp commented 3 years ago

Essentially, Ractor.new { expr } is syntax that causes the block to be isolated. If you try to pass a proc in from a wrapper method, it won't be isolated. So there's no way to create a Ractor that does thread-pool housekeeping while invoking a user-provided callback internally.

Calling Ractor.make_shareable on task (from your example) permits isolating the provided block (at least on ruby-head and Ruby 3.0.1), and thus wrapping Ractor creation/management. Without explicitly asking Ractor.make_shareable to copy the object before freezing (via the copy: keyword), this has side effects (it recursively freezes the argument; see the Ractor docs), so those do need to be looked into, but it is possible.

Example:

[1] pry(main)> def ract(&task)
[1] pry(main)*   Ractor.make_shareable(task)
[1] pry(main)*   Ractor.new(task) { |t| t.call }
[1] pry(main)* end  
=> :ract
[2] pry(main)> ract { }
<internal:ractor>:267: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
=> #<Ractor:#2 (pry):3 blocking>
[3] pry(main)> r = ract { 3 }
=> #<Ractor:#3 (pry):3 terminated>
[4] pry(main)> r.take
=> 3
[5] pry(main)> r.take
Ractor::ClosedError: The outgoing-port is already closed
from <internal:ractor>:694:in `take'
[6] pry(main)> RUBY_REVISION
=> "32b18fe9d04e9c95ac0b8d5df258226867efc063"

This also works without modification on the Ruby 3.0.1 release, so it does not rely on any ruby-head-specific feature:

[1] pry(main)> def ract(&task)
[1] pry(main)*   Ractor.make_shareable(task)
[1] pry(main)*   Ractor.new(task) { |t| t.call }
[1] pry(main)* end  
=> :ract
[2] pry(main)> ract { }
<internal:ractor>:267: warning: Ractor is experimental, and the behavior may change in future versions of Ruby! Also there are many implementation issues.
=> #<Ractor:#2 (pry):3 terminated>
[3] pry(main)> r = ract { 3 }
=> #<Ractor:#3 (pry):3 terminated>
[4] pry(main)> r.take
=> 3
[5] pry(main)> r.take
Ractor::ClosedError: The outgoing-port is already closed
from <internal:ractor>:694:in `take'
[6] pry(main)> RUBY_REVISION
=> "0fb782ee38ea37fd5fe8b1f775f8ad866a82a3f0"

You might also be able to use Ractor#send(..., move: true) to send the provided block at an arbitrary time after creating a Ractor with a plain Ractor.new { ... } (no argument), instead of making it shareable immediately and passing it as a parameter to Ractor.new. Moving has somewhat different semantics (the local reference is replaced with a reference to a Ractor::MovedObject and cannot be accessed unless it is sent back from that Ractor to yours via a channel), so it might not be appropriate for the use case, but it's there. That said, stouset did mention wanting to pass the block to a Ractor created at an earlier time, so perhaps this actually is what you're looking for?
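
As a purely illustrative sketch of the move semantics described above (using a plain mutable String rather than a Proc, since moving a Proc itself is not verified here; the names r and s are made up):

r = Ractor.new do
  msg = Ractor.receive         # takes ownership of the moved object
  Ractor.yield(msg.upcase)     # hands a result back on the outgoing port
end

s = String.new("hello")        # deliberately mutable / unshareable
r.send(s, move: true)          # ownership moves into r
r.take                         # => "HELLO"
# s.length                     # would now raise Ractor::MovedError here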

pitr-ch commented 3 years ago

@justinlynn Thanks very much for the additional information! That looks promising; it was part of the original design, and the earlier discussion had me worried it was dropped. I'll be looking into this more over the summer, and a prototype PR would always be welcome!

stouset commented 3 years ago

Awesome investigation! I don't believe I tried to make_shareable the task when I looked into this, so perhaps that was the missing piece.

jaesharp commented 3 years ago

Thanks very much for the kind words, @pitr-ch and @stouset. I'll see what I can come up with, but I've been quite busy with work recently, so I can't make any promises. Cheers :)

Are there any implementations/constructs in particular we should focus on porting? Should we create a new backend in the same way that the Ruby and Java backends are separated?

OmriSama commented 3 years ago

Should this be marked as looking-for-contributor?

eregon commented 2 years ago

I think few of concurrent-ruby's abstractions are actually compatible with Ractors (the programming model never allows one mutable object to be used by multiple Ractors). Thread pools and actor-like abstractions (if they have copy semantics) could probably run on top of Ractor with some limitations. Using move semantics would be very confusing for blocks/Procs, because anything they access (e.g. captured local variables) would be moved too, so the caller would likely run into issues soon after.

Ractor-backed abstractions would always need to be opt-in (we cannot just replace the backend of existing abstractions), because it seems impossible to build fully compatible behavior on Ractor given the restrictions Ractors enforce (e.g. deep copy of the Proc and its captured locals, no way to copy an IO instance in, no access to STDOUT/STDERR, although $stdout/$stderr are fine, etc.). Also, many gems are not Ractor-compatible yet, so the work that can run inside Ractors is currently fairly limited, which is another kind of incompatibility.

Back to my first point: I think none of Concurrent::{Map,Array,Set,AtomicReference} can reasonably work on Ractor. Implementing them on Ractor would mean an extra Ractor per instance (heavy on footprint, since each Ractor is an extra OS thread, and not so fast for communication due to copying) and would serialize all accesses to that data structure, since everything would happen in a single Ractor, so they would no longer work concurrently/in parallel (on any Ruby implementation). The copying would also mean that storing mutable objects in the data structure wouldn't work as before, as external updates wouldn't be reflected.
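
To make that concrete, here is a rough sketch (not concurrent-ruby code; the names map and store are invented) of what a Ractor-owned map would have to look like: every operation becomes a message round-trip through the single owning Ractor, so all accesses are serialized and keys/values are copied in and out unless they are shareable.

map = Ractor.new do
  store = {}
  loop do
    op, key, value = Ractor.receive            # the message array is copied on send
    case op
    when :set then store[key] = value
    when :get then Ractor.yield(store[key])    # copied again on the way out
    end
  end
end

map.send([:set, :answer, 42])
map.send([:get, :answer])
map.take  # => 42, at the cost of a full message round-trip per access

Two clients interleaving :get requests could even take each other's replies, which is exactly the kind of behavioral gap with the existing Concurrent::Map described above.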

eregon commented 1 year ago

Another challenge is that a block/Proc captures self, and so:

r = Ractor.new { p Ractor.receive.call }
r << Ractor.make_shareable(-> { 2*3 })

gives:

<internal:ractor>:816:in `make_shareable': Proc's self is not shareable: #<Proc:0x00007fccf0483900 (irb):20 (lambda)> (Ractor::IsolationError)

whereas

r << Ractor.make_shareable(nil.instance_exec { -> { 2*3 } })

works, but that nil.instance_exec needs to literally surround the Proc; it cannot be applied after a regular Proc with a non-nil receiver has been created. That connects to https://bugs.ruby-lang.org/issues/18243. A Proc#bind could address that, but it would then introduce yet another surprise: the receiver would be magically changed to nil in the receiving Ractor, and it doesn't seem OK for an abstraction to change the receiver of a Proc.

I feel that building abstractions on top of Ractor is very difficult because there are so many restrictions. I'll close this as not planned: it seems impossible to "use Ractor as a back-end for most common constructs in the gem", since the existing concurrent-ruby classes simply cannot use Ractor; that would be too incompatible.

It may be useful to develop new Ractor-specific abstractions (e.g. a Ractor pool). If those are generic enough, it may make sense to add them to this gem, or they may belong better in another gem. In any case, that's a different issue.
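
For reference, such a Ractor pool could follow the worker/pipe pattern from the Ractor documentation. This is only a sketch (the names dispatcher and workers and the squaring task are invented), not a proposed concurrent-ruby API:

# A dispatcher Ractor fans jobs out to whichever worker takes them next.
dispatcher = Ractor.new do
  loop { Ractor.yield(Ractor.receive) }
end

workers = 4.times.map do
  Ractor.new(dispatcher) do |source|
    loop do
      job = source.take         # pull the next job from the shared dispatcher
      Ractor.yield(job * job)   # the task is fixed at construction time
    end
  end
end

10.times { |i| dispatcher.send(i) }
results = 10.times.map { Ractor.select(*workers).last }
results.sort  # => [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]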

eregon commented 1 year ago

Also, Ractor is still experimental; if you want true parallelism in Ruby that works with existing gems and code, then Threads on TruffleRuby or JRuby are the way to go.