Closed mohamedhafez closed 3 days ago
Libxml2 doesn't support concurrent modifications to the same document. See https://gitlab.gnome.org/GNOME/libxml2/-/wikis/Thread-safety
So the way Ractors work is that only one of them can access a given document object at a time, so libxml2's limitation of not supporting concurrent modifications on the same document actually shouldn't be an issue: https://ruby-doc.org/core-3.0.0/Ractor.html
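To make the "only one Ractor at a time" point concrete, here is a minimal sketch using a plain String as a stand-in for a document object (not Nokogiri itself, which is an assumption-free way to show the Ractor rule without depending on the gem). Once the object is moved to another Ractor, the sender can no longer touch it:

```ruby
# A mutable String stands in for a document object (hypothetical stand-in,
# not a Nokogiri::XML::Document).
doc = String.new("<root/>")

worker = Ractor.new do
  received = Ractor.receive   # this Ractor now owns the object
  received.upcase
end

# move: true transfers ownership instead of copying.
worker.send(doc, move: true)

# Ractor#value replaced Ractor#take in newer Rubies; support both.
result = worker.respond_to?(:value) ? worker.value : worker.take

# Any access from the sending Ractor now raises Ractor::MovedError.
moved_error =
  begin
    doc.length
    nil
  rescue Ractor::MovedError => e
    e
  end
```

Without `move: true` the object would be deep-copied instead, so either way two Ractors never end up operating on the same unshareable object.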
What I'm hoping to fix is that different document objects currently can't be accessed concurrently. According to the link you posted, libxml2 explicitly allows this as long as you:
- configure the library accordingly using the --with-threads options
- call xmlInitParser() in the "main" thread before using any of the libxml2 API (except possibly selecting a different memory allocator)
So I'm hoping this actually should be trivial!
(I'm addressing the use case of Ractors only here, since that's the only way it would happen in canonical, regular C-Ruby. Thread.new and Fibers are still subject to the GVL in C-Ruby, and Ractors are the only way to do true concurrency. TruffleRuby and JRuby users already know to protect access to the same object with a Mutex if they are doing multithreaded programming, and if they don't they are going to be screwed in a million other places;)
@eregon perhaps you or someone on the TruffleRuby team could lend a little more gravitas to my argument above? 😅
@mohamedhafez Thanks for opening this issue. Earlier this year I spent some time exploring how ractors and the sqlite3 gem interact, so I have questions.
Have you tried parsing and manipulating documents in different ractors? What was your experience like? What worked and what didn't work?
Our mental model is that although libxml2 doesn't support concurrent operations within a single document, each ractor should be able to parse and manipulate a separate document, and I'd like to update our mental model if your experience has been something different.
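For reference, the shape of that mental model can be sketched as below. This is only an illustration under stated assumptions: `String#scan` stands in for a real parser, and the sample documents are made up. Swapping in actual Nokogiri calls is exactly the part whose behavior is in question here.

```ruby
# Three independent "documents" (made-up sample data).
docs = ["<a><b/><b/></a>", "<a><b/></a>", "<a/>"]

workers = docs.map do |xml|
  # Arguments passed to Ractor.new are copied into the new Ractor,
  # so no two Ractors ever share the same document object.
  Ractor.new(xml) do |own_xml|
    # Stand-in for parsing: count the <b> elements in this Ractor's own copy.
    own_xml.scan(/<b\b/).size
  end
end

# Ractor#value replaced Ractor#take in newer Rubies; support both.
counts = workers.map { |r| r.respond_to?(:value) ? r.value : r.take }
```

Each Ractor only ever sees its own copy of the input, which is the situation libxml2's thread-safety notes describe as allowed.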
When you say "support for ractors" I'm trying to understand your specific use case, and what specific error message motivated you to open this issue. Passing objects between ractors can be hard for complex object graphs, and so any additional information you can provide would help me form better mental models.
In planning ahead to the near future when TruffleRuby can run C-extensions marked with rb_ext_ractor_safe(true) in parallel, and for when Ractors are no longer just experimental, it would be great if the C-extension could be made threadsafe, or marked as such if it already is so!