trilogy-libraries / trilogy

Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.
MIT License

Non-C extension support #193

Open headius opened 4 months ago

headius commented 4 months ago

I would like to be able to support this library in JRuby, but the C extension is a limiting factor. JRuby does not support the CRuby extension API.

We do, however, support some alternative mechanisms for binding native libraries:

Some combination of these APIs should get us there!

mohamedhafez commented 3 months ago

I'd be willing to put a bounty of a couple thousand dollars on this, if the trilogy team agrees to allow work on this...

mohamedhafez commented 2 months ago

@jhawthorn any chance we could get a comment on whether the trilogy team would accept work on this? In conversation with @headius he's mentioned it could be done in a way that doesn't at all change how trilogy works on regular MRI Ruby with the C-extension, and would just make it available to JRuby users like myself, which would be a great help

matthewd commented 2 months ago

I guess I'm not super clear what advantage this library offers for JRuby users... I would've thought one would prefer to use a native JDBC approach or something along those lines?

I think that any FFI-like interface would eliminate the performance advantages that come from being very aware of where & when any allocations & memory copies have to occur, and so would certainly not be a viable option for the CRuby implementation. At that point, it feels like we'd essentially be talking about a completely separate (JRuby-specific) gem that uses the Trilogy C library?

mohamedhafez commented 2 months ago

Regarding the benefit to JRuby users: the native JDBC integration with Rails (ar-jdbc) isn't trivial to maintain, and needs lots of work every time a new Rails minor version is released, and unfortunately because of this frequently lags behind the latest Rails releases (Rails 7.0 is the latest release supported right now, for example). In addition, sometimes there are issues, or just differences in configuration and database setup, that can crop up unexpectedly, and it can be a headache to figure out what's going wrong and how to work around it (e.g. https://github.com/jruby/activerecord-jdbc-adapter/issues/897, https://github.com/jruby/activerecord-jdbc-adapter/issues/1091). It would be nice to have something that was closer to an exact replacement of what C-Ruby users are using, so we'd have the option of never having to worry about that kind of thing happening again, even if it came at a bit of performance cost. (I'm possibly dealing with an issue like this right now, and would love to have this option)

In short, full compatibility that we'd get automatically the minute a new Rails version is released, with much, much less room for error, is why I personally would want this. Man, I've never been great at summarizing, maybe I should have just left it at that in the first place 😅

As far as whether we are essentially talking about a JRuby-specific gem that uses the Trilogy C library, and whether FFI would result in a performance penalty in the CRuby gem even with YJIT possibly being able to optimize better... I don't have the experience to speak on that unfortunately, but I bet @headius could chime in on that!

headius commented 1 month ago

Happy to jump in here with some additional info.

First off, the main reason why it would help us if Trilogy supported JRuby is as @mohamedhafez mentioned: we'd use exactly the same code from the driver level up, allowing us to track Rails exactly as new releases come out. Currently, we have to do a bit of work to update the JDBC-based adapter for each new Rails release.

Second, the FFI version may introduce more overhead, but it is the most efficient way for this driver to support Ruby implementations other than CRuby. JRuby does not implement the C extension API, and TruffleRuby implements it but with a high-overhead interface that is probably slower than FFI. Both implementations could use an FFI version.

Third, I don't believe it requires a completely separate gem. The interface to the C code appears to be rather compact and easy to abstract for both the C extension and the FFI version. Above that, the code would be the same for all implementations.

My assumptions about the design of this project may be flawed, but I did not see anything in the extension interface that would be difficult to implement via an FFI version.
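
To make the third point a bit more concrete, here is a minimal sketch of what "abstract the interface, share everything above it" could look like. Every name in it (`TrilogySketch`, `CExtBackend`, `FFIBackend`, `backend_for`) is hypothetical and not taken from the Trilogy code base, and the two backends are plain Ruby stand-ins rather than real native bindings. It only illustrates choosing a backend from `RUBY_ENGINE` while the client code stays the same.

```ruby
# Hypothetical sketch: none of these names come from Trilogy itself.
module TrilogySketch
  # Stand-in for the backend the existing C extension would provide on CRuby.
  module CExtBackend
    def self.query(sql)
      "cext: #{sql}"
    end
  end

  # Stand-in for a backend that would call the Trilogy C library through FFI.
  module FFIBackend
    def self.query(sql)
      "ffi: #{sql}"
    end
  end

  # Choose a backend based on the running Ruby implementation.
  # CRuby reports "ruby"; JRuby and TruffleRuby report their own names.
  def self.backend_for(engine = RUBY_ENGINE)
    engine == "ruby" ? CExtBackend : FFIBackend
  end

  # Implementation-independent code only ever talks to the chosen backend.
  class Client
    def initialize(backend = TrilogySketch.backend_for)
      @backend = backend
    end

    def query(sql)
      @backend.query(sql)
    end
  end
end

# Usage: the same call works on any implementation; only the backend differs.
client = TrilogySketch::Client.new
puts client.query("SELECT 1")
```

In this shape, the CRuby path would keep using the C extension untouched, and only the backend chosen for other implementations would need new binding work.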

headius commented 3 weeks ago

Did a bit more exploration here and I think there are three options going forward (and a fourth kinda crazy one).

Option #1: Build a JNI (Java Native Interface) extension

The most direct analog here would be to write a JNI extension for the JVM that wraps the Trilogy library in the same way as the CRuby extension. The Trilogy functions are well-isolated from the internals of CRuby, and we could mimic the same structure in the JNI extension.

We would then build some shim code to call that JNI extension from JRuby and use most of the remaining Ruby code as-is. There could be very little or almost no JRuby-specific Java code if we just used JRuby's Java integration.

Benefits:

Issues:

Option #2: Use OpenJDK Panama

In this scenario, we would use the Panama jextract tool to generate a Java FFM (Foreign Function and Memory) wrapper around Trilogy as a dynamic library. jextract would create Java APIs and classes to represent the necessary functions and structs, and ideally have much lower overhead than a hand-built JNI extension.

I just ran an experiment using jextract and the Trilogy header files. It appears that all of Trilogy's primary functions and structures generated ok, but the header files include a lot of internal constants, structures, and functions that are probably not needed in the public API. In addition, because the Trilogy header pulls in OpenSSL, jextract also attempted to generate bindings for the entire OpenSSL library. We would want to limit the generation to just the necessary Trilogy functions, but I'm not sure what's included in that list.

Benefits

Risks

Option #3: Ruby FFI bindings

This option would use the existing Ruby FFI support to bind all appropriate Trilogy functions. The remaining logic would live in Ruby to adapt those functions to the rest of the Trilogy Ruby code in the same way as the CRuby extension.

Benefits

Issues

Bonus crazy option #4: Compile Trilogy to WASM and run it with a JVM WASM runtime

This sounds crazy, but it's actually a fallback option for JRuby's use of the Prism parser. For Trilogy, we would compile the library to WASM, and then run that through the Chicory JVM WASM runtime. If that runtime can optimize the resulting code down to JVM bytecode, and the JVM can optimize that bytecode to native, it might have good enough performance to be usable without requiring any native library whatsoever. We would then just wrap that WASM version of Trilogy with appropriate Java and Ruby code to adapt it to the rest of the Trilogy Ruby code.

Benefits

Issues

Bottom Line

If Trilogy is as fast as claimed, then it could be a useful library to much more than just the Ruby ecosystem, and it's worth exploring making it a more general-purpose MySQL library. That would mean formalizing the public API and also producing a dynamic library rather than just the static library currently used by the CRuby extension.

If we agree that it could be useful outside of Ruby, then it's also worth exploring ways to bind it for the JVM. This would enable all entities in the JVM ecosystem to take advantage of it, and more importantly it would make it easier for Ruby implementations on JVM like JRuby to keep up with ActiveRecord MySQL support in new versions of Rails (requiring just a bit of binding work when Trilogy is updated, similar effort to maintaining the CRuby extension).

Even if it is not intended to be used outside of Ruby, providing bindings for JRuby would ease the upgrade path for JRuby's ActiveRecord support compared to the JDBC adaptation we do today.

I'm available to chat about this any time. If the work were to go forward, I believe the JRuby team could commit some resources to it, at least to build a proof-of-concept binding we can use to evaluate the overall idea.

headius commented 3 weeks ago

I should also add here that part of the risk here is that the current JRuby ActiveRecord-JDBC binding for MySQL is not all that bad to maintain. The code diff from Rails is the second-smallest (of the three core databases) and we usually can turn around updated Rails support fairly quickly. The benefit of using Trilogy directly would be that almost no work is required when a new ActiveRecord comes out, and it would "just work" as long as Trilogy didn't need additional changes (which would affect CRuby in the same way).

bhelx commented 2 weeks ago

Happy to support anyone who wants to try the Chicory Wasm implementation. We're close to stabilizing our API and have our compiler working. I can't say step by step how all this will work without spending some more time on it, but I think it will be faster than you think and the whole plan doesn't sound unreasonable to me.

headius commented 2 weeks ago

@bhelx We would love to give it a try! I think the two most interesting options are going to be Panama and Chicory. I would like to move forward with a Panama proof of concept once we confirm that OpenSSL is not exposed to consumers of the Trilogy API. The Chicory version could be attempted any time, but I wouldn't have the cycles for it for a couple of weeks at least.

And both of these options require Trilogy to have a dynamic library build target and a list of public API functions to expose.

headius commented 2 weeks ago

@matthewd Maybe you can answer some of the questions in this thread?