ruby-rice / rice

Ruby Interface for C++ Extensions
http://ruby-rice.github.io/
Other
370 stars 62 forks source link

Header-only testing #149

Closed ankane closed 3 years ago

ankane commented 3 years ago

This tracks the testing status of #146 with existing projects.

Header-only docs

Each project uses the rice-header-only branch

Project Status Diff CI Notes
Midas Success :tada: Diff CI
LIBFFM Success :tada: Diff CI
IsoTree Success :tada: Diff CI
YouTokenToMe Success :tada: Diff CI
OutlierTree Success :tada: Diff CI
Faiss Success :tada: Diff CI
FastText Success :tada: Diff CI
DataSketches Success :tada: Diff CI
OR-Tools Success :tada: Diff CI
Torch.rb Success :tada: Diff CI
TorchAudio Success :tada: Diff CI
Tomoto In-Progress Diff CI Mac error due to C++17, runtime segfault on Windows

Migration notes

cfis commented 3 years ago

Ok, I think all the Rice issues for Faiss are now resolved. There are a couple failures in the tests, but they don't look Rice related to me.

cfis commented 3 years ago

Datasketches error was due to not having From_Ruby<unsigned char> which is now added (and I added signed char while at it).

ankane commented 3 years ago

Great, Faiss looking much better. Seeing bad any_cast runtime error on Ubuntu 18.04, but not on Ubuntu 20.04 or Mac.

Just ran a new build for DataSketches, but still seeing compile errors on all platforms.

Edit: DataSketches error is error: cannot convert ‘Rice::Array::Proxy’ to ‘VALUE’ {aka ‘long unsigned int’}

ankane commented 3 years ago

Hey @cfis, thanks for https://github.com/ankane/tomoto/issues/2. I converted Object to VALUE for convert in the other projects, and two more are now passing 🎉

fastText and DataSketches are failing on similar code, but with different error messages.

fastText - error: conversion from ‘tuple<fasttext::FastText&>’ to non-scalar type ‘tuple<fasttext::FastText>’ requested

    .define_method("dimension", &FastText::getDimension)
    .define_method("quantized?", &FastText::isQuant)
    .define_method("word_id", &FastText::getWordId)
    .define_method("subword_id", &FastText::getSubwordId)

DataSketches - invalid abstract return type

    .define_method("empty?", &theta_sketch::is_empty)

I imagine they could be converted to lambdas if needed, but wanted to see if this way of defining methods was still supported.

For tomoto, I applied the patch, which improved things significantly, but now running into another compile time error on Mac and runtime errors on Linux and Windows. Will spend some more time on it to see if I can figure it out.

cfis commented 3 years ago

@ankane - Thanks for all the feedback. On DataSketches, that was a bug. Fixed in b3c37d78886a5ac37a6e03c1c42443dee2e4dcec. Test run cleanly for me.

cfis commented 3 years ago

And fastText compiles for me now too. The tests run, but fail, all with the error "ArgumentError: test/support/unsupervised.txt cannot be opened for training!". I am assuming that is not Rice related, but if it is let me know.

ankane commented 3 years ago

Thanks @cfis, all tests now green for both of those. Working through the remaining projects now.

Is there a replacement for:

Rice::Data_Object<MPObjective>(x, rb_cMPVariable, nullptr, nullptr);

Rice 3 context

cfis commented 3 years ago

Just remove the two nullptr's. Those were for mark and free functions, but I updated the library to use TypedData_Struct - as opposed to deprecated Data_Struct which supposedly is going to be finally removed in Ruby 3.1 according to compiler warning messages. Thus mark and free are now setup in Data_Type, you don't have to pass them in to Data_Object.

ankane commented 3 years ago

Hmm, will that disable the mark and free functions? Seeing double free or corruption (out) when switching to that.

Also, unrelated, it looks like add_handler now returns Rice::Module instead of Rice::Class for code like this:

  Rice::define_class_under<torch::Device>(m, "Device")
    .add_handler<torch::Error>(handle_error)
    .define_constructor(Rice::Constructor<torch::Device, const std::string&>())

Error: ‘class Rice::Module’ has no member named ‘define_constructor’

cfis commented 3 years ago

No - mark and free are handled differently now due to changes in Ruby itself (look in Data_Type if interested).

Double deletes - is the library passing you a reference or pointer? If so, you have to tell Rice to not free it. See this:

https://github.com/jasonroelofs/rice/blob/master/README.md#ownership

Previously Rice used to always assume ownership, and I left it that way. But over the last few days I've changed my mind and I think it should be the opposite - Rice does not take ownership except for object that itself creates via calling a C++ constructor. See #150.

cfis commented 3 years ago

One other thing - if you don't implement To_Ruby correctly (meaning using perfect forwarding for references), you can take ownership when you don't mean too. I ran into that yesterday myself...thinking about what to do about it.

cfis commented 3 years ago

On the add_handler, what are you actually doing in that code? I'm curious to see a use case for that functionality.

ankane commented 3 years ago

Re add_handler: using it to customize the exception message (what_without_backtrace instead of what), but there's probably a better way to do that.

inline void handle_error(torch::Error const & ex)
{
  throw Rice::Exception(rb_eRuntimeError, ex.what_without_backtrace());
}

Re ownership: OR-Tools tests green with Return().takeOwnership(false)

ankane commented 3 years ago

Hey @cfis, think I narrowed down the issue with Torch.rb. Any idea why the first works but the second segfaults?

inline VALUE wrap(torch::Tensor x) {
  return Rice::detail::To_Ruby<torch::Tensor>::convert(x);
}

static VALUE torch_ones(int argc, VALUE* argv, VALUE self_)
{
  auto dispatch_ones = []() -> torch::Tensor {
    return torch::ones({2, 3});
  };

  // works
  return Rice::detail::To_Ruby<torch::Tensor>::convert(dispatch_ones());

  // segfault
  return wrap(dispatch_ones());
}

For Tomoto, everything works great on Linux, but half of the classes segfault on Windows (haven't been able to find a pattern between the ones that work and ones that don't). Will continue to dig, but let me know if you have any ideas about what would cause this type of behavior.

cfis commented 3 years ago

Hmm...of the top of my head not sure on either question. I assume the first one has something to do with lvalues/rvalues and something not getting copied correctly but of the top of me head I don't see it. Would have to walk through it with a debugger to check.

cfis commented 3 years ago

@ankane - See what happens with the updates I just pushed to master. Note they will break your code.

To fix remove Return().takeOwnership(false) since that is now the default and will no longer compile. Then deal with the opposite issue - methods where Ruby does need to take ownership. In that case do this (note I updated the documentation if you want to see a more fleshed out example):

define_function("method_that_returns_object_ruby_manages.", &some_method, Return().takeOwnership());

If you still run into issues let me know and I can take a look.

ankane commented 3 years ago

Great, have Torch.rb working locally, and will try the changes with OR-Tools. Can you regenerate rice.hpp when you have a chance?

Also, needed to wrap some convert calls with Object. Otherwise, ended up with integers. Example

Array a;
a.push(Object(Rice::detail::To_Ruby<torch::Tensor>::convert(t, true)));

Is it possible to push VALUEs directly? Noticed the same for functions that return VALUE.

cfis commented 3 years ago

Yeah, VALUEs are pita. Since they are simply a typdef to long long, there is no way to specialize templates for them and they get incorrectly converted. Using Object solves that, but at the cost of a an extra allocation for each conversion. My thinking was most of the time you won't be passing VALUE to C++ functions so although annoying easy enough to deal with since it won't occur much (note that SELF is specifically handled since that is the most common occurrence). Are you passing VALUEs around a lot?

However, it would be good to have a solution. My best idea so far is a way tell Rice to not convert these values. Maybe something like this:

VALUE workingWithValues(VALUE input)
{
  return input;
}

define_global_function("working_with_values", @workingWithValues, Arg("input").noConversion(), Return().noConversion());

-- or ---
define_global_function("working_with_values", @workingWithValues, Arg("input").isValue(), Return().isValue());
ankane commented 3 years ago

Ah, makes sense. Not using VALUEs a lot (mainly just in the example above), so not a big deal to wrap. The solution above looks clean (think I slightly prefer the 2nd one fwiw).

ankane commented 3 years ago

Also, seems like things are in a good spot with testing. Can run a final round of tests for all projects right before the header-only release.

cfis commented 3 years ago

Great. On tomoto - the segfaults on windows are with mingw64?

ankane commented 3 years ago

I believe so (setup-ruby uses RubyInstaller2 and the target is reported as x64-mingw32 - logs from latest run). It happens with about half the classes in the project that use inheritance.

cfis commented 3 years ago

Which branch are you using?

ankane commented 3 years ago

rice-header-only is most recent.

cfis commented 3 years ago

I can reproduce the Mingw crash. It in Eigen's mersenne twister code.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007fff9fe4fe95 in Eigen::Rand::MersenneTwister<long long __vector(4), 312, 156, 31, 13043109905998158313ull, 29, 6148914691236517205ull, 17, 8202884508482404352ull, 37, 18444473444759240704ull, 43, 6364136223846793005ull>::operator()() (this=0x19c9ee36190)
    at D:/src/tomoto/vendor/EigenRand/EigenRand/PacketRandomEngine.h:326
326                                     res = pxor(res, psrl64(res, _Lx));
(gdb) bt
#0  0x00007fff9fe4fe95 in Eigen::Rand::MersenneTwister<long long __vector(4), 312, 156, 31, 13043109905998158313ull, 29, 6148914691236517205ull, 17, 8202884508482404352ull, 37, 18444473444759240704ull, 43, 6364136223846793005ull>::operator()() (this=0x19c9ee36190)
    at D:/src/tomoto/vendor/EigenRand/EigenRand/PacketRandomEngine.h:326

....

#10 tomoto::LDAModel<(tomoto::TermWeight)0, Eigen::Rand::ParallelRandomEngineAdaptor<unsigned int, Eigen::Rand::MersenneTwister<long long __vector(4), 312, 156, 31, 13043109905998158313ull, 29, 6148914691236517205ull, 17, 8202884508482404352ull, 37, 18444473444759240704ull, 43, 6364136223846793005ull>, 8>, 4ull, tomoto::IMGLDAModel, tomoto::MGLDAModel<(tomoto::TermWeight)0, Eigen::Rand::ParallelRandomEngineAdaptor<unsigned int, Eigen::Rand::MersenneTwister<long long __vector(4), 312, 156, 31, 13043109905998158313ull, 29, 6148914691236517205ull, 17, 8202884508482404352ull, 37, 18444473444759240704ull, 43, 6364136223846793005ull>, 8>, tomoto::IMGLDAModel, void, tomoto::DocumentMGLDA<(tomoto::TermWeight)0>, tomoto::ModelStateLDA<(tomoto::TermWeight)0> >, tomoto::DocumentMGLDA<(tomoto::TermWeight)0>, tomoto::ModelStateLDA<(tomoto::TermWeight)0> >::prepare(bool, unsigned long long, unsigned long long, unsigned long long) (this=0x19c9ee36180,
    initDocs=<optimized out>, minWordCnt=<optimized out>, minWordDf=<optimized out>,
--Type <RET> for more, q to quit, c to continue without paging--
    removeTopN=<optimized out>) at D:/src/tomoto/vendor/tomotopy/src/TopicModel/LDAModel.hpp:1055
#11 0x00007fff9fcdec27 in operator() (rmTop=<optimized out>, minDf=<optimized out>,
    minCnt=<optimized out>, self=..., __closure=<optimized out>) at lda.cpp:148
#12 std::__invoke_impl<void, init_lda(Rice::Module&)::<lambda(tomoto::ILDAModel&, size_t, size_t, size_t)>&, tomoto::ILDAModel&, long long unsigned int&, long long unsigned int&, long long unsigned int&> (__f=...)
    at C:/msys64/mingw64/include/c++/10.2.0/bits/invoke.h:60

My first thought is that this isn't a Rice bug, its a mingw64 issue with Eigen. If you look around https://gitlab.com/libeigen/eigen you'll see there seems to be issues with compiling with -O3, which is what is happening.

Did this work with a previous version of Rice? For what its worth, a MSVC 2019 build works fine on the same Windows machine.

ankane commented 3 years ago

Hmm, interesting. Yeah, works with Rice 3 with both C++11 and C++17 (https://github.com/ankane/tomoto/compare/windows - build logs next to each commit), so thought it may be related to the header-only changes.

cfis commented 3 years ago

Hmm, yeah that would indicate its due to the changes.

cfis commented 3 years ago

If I change the make file to use -Og then the crash goes away...

ankane commented 3 years ago

It looks like there are reports of similar behavior with mingw + Eigen (as you mentioned earlier), so maybe the changes are causing that to surface. I think it's fine to move forward without this if you think it's unlikely to affect other Rice projects.

Edit: From this issue, it may be fixed in Eigen 3.4 (not released yet).

cfis commented 3 years ago

@ankane - When you get a chance, do you mind trying with the latest version of Rice. Two main changes:

ankane commented 3 years ago

Tested a few projects. Midas, Faiss, and DataSketches work well.

For fastText, seeing:

/usr/include/c++/9/bits/stl_pair.h:449:51: error: no match for ‘operator==’ (operand types are ‘const fasttext::Vector’ and ‘const fasttext::Vector’)

when wrapping a method that returns std::vector<std::pair<std::string, Vector>> (both with and without a custom To_Ruby method).

ankane commented 3 years ago

Also, not sure this round of changes was great for Rice user experience. Needed to:

  1. Add classes in a specific order which wasn't previously required
  2. Add extra enums and classes which weren't previously needed
  3. Add an extra header since projects use std::string
cfis commented 3 years ago

On the operator ==, that is needed for implementing things like find and delete, etc. But the code checks for that. So will need to look into it.

On the other points:

  1. How much of a pain is that? I figure its better to find out sooner that you missed something than later. Really I'd like code to not compile at all, but I'm not sure if that's doable or not in C+17 (well probably is, I haven't figured out how yet). Anyway, that is how pybind works too.

  2. Not sure on this one, can you open tickets for those issues?

  3. Yeah, I can open a ticket for that. I'm torn on whether its better to have 1 or 2 headers. For what its worth, pybind11 requires two. Not sure what they do with std::string. We could move std::string back to the main header.

ankane commented 3 years ago

Thanks.

For 1, the code isn't really missing anything (all the classes exist and it previously worked with header-only Rice), but now requires users to manage the order they're defined (something they didn't have to worry about before).

For 2, if you have a method like define_function("some_method", [](SomeClass val) { ... }), you could previously use From_Ruby to convert a Ruby object to SomeClass without define_class/define_enum.

For 3, I think it's probably common enough to keep in the main header in either scenario.

cfis commented 3 years ago

For #2, actually you still can but you have to flag it as a built-in type:

template <>
  struct is_builtin<std::string> : public std::true_type {};

That means it is not a Data_Type<T> and should not be wrapped in a TypedData_Struct. What types are you running into that are like that? Or are they just value types of custom classes?

It would of course be much nicer to be able to do this at compile time and check Data_Type<T> and therefore not have to have is_builtin at all. But I haven't figured out how to do that (I'm not sure it is possible since you would need to set a value that is a contexpr which since it is const you can't do).

FYI - this may or may not have worked in previous versions of Rice. For example say you defined From_Ruby<std::complex<int>> but didn't specialialize 'std::complex&' or 'std::complex*'. Then if you got either of those, you'd get a runtime exception.

cfis commented 3 years ago

On fastdText, it is a std::vector of std::pair<std::string, fasttext::Vector>.

std::pair defines operator()==. But it then much check fasttext::Vector::operator()==.

Rice looks for the top level operator==() but doesn't recurse down. Not sure how to do that yet...need to research it.

ankane commented 3 years ago

This file has a good example for 2: https://github.com/ankane/faiss/blob/rice-header-only/ext/faiss/index.cpp

A constructor takes a faiss::MetricType argument, but I'd like for users to be able to pass in a symbol, so there's a From_Ruby conversion to convert the symbol into a faiss::MetricType.

With the latest changes, Rice::define_enum<faiss::MetricType> is needed, even though creating a Ruby class for the enum isn't desired.

Edit: will check out is_builtin.

cfis commented 3 years ago

Ok, I have fastText compiling again (note I didn't try to run it). Updates are pushed.

And thanks for the example code from faiss, will take a look.

ankane commented 3 years ago

Hey @cfis, was looking at the is_builtin code but having trouble seeing how to use it for the Faiss case.

Also, it looks like there's now a similar issue type definition issue with fastText at runtime: Type is not defined with Rice.

cfis commented 3 years ago

The error message should tell you what type. If not that's a bug.

Look at stl/string.ipp or stl/complex.ipp for example usage of is_builtin.

ankane commented 3 years ago

Thanks, have is_builtin working now.

Is there a way to add the name of the function (or other context) where the type isn't defined? For fastText, it wants me to register std::pair<fasttext::real, std::string> but from what I can tell, it doesn't appear to be used.

There are functions that return std::vector<std::pair<fasttext::real, std::string>>, but there are templates for is_builtin and To_Ruby for that type.

cfis commented 3 years ago

Hmm...will have to take a look. is_builtin should avoid that - what you are really saying is you are going to copy data back and forth (fyi, if you have a better name than is_builtin let me know).

Bigger picture, we should just add support for std::pair. Probably just make it a 2 element array....and tuples a x element array.

cfis commented 3 years ago

@ankane - pushed built in support for std::pair, so you can remove the custom mappings you have for it if you'd like. Note you'll need to make a few changes for that since the pairs were being mapped to a vector.

pair[0] -> pair.first pair[1] -> pair.second

And then pair.reverse would need to be [pair.second, pair.first]

I didn't make those changes, so some tests fail for me but most succeed.

ankane commented 3 years ago

Great, it's now working without without the extra std::pair mapping, but still not entirely sure why it was checking for that mapping when it shouldn't be used.

Also, it looks like the generated header needs updated on master. Would it be better to have a header that includes the others instead of a generated one?

cfis commented 3 years ago

There is a header like that - see rice/rice.hpp versus include/rice/rice.hpp (and same for stl.hpp). That is what is processed to generate the single file. It is also what I use for development to avoid constantly having to generate the single file.

The idea of a single file is make it easy to grab one file and put it in your project - you really don't need the gem at all. Of course that works a lot better once there is a stable release.

Its easy to hack if you want, update ruby/lib/mkmf-rice.rb to point to which header you want to use.

ankane commented 3 years ago

I think it'd probably be better for the gem to use the non-generated one, so when there are fixes on master, users can point their Gemfile to GitHub and everything works.

cfis commented 3 years ago

Hmm, maybe. But making that a global change would means the tests would use the master header file, and I think its better for them to use the combined file because its possible for the combined file to be broken while the master is not (which I've seen a couple times).

FYI, its easy to regenerate the files locally:

rake headers
jasonroelofs commented 3 years ago

@ankane If you've got some time, could you do one more round of testing? There's been some change to how to implement and use To_Ruby and From_Ruby as per https://github.com/jasonroelofs/rice/issues/160.

ankane commented 3 years ago

Yeah, definitely. It looks like there are currently compilation errors with Ruby < 3 (error: use of undeclared identifier 'rb_uint2num_inline'). Also, Ruby 2.5 is now EOL, so you may consider dropping support for it in 4.0.

jasonroelofs commented 3 years ago

Good point re: Ruby 2.5, we can shut that one down.