jekyll / classifier-reborn

A general classifier module to allow Bayesian and other types of classifications. A fork of cardmagic/classifier.
https://jekyll.github.io/classifier-reborn/
GNU Lesser General Public License v2.1
548 stars 109 forks source link

Test Native and GSL Implementations #196

Closed mkasberg closed 2 years ago

mkasberg commented 2 years ago

classifier-reborn is designed to work with or without GSL support.

https://github.com/jekyll/classifier-reborn/blob/99d13af5adf040ba40a6fe77dbe0b28756562fcc/docs/index.md?plain=1#L68

If GSL is installed, classifier-reborn will detect it and use it. If GSL is not installed, classifier-reborn will fall back to a pure-ruby implementation. The mechanism for doing so is in lsi.rb:

https://github.com/jekyll/classifier-reborn/blob/99d13af5adf040ba40a6fe77dbe0b28756562fcc/lib/classifier-reborn/lsi.rb#L7-L17

I think it's important to test the classifier-reborn gem with GSL support in CI. One of my goals is to add similar support using Numo, and I'd like that to be tested in CI as well. Also, I want to make sure I don't do anything along the way that could break existing GSL support. As far as I can tell, GSL support was never tested in CI before now (though it was previously discussed here).

In this PR, I'm expanding our text matrix to test with and without GSL enabled. When the matrix has GSL enabled, we install the gsl gem (which is not included in our Gemfile since it's optional). Lucky for us, GitHub already includes libgsl as pre-installed software, so we don't need to do anything special for the apt package. Our gem already has everything needed to build & install. When matrix.gsl is false, we won't install the gem and tests will run the native Ruby implementation. When matrix.gsl is true, we'll install the gem tests will run the GSL implementation.

Currently, GSL only works with Ruby 2.7. (One of the main reasons I want to add support for Numo is because GSL is becoming difficult to support.) As such, I've excluded other versions of ruby in our test matrix for now. They'll still be tested with GSL disabled, but not with it enabled.

While working on this, I noticed some tests in the LSI spec that return early when $GSL is not enabled. It would be better for those tests to report as skipped when GSL is not enabled (and this matches the pattern of the redis tests, that report as skipped if redis isn't available).

mattr- commented 2 years ago

@jekyllbot: merge +dev

mattr- commented 2 years ago

Thanks! This is a great improvement!

ashmaroli commented 2 years ago

@mkasberg I think there is some misunderstanding here.

The lone job that runs with gsl: true installs the gsl gem in a separate step using gem install ... However, the test suite is consequently run with bundle exec .., meaning the test-suite is run in the context of pre-existing Gemfile.lock (that does not include `gsl gem).

Theoretically, my argument can be proven / disproved by having the test-helper issue an appropriate log output before running the tests.

To remove the ambiguity nevertheless, the best route is to have Bundler install the gsl gem during bundle install by using an environment variable:


# Gemfile

gem 'gsl' if ENV["LOAD_GSL"]

And then initialize the environmental variable before setting up Ruby:

# workflow config

jobs:
  ci:
    env:
      # See https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby#matrix-of-gemfiles
      BUNDLE_GEMFILE: ${{ matrix.gemfile }}
      LOAD_GSL: ${{ matrix.gsl }}
mkasberg commented 2 years ago

You're right! These two tests should not be skipped when GSL is actually loaded:

image

Thanks for the suggestion, I think I can implement something like that and I'll make a PR to do so! Might take me about a week to get to it, I'm going on vacation this week.