Compatibility problems with MRI when running HexaPDF testsuite

gettalong commented 6 years ago

I ran TruffleRuby (truffleruby 1.0.0-rc2, like ruby 2.4.4, GraalVM CE Native [x86_64-linux]) installed via RVM against the HexaPDF testsuite (cloning the repo, running gem install geom2d cmdparse and then rake test will run the testsuite) with the following result (omitting the errors):

Finished in 111.968567s, 16.5582 runs/s, 237.9418 assertions/s.
1854 runs, 26642 assertions, 31 failures, 88 errors, 0 skips

For comparison, the result of running the test suite against MRI 2.5 yields the following:

Finished in 2.174784s, 852.4986 runs/s, 13340.1776 assertions/s.
1854 runs, 29012 assertions, 0 failures, 0 errors, 0 skips

There several different encountered errors, for example:

TypeError: Coercion error: nil.to_str => String failed (seems to be a problem in String#<<)
TypeError: class or module required
TypeError: Coercion error: {:mode=>"rb"}.to_int => Integer failed
ArgumentError: expecting a Fixnum to sort

It also seems that refinements are not working correctly, as there are NoMethodError: undefined method 'y_min' for -10:Integer errors where a refinement is used (see https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/layout/numeric_refinements.rb and https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/layout/text_fragment.rb#L224).

I'm not quite sure how to further debug the problems... Just as an example, one of the errors produces the following backtrace:

HexaPDF::Serializer#test_0011_serializes hashes:
TypeError: Coercion error: nil.to_str => String failed
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:217:in `coerce_to_failed'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:217:in `execute_coerce_to'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:217:in `coerce_to'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:217:in `StringValue'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:217:in `concat_internal'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:217:in `<<'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:217:in `block in serialize_hash'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:215:in `each'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:215:in `serialize_hash'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:330:in `send'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:330:in `__serialize'
    /home/thomas/exthdsync/personal/repos/hexapdf/lib/hexapdf/serializer.rb:108:in `serialize'
    /home/thomas/exthdsync/personal/repos/hexapdf/test/hexapdf/test_serializer.rb:25:in `assert_serialized'
    /home/thomas/exthdsync/personal/repos/hexapdf/test/hexapdf/test_serializer.rb:89:in `test_0011_serializes hashes'

aardvark179 commented 6 years ago

I haven't bottomed out all of these issues yet, but I have made some progress on the ArgumentError: expecting a Fixnum to sort problem. In test_reference.rb the test

  it "is sortable w.r.t to other objects implementing #oid and #gen" do
    obj = Object.new
    obj.define_singleton_method(:oid) { 1 }
    obj.define_singleton_method(:gen) { 0 }
    assert_equal([obj, HexaPDF::Reference.new(1, 1), HexaPDF::Reference.new(5, 7)],
                 [HexaPDF::Reference.new(5, 7), HexaPDF::Reference.new(1, 1), obj].sort)
    assert_nil(HexaPDF::Reference.new(1, 0) <=> 5)
  end

is extremely sensitive to the comparisons made to sort the array. The test is essentially sorting an array [c,b,a] into [a,b,c]. In MRI this causes the following comparisons and swaps

b <=> a --> [c, a, b]
c <=> a --> [a, c, b]
c <=> b --> [a, b, c]

whereas we do

b <=> c --> [b, c, a]
a <=> b --> [a, c, b]
b <=> c --> [a, b, c]

I think the test could be made more robust by defining <=> for obj so that comparison is symmetrical, and we should document that we don't guarantee to perform the same comparisons as MRI during a sort operation. Do you know if there's any other code in the library that is likely to be sensitive to sort in a similar way? Should we consider adding an option to try and replicate MRI's precise semantics?

gettalong commented 6 years ago

Thanks for investigating! I see what you mean and you are right. The #<=> operator on obj is definitely missing there. Adding

obj.define_singleton_method(:<=>, &HexaPDF::Reference.method(:<=>))

should make this work correctly in all cases. I have pushed the changes to the Github repo.

I falsely relied on MRIs behaviour for exactly the given item configuration. Normally, it should have produced an error like the following:

2.5.0 :001 > [Object.new, 5].sort
Traceback (most recent call last):
        3: from /home/thomas/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `<main>'
        2: from (irb):1
        1: from (irb):1:in `sort'
ArgumentError (comparison of Object with 5 failed)
2.5.0 :002 >

As for other similar cases depending on comparison order: No, I don't think so.

And I would not implement MRIs semantics. In fact, I think that TruffleRuby does better in this case since it caught a bug :smiley:

So, mea culpa, sorry!

gettalong commented 6 years ago

One more thing I forgot to mention: The HexaPDF::Importer::NullableWeakRef#test_0001_returns might fail but this is flaky test that I have still have to fix.

aardvark179 commented 6 years ago

Okay, the next one I've managed to pin down is definitely our problem. The serialization failures are due to us not correctly handling lru_cache[key] ||= value. If you do lru_cache[index] = value then the expression will return value no matter what the []= method returns. This matches MRI's behaviour. If we use ||= however then the expression returns either the result of [index] (if it is non-nil) or the result of []=. This does not match MRI.

This one must have been lurking unfound for ages. Luckily it looks like we only need a small change to the translator to fix it.

aardvark179 commented 6 years ago

I'm testing a fix for the []= issue in our CI now.

aardvark179 commented 6 years ago

So a little more progress. TypeError: Coercion error: {:mode=>"rb"}.to_int => Integer failed errors are due to a bug in our argument handling for File.read and I've found some other issues caused by the subclass of WeakRef. We redefined WeakRef.new which made it impossible to create instances of a subclass.

aardvark179 commented 6 years ago

Just a quick status update on this. I have found and fixed the following additional issues in TruffleRuby that this test suite has thrown up

Constant lookup on prepended modules was incorrect.
Calling refinements via Kernel#send or Symbol#to_proc failed.
We were missing the datadir config value.
Using Module#const_get with a scoped argument such as HexaPDF::Type::Catalog would not autoload the constant.
We didn't handle == and != for weak references or subclases of them

Fixes for all but the last three of those have been merged.

The remaining issues are

3 of the PageTreeNode tests fail, and I haven't debug why yet
Several tests in test_standard_security_handler.rb fail I think all the same reason.
HexaPDF::Font::TrueType::Table::Head::checksum_valid?#test_0001_checks whether an entry's checksum is valid: fails with a time overflow problem
HexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only: fails as we're not correctly handling the fallback function during encoding conversion.

gettalong commented 6 years ago

Thanks for all your work - is there anything I can do to help?

aardvark179 commented 6 years ago

I've found the cause of the security test failures. Strings that originally came from native memory were not creating a copy of that memory buffer. Doesn't normally matter, except when setbyte is used to mutate that buffer. This meant HexaPDF::Encryption#xor_key() was mutating the original key as well as the copy it had just made.

BTW. xor_key is likely to be quite slow on TruffleRuby because we normally use immutable ropes for strings, so setting a byte in the middle may be quite expensive. It might be worth getting the bytes from the string, xoring them, and then packing the result at the end.

aardvark179 commented 6 years ago

Okay, I've found the cause of the PageTreeNode test failures. Our Array#reject! implementation accidentally truncated the array if the block exited exceptionally.

I also saw some intermittent failures in the object stream tests. These turned out to be us evaluating an element in an array literal twice under some circumstances.

The only two remaining issues are

HexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only fails because don't yet support fallback arguments when encoding strings.
HexaPDF::Font::TrueType::Table::Head::checksum_valid?#test_0001_checks whether an entry's checksum is valid fails because we don't handle as wide a date range as MRI. That will be hard to fix as we'll need to move away from using java.time.

gettalong commented 6 years ago

I just installed truffleruby 1.0.0-rc5, like ruby 2.4.4, GraalVM CE Native [x86_64-linux] and tested again and it looks much better - thank you very much!

gettalong commented 6 years ago

Since RC6 is out today, I installed it and and ran the HexaPDF test suite, yielding only 6 problems :+1: :

One was the ArgumentError: expecting a Fixnum to sort which I thought I fixed but actually did it wrong. The real fix is using the following in test/hexapdf/test_reference.rb:
```
obj.define_singleton_method(:<=>) {|o| HexaPDF::Reference.new(oid, gen) <=> o }
```
The RuntimeError: Invalid value for Year is still happening. What I found, though, was that the test case used the Q> string pack command instead of the correct q> (unsigned vs signed) - and fixed it :smile:. However, this doesn't matter in this case because the result is the same.

I rewrote the test to make sure that the generated test dates don't overflow anymore.
Three tests for HexaPDF.data_dir fail for me because RbConfig::CONFIG["datadir"] returns nil. There is a datarootdir key but no datadir - is that intentional? I installed TruffleRuby via rvm.
The remaining failing test is HexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only which you explained why it fails.

Thank you very much for all your work!

eregon commented 3 years ago

I'm taking a look at hexapdf today. The test suite passes except for:

  1) Error:
HexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only:
Encoding::UndefinedConversionError: "\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to US-ASCII
    <internal:core> core/string.rb:434:in `encode!'
    /home/eregon/code/hexapdf/lib/hexapdf/type/actions/uri.rb:60:in `perform_validation'
    /home/eregon/code/hexapdf/lib/hexapdf/object.rb:268:in `validate'
    /home/eregon/code/hexapdf/test/hexapdf/type/actions/test_uri.rb:19:in `test_0001_URI needs to be ASCII only'

  2) Failure:
HexaPDF::Layout::WidthFromPolygon#test_0008_works if there is no available space [/home/eregon/code/hexapdf/test/hexapdf/layout/test_width_from_polygon.rb:54]:
--- expected
+++ actual
@@ -1 +1 @@
-[0, 0]
+[0, 10]

And one test is particularly slow and seems worth profiling: HexaPDF::Filter::LZWDecode#test_works_with_big_data = 22.80 s = .

eregon commented 3 years ago

@aardvark179 Could you implement fallback: for String#encode/encode!? Minimal repro is: ruby -e 'p "hellö".b.encode!(Encoding::US_ASCII, fallback: lambda {|c| "%#{c.ord.to_s(16).upcase}" })'. I think maybe we can set the fallback object as the replacement in Encoding::Converter#initialize, but we should check what JRuby does as it implements it in 9.3.0.0.

eregon commented 3 years ago

The second error (Failure) is due to this:

p [[0, 0, :crossed_both], [-0.0, 10.0, :crossed_top], [10, 10, :crossed_both]].sort_by { |a| a[0] }'
# CRuby
[[0, 0, :crossed_both], [-0.0, 10.0, :crossed_top], [10, 10, :crossed_both]]
# TruffleRuby
[[-0.0, 10.0, :crossed_top], [0, 0, :crossed_both], [10, 10, :crossed_both]]

Or simplified:

$ ruby -e 'p [[0, :a], [-0.0, :b], [10, :c]].sort_by { |a| a[0] }'   
[[0, :a], [-0.0, :b], [10, :c]]
$ truffleruby -e 'p [[0, :a], [-0.0, :b], [10, :c]].sort_by { |a| a[0] }'   
[[-0.0, :b], [0, :a], [10, :c]]

So on CRuby: 0 <=> -0.0 = 0 and -0.0 <=> 0 = 0 and on TruffleRuby: 0 <=> -0.0 => 1 and -0.0 <=> 0 => -1. I think we can fix that, but it's unclear if it will actually solve it.

It seems sorting is not guaranteed to be stable in Ruby: Array#sort says:

The result is not guaranteed to be stable. When the comparison of two elements returns 0, the order of the elements is unpredictable.

@gettalong I think the code is a bit fragile here, should it maybe use sort! and not sort_by! to compare all array elements and not rely on potentially randomness for arrays with "equal as in <=> returns 0" elements?

eregon commented 3 years ago

I've got a fix for Float#<=> and it reliably returns the same result as CRuby, I guess Ruby's sort is stable at least when no element needs to move.

gettalong commented 3 years ago

@gettalong I think the code is a bit fragile here, should it maybe use sort! and not sort_by! to compare all array elements and not rely on potentially randomness for arrays with "equal as in <=> returns 0" elements?

@eregon I have looked at the code and you are right, it should better be just .sort! instead of .sort_by! {|a| a[0]}. The tests still pass and thinking it through it should also be the more correct way.

eregon commented 3 years ago

The fix for comparison with -0.0 is merged: https://github.com/oracle/truffleruby/commit/b5454b64b6aa113ef2c091f50a4e1211e5ecdcea So now the only failure remaining in the test suite is https://github.com/oracle/truffleruby/issues/1391#issuecomment-938566598

eregon commented 2 years ago

String#encode(..., fallback: ...) is implemented in https://github.com/oracle/truffleruby/commit/22f31a3b2892edec2870ee44d9a602eb4aa2e59b, so now the full test suite of hexapdf passes on TruffleRuby.

oracle / truffleruby

Compatibility problems with MRI when running HexaPDF testsuite #1391