Closed gettalong closed 2 years ago
I haven't bottomed out all of these issues yet, but I have made some progress on the ArgumentError: expecting a Fixnum to sort
problem. In test_reference.rb
the test
it "is sortable w.r.t to other objects implementing #oid and #gen" do
obj = Object.new
obj.define_singleton_method(:oid) { 1 }
obj.define_singleton_method(:gen) { 0 }
assert_equal([obj, HexaPDF::Reference.new(1, 1), HexaPDF::Reference.new(5, 7)],
[HexaPDF::Reference.new(5, 7), HexaPDF::Reference.new(1, 1), obj].sort)
assert_nil(HexaPDF::Reference.new(1, 0) <=> 5)
end
is extremely sensitive to the comparisons made to sort the array. The test is essentially sorting an array [c,b,a]
into [a,b,c]
. In MRI this causes the following comparisons and swaps
b <=> a --> [c, a, b]
c <=> a --> [a, c, b]
c <=> b --> [a, b, c]
whereas we do
b <=> c --> [b, c, a]
a <=> b --> [a, c, b]
b <=> c --> [a, b, c]
I think the test could be made more robust by defining <=>
for obj
so that comparison is symmetrical, and we should document that we don't guarantee to perform the same comparisons as MRI during a sort operation. Do you know if there's any other code in the library that is likely to be sensitive to sort
in a similar way? Should we consider adding an option to try and replicate MRI's precise semantics?
Thanks for investigating! I see what you mean and you are right. The #<=>
operator on obj
is definitely missing there. Adding
obj.define_singleton_method(:<=>, &HexaPDF::Reference.method(:<=>))
should make this work correctly in all cases. I have pushed the changes to the Github repo.
I falsely relied on MRIs behaviour for exactly the given item configuration. Normally, it should have produced an error like the following:
2.5.0 :001 > [Object.new, 5].sort
Traceback (most recent call last):
3: from /home/thomas/.rvm/rubies/ruby-2.5.0/bin/irb:11:in `<main>'
2: from (irb):1
1: from (irb):1:in `sort'
ArgumentError (comparison of Object with 5 failed)
2.5.0 :002 >
As for other similar cases depending on comparison order: No, I don't think so.
And I would not implement MRIs semantics. In fact, I think that TruffleRuby does better in this case since it caught a bug :smiley:
So, mea culpa, sorry!
One more thing I forgot to mention: The HexaPDF::Importer::NullableWeakRef#test_0001_returns
might fail but this is flaky test that I have still have to fix.
Okay, the next one I've managed to pin down is definitely our problem. The serialization failures are due to us not correctly handling lru_cache[key] ||= value
. If you do lru_cache[index] = value
then the expression will return value
no matter what the []=
method returns. This matches MRI's behaviour. If we use ||=
however then the expression returns either the result of [index]
(if it is non-nil) or the result of []=
. This does not match MRI.
This one must have been lurking unfound for ages. Luckily it looks like we only need a small change to the translator to fix it.
I'm testing a fix for the []=
issue in our CI now.
So a little more progress. TypeError: Coercion error: {:mode=>"rb"}.to_int => Integer failed
errors are due to a bug in our argument handling for File.read
and I've found some other issues caused by the subclass of WeakRef
. We redefined WeakRef.new
which made it impossible to create instances of a subclass.
Just a quick status update on this. I have found and fixed the following additional issues in TruffleRuby that this test suite has thrown up
Kernel#send
or Symbol#to_proc
failed.datadir
config value.Module#const_get
with a scoped argument such as HexaPDF::Type::Catalog
would not autoload the constant.==
and !=
for weak references or subclases of themFixes for all but the last three of those have been merged.
The remaining issues are
test_standard_security_handler.rb
fail I think all the same reason.HexaPDF::Font::TrueType::Table::Head::checksum_valid?#test_0001_checks whether an entry's checksum is valid:
fails with a time overflow problemHexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only:
fails as we're not correctly handling the fallback function during encoding conversion.Thanks for all your work - is there anything I can do to help?
I've found the cause of the security test failures. Strings that originally came from native memory were not creating a copy of that memory buffer. Doesn't normally matter, except when setbyte
is used to mutate that buffer. This meant HexaPDF::Encryption#xor_key()
was mutating the original key as well as the copy it had just made.
BTW. xor_key
is likely to be quite slow on TruffleRuby because we normally use immutable ropes for strings, so setting a byte in the middle may be quite expensive. It might be worth getting the bytes from the string, xoring them, and then packing the result at the end.
Okay, I've found the cause of the PageTreeNode
test failures. Our Array#reject!
implementation accidentally truncated the array if the block exited exceptionally.
I also saw some intermittent failures in the object stream tests. These turned out to be us evaluating an element in an array literal twice under some circumstances.
The only two remaining issues are
HexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only
fails because don't yet support fallback
arguments when encoding strings.HexaPDF::Font::TrueType::Table::Head::checksum_valid?#test_0001_checks whether an entry's checksum is valid
fails because we don't handle as wide a date range as MRI. That will be hard to fix as we'll need to move away from using java.time
.I just installed truffleruby 1.0.0-rc5, like ruby 2.4.4, GraalVM CE Native [x86_64-linux]
and tested again and it looks much better - thank you very much!
Since RC6 is out today, I installed it and and ran the HexaPDF test suite, yielding only 6 problems :+1: :
One was the ArgumentError: expecting a Fixnum to sort
which I thought I fixed but actually did it wrong. The real fix is using the following in test/hexapdf/test_reference.rb
:
obj.define_singleton_method(:<=>) {|o| HexaPDF::Reference.new(oid, gen) <=> o }
The RuntimeError: Invalid value for Year
is still happening. What I found, though, was that the test case used the Q>
string pack command instead of the correct q>
(unsigned vs signed) - and fixed it :smile:. However, this doesn't matter in this case because the result is the same.
I rewrote the test to make sure that the generated test dates don't overflow anymore.
Three tests for HexaPDF.data_dir
fail for me because RbConfig::CONFIG["datadir"]
returns nil
. There is a datarootdir
key but no datadir
- is that intentional? I installed TruffleRuby via rvm.
The remaining failing test is HexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only
which you explained why it fails.
Thank you very much for all your work!
I'm taking a look at hexapdf today. The test suite passes except for:
1) Error:
HexaPDF::Type::Actions::URI::validation#test_0001_URI needs to be ASCII only:
Encoding::UndefinedConversionError: "\xC3" to UTF-8 in conversion from ASCII-8BIT to UTF-8 to US-ASCII
<internal:core> core/string.rb:434:in `encode!'
/home/eregon/code/hexapdf/lib/hexapdf/type/actions/uri.rb:60:in `perform_validation'
/home/eregon/code/hexapdf/lib/hexapdf/object.rb:268:in `validate'
/home/eregon/code/hexapdf/test/hexapdf/type/actions/test_uri.rb:19:in `test_0001_URI needs to be ASCII only'
2) Failure:
HexaPDF::Layout::WidthFromPolygon#test_0008_works if there is no available space [/home/eregon/code/hexapdf/test/hexapdf/layout/test_width_from_polygon.rb:54]:
--- expected
+++ actual
@@ -1 +1 @@
-[0, 0]
+[0, 10]
And one test is particularly slow and seems worth profiling:
HexaPDF::Filter::LZWDecode#test_works_with_big_data = 22.80 s = .
@aardvark179 Could you implement fallback:
for String#encode/encode!
?
Minimal repro is: ruby -e 'p "hellö".b.encode!(Encoding::US_ASCII, fallback: lambda {|c| "%#{c.ord.to_s(16).upcase}" })'
.
I think maybe we can set the fallback object as the replacement
in Encoding::Converter#initialize
, but we should check what JRuby does as it implements it in 9.3.0.0.
The second error (Failure) is due to this:
p [[0, 0, :crossed_both], [-0.0, 10.0, :crossed_top], [10, 10, :crossed_both]].sort_by { |a| a[0] }'
# CRuby
[[0, 0, :crossed_both], [-0.0, 10.0, :crossed_top], [10, 10, :crossed_both]]
# TruffleRuby
[[-0.0, 10.0, :crossed_top], [0, 0, :crossed_both], [10, 10, :crossed_both]]
Or simplified:
$ ruby -e 'p [[0, :a], [-0.0, :b], [10, :c]].sort_by { |a| a[0] }'
[[0, :a], [-0.0, :b], [10, :c]]
$ truffleruby -e 'p [[0, :a], [-0.0, :b], [10, :c]].sort_by { |a| a[0] }'
[[-0.0, :b], [0, :a], [10, :c]]
So on CRuby: 0 <=> -0.0 = 0
and -0.0 <=> 0 = 0
and on TruffleRuby: 0 <=> -0.0 => 1
and -0.0 <=> 0 => -1
.
I think we can fix that, but it's unclear if it will actually solve it.
It seems sorting is not guaranteed to be stable in Ruby: Array#sort says:
The result is not guaranteed to be stable. When the comparison of two elements returns 0, the order of the elements is unpredictable.
@gettalong I think the code is a bit fragile here, should it maybe use sort!
and not sort_by!
to compare all array elements and not rely on potentially randomness for arrays with "equal as in <=> returns 0" elements?
I've got a fix for Float#<=>
and it reliably returns the same result as CRuby, I guess Ruby's sort is stable at least when no element needs to move.
@gettalong I think the code is a bit fragile here, should it maybe use sort! and not sort_by! to compare all array elements and not rely on potentially randomness for arrays with "equal as in <=> returns 0" elements?
@eregon I have looked at the code and you are right, it should better be just .sort!
instead of .sort_by! {|a| a[0]}
. The tests still pass and thinking it through it should also be the more correct way.
The fix for comparison with -0.0
is merged: https://github.com/oracle/truffleruby/commit/b5454b64b6aa113ef2c091f50a4e1211e5ecdcea
So now the only failure remaining in the test suite is https://github.com/oracle/truffleruby/issues/1391#issuecomment-938566598
String#encode(..., fallback: ...)
is implemented in https://github.com/oracle/truffleruby/commit/22f31a3b2892edec2870ee44d9a602eb4aa2e59b, so now the full test suite of hexapdf passes on TruffleRuby.
I ran TruffleRuby (
truffleruby 1.0.0-rc2, like ruby 2.4.4, GraalVM CE Native [x86_64-linux]
) installed via RVM against the HexaPDF testsuite (cloning the repo, runninggem install geom2d cmdparse
and thenrake test
will run the testsuite) with the following result (omitting the errors):For comparison, the result of running the test suite against MRI 2.5 yields the following:
There several different encountered errors, for example:
TypeError: Coercion error: nil.to_str => String failed
(seems to be a problem inString#<<
)TypeError: class or module required
TypeError: Coercion error: {:mode=>"rb"}.to_int => Integer failed
ArgumentError: expecting a Fixnum to sort
It also seems that refinements are not working correctly, as there are
NoMethodError: undefined method 'y_min' for -10:Integer
errors where a refinement is used (see https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/layout/numeric_refinements.rb and https://github.com/gettalong/hexapdf/blob/master/lib/hexapdf/layout/text_fragment.rb#L224).I'm not quite sure how to further debug the problems... Just as an example, one of the errors produces the following backtrace: