interscript / rababa

Rababa, the diacritization library for Arabic and Hebrew (Abjad scripts in general)
12 stars 1 forks source link

Diacriticization failure after latest ONNX model change #29

Closed ronaldtse closed 3 years ago

ronaldtse commented 3 years ago

@gilgameshjw the tests started failing at commit e869dffbd1ea5d97dc9ec4630646e233bd69fdcf.

  1) Rababa::Diacritizer diacriticizes قطر
     Failure/Error: expect(diacritizer.diacritize_text(source)).to eq target

       expected: "قِطْرَ"
            got: "قَطُر"

       (compared using ==)
     # ./spec/rababa/diacritizer_spec.rb:43:in `block (3 levels) in <top (required)>'

  2) Rababa::Diacritizer diacriticizes abc
     Failure/Error: predicts = @onnx_session.run(nil, batch_data)

     OnnxRuntime::Error:
       Non-zero status code returned while running FusedConv node. Name:'fused Conv_92' Status Message: Invalid input shape: {0}
     # ./lib/rababa/diacritizer.rb:113:in `predict_batch'
     # ./lib/rababa/diacritizer.rb:68:in `diacritize_text'
     # ./spec/rababa/diacritizer_spec.rb:43:in `block (3 levels) in <top (required)>'

  3) Rababa::Diacritizer diacriticizes ‘Iz. Ibrāhīm as-Sa‘danī
     Failure/Error: predicts = @onnx_session.run(nil, batch_data)

     OnnxRuntime::Error:
       Non-zero status code returned while running FusedConv node. Name:'fused Conv_92' Status Message: Invalid input shape: {0}
     # ./lib/rababa/diacritizer.rb:113:in `predict_batch'
     # ./lib/rababa/diacritizer.rb:68:in `diacritize_text'
     # ./spec/rababa/diacritizer_spec.rb:43:in `block (3 levels) in <top (required)>'

Finished in 6.1 seconds (files took 2.67 seconds to load)
6 examples, 3 failures, 2 pending

Failed examples:

rspec ./spec/rababa/diacritizer_spec.rb[1:1] # Rababa::Diacritizer diacriticizes قطر
rspec ./spec/rababa/diacritizer_spec.rb[1:2] # Rababa::Diacritizer diacriticizes abc
rspec ./spec/rababa/diacritizer_spec.rb[1:3] # Rababa::Diacritizer diacriticizes ‘Iz. Ibrāhīm as-Sa‘danī
gilgameshjw commented 3 years ago

As mentionned in the blog and by Ahmad, this is now 1) correct: “قَطَر” (Qatar - a country), “قُطْر” (Qotr - circle diameter), “قَطْر” (Qatr - rain). 2) but fallinga onto collisions.

The expected value of the test is wrong and had to do with the fact that ONNX was not set up for dynamical lengths, which caused some issues. This has now be fixed and the code result is correct.

ronaldtse commented 3 years ago

@gilgameshjw the resulting string "قَطُر" does not seem to be any of the three? (notice there is a tail at the top circle in the middle character).

How about the tests 2 and 3? What does this mean?

Non-zero status code returned while running FusedConv node. Name:'fused Conv_92' Status Message: Invalid input shape: {0}
ronaldtse commented 3 years ago

Fix for tests 2 and 3 can be this: https://github.com/onnx/tensorflow-onnx/issues/1062

ronaldtse commented 3 years ago

I realized the problem with the tests that are not in Arabic:

       Non-zero status code returned while running FusedConv node. Name:'fused Conv_92' Status Message: Invalid input shape: {0}

This is when the string contains non-Arabic characters, the model does not have those characters as input and therefore does not know what to do with them. These non-Arabic characters are to be dealt with in the Reconciler, not in The Diacriticizer.

ronaldtse commented 3 years ago

I'm going to close this in favour of resolving #26 .