interscript / interscript-ruby

Interoperable script conversion systems (ISCS) with the `interscript` gem

Reversibility #722

Closed webdev778 closed 3 years ago

webdev778 commented 3 years ago

This is a work in progress, but comments are more than welcome.

Throughout the codebase, various nodes now support a #reverse method. When constructed, those nodes also accept a reverse_run switch. There are three cases, shown here by example:

sub "a", "b", reverse_run: true

When the map is reversed, this rule becomes sub "b", "a", reverse_run: false. It is run only when the map has been reversed.

sub "a", "b", reverse_run: false

This rule is run only when the map has not been reversed.

By default, reverse_run is nil, which means the rule is run in both reverse and non-reverse mode.
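A minimal sketch of the three reverse_run cases above, in plain Ruby. Rule and runs? are hypothetical names used only for illustration, not the gem's actual node classes. It assumes that reversing flips a non-nil flag and that a rule is skipped only while its flag is true.

```ruby
# Hypothetical model of a substitution rule; not Interscript's actual node class.
Rule = Struct.new(:from, :to, :reverse_run) do
  # Reversing swaps source and target and flips a non-nil reverse_run flag.
  def reverse
    flipped = reverse_run.nil? ? nil : !reverse_run
    Rule.new(to, from, flipped)
  end

  # Assumption: a rule is skipped only while its flag is true.
  def runs?
    reverse_run != true
  end
end

always   = Rule.new("a", "b", nil)    # runs in both directions
rev_only = Rule.new("a", "b", true)   # runs only once the map is reversed
fwd_only = Rule.new("a", "b", false)  # runs only in the un-reversed map

[always, rev_only, fwd_only].each do |rule|
  puts "#{rule.reverse_run.inspect}: forward=#{rule.runs?} reversed=#{rule.reverse.runs?}"
end
```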

Some examples already pass in reverse mode:

$ REVERSE=true bundle exec rspec ./spec/interscript_spec.rb[1:1]
Finished in 24.93 seconds (files took 3.72 seconds to load)
7765 examples, 6391 failures

To run a map in reverse, give its name either with the two script codes swapped or with a -reverse suffix. For instance, if the map name is "alalc-ell-Grek-Latn-2010", you provide either "alalc-ell-Latn-Grek-2010" or "alalc-ell-Grek-Latn-2010-reverse":

$ bin/interscript /dev/stdin -s alalc-ell-Latn-Grek-2010
Ellada
(Ctrl+D pressed)
Έλλαδα'
'
$
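The two accepted spellings of a reversed map name can be sketched as follows. reverse_aliases is a hypothetical helper, not part of the gem, and it assumes map names follow the pattern authority-language-source-target-id, as in the example above.

```ruby
# Hypothetical helper; the gem's real name lookup may work differently.
# Assumes names look like "<authority>-<lang>-<Src>-<Dst>-<id>".
def reverse_aliases(name)
  authority, lang, src, dst, *rest = name.split("-")
  swapped = [authority, lang, dst, src, *rest].join("-")
  [swapped, "#{name}-reverse"]
end

p reverse_aliases("alalc-ell-Grek-Latn-2010")
# => ["alalc-ell-Latn-Grek-2010", "alalc-ell-Grek-Latn-2010-reverse"]
```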

You can also preview how reversing works in the console:

$ bin/console
[6] pry(main)> gl = Interscript.parse("bgnpcgn-deu-Latn-Latn-2000").stages[:main]
=> stage {
  sub "β"+capture(upper), "SS"+ref(1), before: any(nil)
  sub "Ä"+capture(upper), "AE"+ref(1), before: any(nil)
  sub "Ö"+capture(upper), "OE"+ref(1), before: any(nil)
  sub "Ü"+capture(upper), "UE"+ref(1), before: any(nil)
  parallel {
    sub "Ä", "Ae"
    sub "Ö", "Oe"
    sub "Ü", "Ue"
    sub "ä", "ae"
    sub "ö", "oe"
    sub "ü", "ue"
    sub "β", "ss"
  }
}
[7] pry(main)> gl.reverse
=> stage {
  parallel {
    sub "ss", "β"
    sub "ue", "ü"
    sub "oe", "ö"
    sub "ae", "ä"
    sub "Ue", "Ü"
    sub "Oe", "Ö"
    sub "Ae", "Ä"
  }
  sub "UE"+ref(1), "Ü"+capture(upper), reverse_before: any(nil)
  sub "OE"+ref(1), "Ö"+capture(upper), reverse_before: any(nil)
  sub "AE"+ref(1), "Ä"+capture(upper), reverse_before: any(nil)
  sub "SS"+ref(1), "β"+capture(upper), reverse_before: any(nil)
}
[8] pry(main)> 

reverse_before and friends are ignored in forward mode; they only become before (and so on) once the rule is reversed. Do note the bug: captures and references aren't swapped yet.
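The console output above also shows that reversing a stage reverses the order of its rules, not just each rule's source and target. A rough sketch of why the order matters, using an ordered list of [from, to] pairs as a hypothetical stand-in for a stage (run and reverse_stage are illustrative names, and parallel blocks are not modeled):

```ruby
# Hypothetical stand-in for a stage: an ordered list of [from, to] pairs.
def run(rules, text)
  rules.reduce(text) { |acc, (from, to)| acc.gsub(from, to) }
end

# Reverse the rule order and swap source and target of every rule.
def reverse_stage(rules)
  rules.reverse.map { |from, to| [to, from] }
end

forward  = [["x", "y"], ["y", "z"]]
backward = reverse_stage(forward)

encoded = run(forward, "x")
decoded = run(backward, encoded)
puts encoded  # z
puts decoded  # x

# Swapping each rule without reversing the order does not round-trip.
wrong_order = forward.map { |from, to| [to, from] }
puts run(wrong_order, encoded)  # y
```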

There's no documentation or tests yet; this pull request is mostly a request for comments.

webdev778 commented 3 years ago

Changes since last time:

We got rid of reverse_before and friends: sub 'b', 'X', before: 'a', after: 'c' now becomes sub 'X', 'b', before: 'a', after: 'c' after reversing. This gave us some passes. The old behavior can be reintroduced by writing two rules, one with reverse_run: true and one with reverse_run: false.
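Keeping the context unchanged works because the surrounding text is not touched by the substitution. A small sketch with plain Ruby regular expressions, assuming before: corresponds to a lookbehind (text preceding the match) and after: to a lookahead (text following it); the Interscript compiler itself is not used here.

```ruby
# Plain-regex analogue of: sub 'b', 'X', before: 'a', after: 'c'
forward  = ->(text) { text.gsub(/(?<=a)b(?=c)/, "X") }
# Plain-regex analogue of the reversed rule: sub 'X', 'b', before: 'a', after: 'c'
backward = ->(text) { text.gsub(/(?<=a)X(?=c)/, "b") }

encoded = forward.call("abc bbc")
puts encoded                 # aXc bbc
puts backward.call(encoded)  # abc bbc
```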

Rules using boundary-like aliases are now translated correctly:

[8] pry(main)> stage { sub line_start+"from", "to"; sub "from"+line_end, "to" }.reverse.stages[:main]
=> stage {
  sub "to"+line_end, "from"
  sub line_start+"to", "from"
}
[9] pry(main)> 

Rules using captures are now translated correctly:

[9] pry(main)> stage { sub capture(any("abc"))+"-"+capture(any("def")), ref(1)+ref(2) }.reverse.stages[:main]
=> stage {
  sub capture(any("abc"))+capture(any("def")), ref(1)+"-"+ref(2)
}
[10] pry(main)> 
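The capture example above can be mimicked with plain Ruby regular expressions: the capture groups move to the new source side, the references move to the new target side, and the literal "-" switches sides with them. This sketch assumes any("abc") matches a single one of the listed characters.

```ruby
# Analogue of: sub capture(any("abc"))+"-"+capture(any("def")), ref(1)+ref(2)
forward  = ->(text) { text.gsub(/([abc])-([def])/, '\1\2') }
# Analogue of the reversed rule: captures on the source, "-" restored on the target
backward = ->(text) { text.gsub(/([abc])([def])/, '\1-\2') }

encoded = forward.call("a-d b-e")
puts encoded                 # ad be
puts backward.call(encoded)  # a-d b-e
```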

Current state: Finished in 26.56 seconds (files took 3.96 seconds to load) 7765 examples, 6283 failures

webdev778 commented 3 years ago

Added a special case:

sub "", "X"

on "abcd" gives us "aXbXcXd", because an empty source matches between characters. Therefore,

sub "a", ""

is from now on reversed into:

sub "", "a", reverse_run: true

With reverse_run: true, the reversed rule is excluded from the run.
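Plain Ruby's String#gsub illustrates why a naively reversed deletion is unusable. Its exact output differs from the Interscript result quoted above (it also matches at both ends of the string), so treat this only as an analogy for the same problem:

```ruby
# Deleting "a" loses the positions where it occurred.
deleted = "banana".gsub("a", "")
puts deleted  # bnn

# A naive reverse with an empty source inserts "a" at every position.
naive = deleted.gsub("", "a")
puts naive    # abanana

puts naive == "banana"  # false
```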

Current status:

Finished in 26.95 seconds (files took 3.92 seconds to load) 7765 examples, 6196 failures

ronaldtse commented 3 years ago

Thank you for this! The scheme looks reasonable.

In addition, I’m thinking that this will help with this task: interscript/arabic-diacritization#4.

Could you help finalize this and see if we can perform reverse conversion of the Arabic systems? (We will have to run multiple systems to find out which conversion system was originally used.)

ronaldtse commented 3 years ago

@webdev778 can you help finalise this soon? Thanks.