roedoejet / g2p

Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
https://g2p-studio.herokuapp.com

Error in indices when using subscripts #157

Closed joanise closed 2 years ago

joanise commented 2 years ago

Converting abcmn through the mapping below produces incorrect indices:

a,d
bc,e
g{1}h{2}i{3},G{2}H{1}I{3}J{1}
m{1}n{2},N{2}M{1}

To reproduce: git checkout dev.compose-bug (or git checkout 9d3268f03e) and run

cd g2p/tests
g2p convert --config public/mappings/compose.yaml abcmn c2 c3 -d -e

current output:

[   'deNM',
    [['a', 'd'], ['b', 'e'], ['c', 'e'], ['c', 'M'], ['m', 'N']],
    [(0, 0), (1, 1), (2, 1), (2, 3), (3, 2)],
...

expected output:

[   'deNM',
    [['a', 'd'], ['b', 'e'], ['c', 'e'], ['m', 'M'], ['n', 'N']],
    [(0, 0), (1, 1), (2, 1), (3, 3), (4, 2)],
...
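
The mismatch between the two outputs can be checked by hand with a few lines of plain Python. This is only a sketch and does not call g2p: the helper names `pairs_from_edges` and `uncovered_input_indices` are made up for illustration, and it assumes each index pair is (input position, output position) into `abcmn` and `deNM`.

```python
def pairs_from_edges(source, target, edges):
    """Rebuild [input_char, output_char] pairs from (input_index, output_index) edges."""
    return [[source[i], target[j]] for i, j in edges]


def uncovered_input_indices(source, edges):
    """Return the input positions that no edge refers to."""
    covered = {i for i, _ in edges}
    return [i for i in range(len(source)) if i not in covered]


source, target = "abcmn", "deNM"

# Index pairs copied from the "current output" and "expected output" above.
current_edges = [(0, 0), (1, 1), (2, 1), (2, 3), (3, 2)]
expected_edges = [(0, 0), (1, 1), (2, 1), (3, 3), (4, 2)]

print(pairs_from_edges(source, target, current_edges))
print(pairs_from_edges(source, target, expected_edges))
print(uncovered_input_indices(source, current_edges))   # [4]
print(uncovered_input_indices(source, expected_edges))  # []
```

With the current indices, input position 4 (the `n`) is never referenced and position 2 (the `c`) is used twice, which is consistent with the last two edges being shifted left by one on the input side.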

Note that converting just mn outputs correct indices, and so does converting xnm. The bug appears when nm is preceded by bc.

joanise commented 2 years ago

Darn, #166 did not fix this issue...

roedoejet commented 2 years ago

does it still produce the same incorrect indices?

joanise commented 2 years ago

Yes.