roedoejet / g2p

Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p!
https://g2p-studio.herokuapp.com

`g2p generate-mapping` generates incorrect configuration #354

Closed dhdaines closed 3 months ago

dhdaines commented 3 months ago

When running g2p generate-mapping for the first time, the rules_path filename written to g2p/mappings/langs/generated/config-g2p.yaml is incorrect. For instance, when I create g2p/mappings/langs/myh/config-g2p.yaml:

mappings:
  - display_name: Makah to IPA
    rules_path: myh_to_ipa.csv
    rule_ordering: apply-longest-first
    language_name: Makah
    in_lang: myh
    out_lang: myh-ipa
    case_sensitive: false
    authors:
      - David Huggins-Daines

and myh_to_ipa.csv (omitted, but you can see it in #355), and then run g2p generate-mapping --ipa myh (after running g2p update, etc., sigh), I get this entry (trimmed slightly) in g2p/mappings/langs/generated/config-g2p.yaml:

  - display_name: myh-ipa IPA to eng-ipa IPA
    in_lang: myh-ipa
    language_name: myh-ipa
    out_lang: eng-ipa
    rules_path: myh_to_ipa.csv

But the actual file that it generated is called myh-ipa_to_eng-ipa.json. It's not even a CSV ;-)
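A mismatch like this can be caught mechanically. The following is only a minimal sketch, not part of g2p: `find_missing_rule_files` is a hypothetical helper, it assumes the config has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), and it assumes each `rules_path` is relative to the directory containing the config file.

```python
from pathlib import Path


def find_missing_rule_files(config: dict, config_dir: Path) -> list:
    """Return (display_name, rules_path) pairs whose rules file is absent.

    Hypothetical helper, not a g2p API. Assumes rules_path is relative
    to the directory that holds config-g2p.yaml.
    """
    missing = []
    for mapping in config.get("mappings", []):
        rules_path = mapping.get("rules_path")
        # Flag entries with no rules_path or with one that is not on disk.
        if rules_path is None or not (config_dir / rules_path).is_file():
            missing.append((mapping.get("display_name"), rules_path))
    return missing
```

Run against the generated directory described above, this would report the myh-ipa to eng-ipa entry, since myh_to_ipa.csv is not there and the file on disk is myh-ipa_to_eng-ipa.json.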

joanise commented 3 months ago

This one is weird. I have used generate-mapping many times and gotten correct results, so I wonder why it's different now. It must be a recently introduced bug!

joanise commented 3 months ago

Interesting, g2p generate-mapping --from myh --to eng works correctly, but g2p generate-mapping --ipa myh does not. I'd really like to strip --ipa out of that CLI, to be honest, but unless (or until) we do that it should work correctly.

dhdaines commented 3 months ago

> I'd really like to strip --ipa out of that CLI, to be honest, but unless (or until) we do that it should work correctly.

Wow, yes, the CLI for generate-mapping is excessively confusing!

It is also pretty frustrating that you basically always have to run g2p update before and after running it, and yet it doesn't make any effort to warn you about this. Is there a good reason why it shouldn't just do this for you? (with an option to not run update, say, an option called --no-update)
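To make the proposal concrete, here is a rough sketch of the behaviour as an external wrapper, not a change to g2p itself. The function name `generate_mapping_with_update` and the `no_update` parameter are hypothetical (the latter mirrors the suggested --no-update option, which does not exist today), and the `run` argument is injectable only so the command sequence can be inspected without invoking the real CLI.

```python
import subprocess


def generate_mapping_with_update(args, no_update=False, run=subprocess.run):
    """Run `g2p update` before and after `g2p generate-mapping <args>`.

    Hypothetical wrapper: no_update stands in for the proposed
    --no-update option and skips both update calls.
    """
    commands = []
    if not no_update:
        commands.append(["g2p", "update"])
    commands.append(["g2p", "generate-mapping", *args])
    if not no_update:
        commands.append(["g2p", "update"])
    for command in commands:
        # check=True stops the sequence if any step exits with an error.
        run(command, check=True)
    return commands
```

For example, `generate_mapping_with_update(["--ipa", "myh"])` would run update, then generate-mapping --ipa myh, then update again.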