lojban / jbovlaste

http://jbovlaste.lojban.org
Encoding problem with accent marks #195

Open solpahi opened 9 years ago

solpahi commented 9 years ago

http://jbovlaste.lojban.org/dict/berl%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%9A%C3%82%C2%ACn

This was a test to see if accent marks are supported (for stress purposes in cmevla). It was recognized as a cmevla, but the encoding is all messed up.
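
For reference, the percent-encoded path above is consistent with UTF-8 text being misread as Windows-1252 and re-encoded several times. The following is a minimal sketch of that diagnosis, not a description of jbovlaste's code: the choice of cp1252 as the wrong charset and the helper name `ungarble` are assumptions made for illustration.

```python
from urllib.parse import unquote


def ungarble(text):
    """Undo repeated 'UTF-8 bytes read as cp1252' rounds.

    Returns the recovered text and the number of rounds undone.
    Assumption: cp1252 is the charset that was wrongly applied.
    """
    rounds = 0
    while True:
        try:
            # Reverse one round: take the cp1252 bytes back and read them as UTF-8.
            fixed = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # no longer looks like double-encoded text
        if fixed == text:
            break  # pure ASCII, nothing left to undo
        text, rounds = fixed, rounds + 1
    return text, rounds


path_word = "berl%C3%83%C6%92%C3%86%E2%80%99%C3%83%E2%80%9A%C3%82%C2%ACn"
print(ungarble(unquote(path_word)))  # ('berlìn', 3)
```

Under that assumption the word in the URL unwinds to `berlìn` after three rounds, which would fit a word that is re-encoded once per step rather than a single storage error.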

lynn commented 1 year ago

This is still an issue. If you go to https://jbovlaste.lojban.org/dict/berlín and try to add the word, it gets increasingly krakozabra (garbled, i.e. mojibake) as you go through the steps (berlín, then berlín, etc.).
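
The step-by-step growth described here is what repeated double-encoding looks like. A small sketch, assuming each step reads the UTF-8 bytes as Windows-1252 and then writes the result out as UTF-8 again; that is an assumption, since the thread does not identify the actual cause, and `mojibake_steps` is a hypothetical helper, not part of jbovlaste:

```python
def mojibake_steps(word, steps):
    """Return the word followed by `steps` successively garbled versions.

    Assumption: each round decodes UTF-8 bytes with cp1252 by mistake.
    """
    results = [word]
    for _ in range(steps):
        # One faulty round trip: correct bytes, wrong charset on the way back in.
        word = word.encode("utf-8").decode("cp1252")
        results.append(word)
    return results


for version in mojibake_steps("berlín", 2):
    # ascii() makes invisible characters such as the soft hyphen visible.
    print(len(version), ascii(version))
# lengths printed: 6, 7, 9
```

Each round turns every non-ASCII character into two, so the accented letter roughly doubles in length per step while the ASCII letters stay put.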

rlpowell commented 1 year ago

I poked in a few places, but I don't actually have an accented valsi I want to add, so I haven't really tested it end-to-end.

Having said that, "berlín" is now recognized as nalvla, which isn't surprising to me as I don't think any of the parsers attempted to handle this.

As far as I can tell, the actual test being run here is vlatai.py from https://github.com/teleological/camxes-py.

I have not committed my current changes; let me know what you think of them.

rlpowell commented 1 year ago

Went ahead and checked in what I did, as it certainly doesn't make anything worse.