Closed by opoudjis 3 years ago
Agreed. I think this can be easily fixed by reading the file as UTF-8 and applying “normalize_unicode” to ensure the characters are properly normalized.
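A minimal sketch of that suggestion, assuming Ruby's standard `String#unicode_normalize` is an acceptable stand-in for the “normalize_unicode” step mentioned above. The helper name `read_utf8_normalized` and the sample remark text are hypothetical and are not taken from expressir:

```ruby
require 'tempfile'

# Hypothetical helper: read a file as UTF-8, then normalize to NFC.
def read_utf8_normalized(path)
  File.read(path, encoding: 'UTF-8').unicode_normalize(:nfc)
end

# Write a sample remark as raw bytes: a smart apostrophe (E2 80 99)
# and an "e" followed by a combining acute accent (CC 81).
sample = Tempfile.create(['sample', '.exp'])
sample.binmode
sample.write("(* supplier\xE2\x80\x99s cafe\xCC\x81 *)")
sample.close

text = read_utf8_normalized(sample.path)
File.unlink(sample.path)

puts text.encoding         # the string is tagged as UTF-8
puts text.valid_encoding?  # the bytes are well-formed UTF-8
```

NFC leaves the smart apostrophe untouched and only composes the "e" plus combining accent into a single code point, so normalization alone would not repair a string that carries the wrong encoding tag.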
I added .force_encoding('UTF-8') to string output from the parser (string literals and remarks) in #46 and updated the tests. Does it help?
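For readers unfamiliar with `force_encoding`, here is a small illustration of what it does. It is not the actual change in #46; the variable names and the sample string are made up, and it assumes the parser hands back UTF-8 bytes that are tagged as binary:

```ruby
# UTF-8 bytes for "it’s", tagged as binary (ASCII-8BIT),
# as a parser working on raw bytes might return them.
raw = "it\xE2\x80\x99s".b

# force_encoding changes only the encoding tag; the bytes stay the same.
fixed = raw.dup.force_encoding('UTF-8')

puts raw.encoding    # binary tag before the fix
puts fixed.encoding  # UTF-8 tag after the fix
puts fixed.length    # counts characters now, not bytes
```

Because only the tag changes, this is safe when the bytes really are UTF-8. If they were in some other encoding, `valid_encoding?` on the result would be the thing to check.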
@opoudjis can you help confirm the fix? Thanks!
I think @opoudjis is very far from this library, so we probably need to merge and release before he can verify it. Merging.
I've confirmed it. I am far from the library, but lutaml passes the text through, and it's no longer crashing when I restore the smart apostrophe in the Express source.
Interesting, it's not released yet :)
@zakjan I believe @opoudjis is using master 😉
Ok :)
I am assuming this is an issue with expressir, but from my distant vantage point in Metanorma, it is hard for me to tell.
The document at https://github.com/metanorma/annotated-express/blob/master/data/resources/action_schema/action_schema.exp is processed by expressir, and its parse is then passed on by lutaml to Metanorma.
Metanorma assumes all files it is processing are in UTF-8.
Lutaml, I am assured by @w00lf, processes all its files in UTF-8.
The action_schema.exp file contains the following remark line:
By the time this gets to Metanorma, it is:
That is, this is a raw UTF-8 encoding of the smart apostrophe, but the file is being processed as 8-bit ASCII, not UTF-8, so Metanorma cannot read it:
expressir, as with the rest of our stack, must ensure that all files are processed as UTF-8, and that all output is in UTF-8.
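The failure mode described here can be reproduced in isolation. The sketch below is only an illustration under assumptions: the strings are invented, and it does not claim to be the exact error Metanorma raised. It shows that Ruby refuses to combine non-ASCII UTF-8 text with non-ASCII bytes tagged as 8-bit ASCII:

```ruby
# A smart apostrophe as raw bytes, tagged as binary (ASCII-8BIT).
apostrophe_bytes = "\xE2\x80\x99".b

# Ordinary UTF-8 text that also contains a non-ASCII character.
utf8_text = "caf\u00E9 owner"

error = nil
begin
  utf8_text + apostrophe_bytes
rescue Encoding::CompatibilityError => e
  error = e
end

puts error.class  # Encoding::CompatibilityError
```

A downstream consumer that assumes UTF-8 can hit this as soon as it concatenates or interpolates such a string, which is why the encoding needs to be settled at the point where the file is read or the parse is emitted.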