hfst / hfst-ospell

HFST spell checker library and command line tool
Apache License 2.0

Endian-neutrality #26

Closed: TinoDidriksen closed this 7 years ago

TinoDidriksen commented 7 years ago

Commit 956237770545003e6c64b8c95b6393f8ec725783 fixes loading and using little-endian HFST-OL files on big-endian platforms, but only when they are not inside a zhfst archive.
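For readers unfamiliar with the general technique: a common way to make a loader endian-neutral is to assemble multi-byte values from individual bytes instead of copying raw memory into an integer. The sketch below illustrates that idea only. The helper names `read_le_u16` and `read_le_u32` are hypothetical and are not taken from the commit, which may solve the problem differently.

```cpp
#include <cstdint>

// Hypothetical helpers (not from hfst-ospell): decode unsigned integers that
// are stored least-significant byte first. Because the value is built from
// single bytes with shifts, the result is the same on little-endian and
// big-endian hosts.
inline std::uint16_t read_le_u16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_le_u32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

A raw `memcpy` into a `uint32_t` would give the right value only when host byte order matches the file, which is the kind of mismatch this issue is about.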

As in, on PPC hfst-ospell -S -m errmodel.default.hfst -l acceptor.default.hfst works, but hfst-ospell -S kal.zhfst does not.

I have inspected the first few bytes of the decompressed data, and it doesn't look anything like an HFST header. I got the hfst files by running PPC's own unzip on the zhfst, so why libarchive can't decompress it correctly in-memory is a task for another day. Hence this ticket.
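The kind of inspection described above could be sketched roughly as follows. This assumes that an HFST3-style header begins with the bytes "HFST" followed by a NUL byte; treat that as an assumption to verify against the HFST sources. Both function names are hypothetical and do not exist in hfst-ospell.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: the header magic is 'H','F','S','T' followed by a NUL byte.
inline bool looks_like_hfst_header(const char* data, std::size_t len) {
    static const char magic[5] = {'H', 'F', 'S', 'T', '\0'};
    return len >= sizeof(magic) && std::memcmp(data, magic, sizeof(magic)) == 0;
}

// Render up to max_bytes leading bytes as space-separated lowercase hex,
// which is handy for eyeballing what a decompressor actually produced.
inline std::string hex_prefix(const char* data, std::size_t len,
                              std::size_t max_bytes = 16) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    const std::size_t n = len < max_bytes ? len : max_bytes;
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char b = static_cast<unsigned char>(data[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0f];
    }
    return out;
}
```

Running such a check on the in-memory buffer and on the files produced by unzip would show whether the two decompression paths yield different bytes.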

(References https://github.com/hfst/hfst/issues/328)

Traubert commented 7 years ago

Does this mean that tmp-extraction does work? Should we perhaps try to always use that on big-endian platforms just to see if we can pass the Debian tests that way?
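A minimal sketch of the suggestion above, selecting the extraction path by host byte order. Everything here is hypothetical: `ZhfstLoadStrategy`, `choose_strategy` and `is_big_endian` are illustrative names, not part of the hfst-ospell API, and the sketch does not touch libarchive at all.

```cpp
#include <cstdint>
#include <cstring>

// Detect host byte order at run time by looking at the first byte in memory
// of a known 16-bit value.
inline bool is_big_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 0x01;
}

// Hypothetical strategy switch: fall back to temporary-file extraction on
// big-endian hosts, keep in-memory extraction elsewhere.
enum class ZhfstLoadStrategy { InMemory, TempFiles };

inline ZhfstLoadStrategy choose_strategy(bool big_endian) {
    return big_endian ? ZhfstLoadStrategy::TempFiles
                      : ZhfstLoadStrategy::InMemory;
}
```

A call such as `choose_strategy(is_big_endian())` would then pick the path for the current host. This only works around the problem for testing purposes; it does not explain why the in-memory data differs.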