hfst / hfst-ospell

HFST spell checker library and command line tool
Apache License 2.0

hfst-ospell -v does not list correct metadata #32

Closed albbas closed 5 years ago

albbas commented 7 years ago

OS: KDE neon User Edition 5.10, based on Ubuntu 16.04

Installed packages:

hfst-ospell  0.4.5~r344-0ubuntu1~xenial1
giella-sme  0.0.20150917~r156539-1~sid1

For some reason the executable installed in the hfst-ospell package does not display the correct info about the loaded speller package.

Executable from the package:

/usr/bin/hfst-ospell -v /usr/share/voikko/3/se.zhfst
Following metadata was read from ZHFST archive:
locale: und
version:  [vcsrev: ]
date:
producer: [email: <>, website: <>]

From a self-built hfst-ospell:

Output from the shell script in the hfst-ospell directory:

hfst-ospell $ ./hfst-ospell -v /usr/share/voikko/3/se.zhfst
Following metadata was read from ZHFST archive:
locale: se
version: GT_VERSION [vcsrev: GT_REVISION]
date: DATE
producer: Giellatekno/Divvun/UiT contributors[email: <feedback@divvun.no>, website: <http://divvun.no>]
title [fi]: Pohjoissaamen oikoluku
title [nb]: Nordsamisk stavekontroll
title [se]: Davvisámi čállindárkisteaddji
title [sma]: Noerhtesaemien staeriedimmiedïrregh
title [smj]: Nuorttasáme duollatjállemdárkastus
title [sv]: Nordsamisk rättstavning
description [se]: This is an fst-based speller for Northern Sámi made by
    Divvun/Giellatekno/UiT. It is based
    on the normative subset of the morphological analyser for Northern Sámi.
    The source code can be found at:
    https://victorio.uit.no/langtech/trunk/langs/sme/
    License: GPL3+.
acceptor[default.] [id: acceptor.default.hfst, type: generaltrtype: ]
title [se]: Giellatekno/Divvun/UiT dictionary Northern Sámi
description[se]: Giellatekno/Divvun/UiT dictionary for
    Northern Sámi compiled for HFST.
errmodel[default.] [id: errmodel.default.hfst]
title [se]: Levenshtein edit distance transducer
description[se]: Correction model for keyboard misstrokes, at most 2 per
    word.
type: default
model: errormodel.default.hfst

Output from the executable found in .libs:

hfst-ospell $ .libs/hfst-ospell -v /usr/share/voikko/3/se.zhfst
Following metadata was read from ZHFST archive:
locale: und
version:  [vcsrev: ]
date:
producer: [email: <>, website: <>]

albbas commented 7 years ago

After I reported this behaviour, I installed libtinyxml2 to check whether hfst-ospell would work better with that library than with the default libxml++:

sudo apt install libtinyxml-dev
sudo apt install libtinyxml2-dev
./configure --with-tinyxml2 --without-libxmlpp
make -j

and now .libs/hfst-ospell shows the correct metadata.

But now hfst-ospell from the package also shows the correct metadata:

hfst-ospell $ /usr/bin/hfst-ospell -v /usr/share/voikko/3/se.zhfst
Following metadata was read from ZHFST archive:
locale: se
version: GT_VERSION [vcsrev: GT_REVISION]
date: DATE
producer: Giellatekno/Divvun/UiT contributors[email: feedback@divvun.no, website: http://divvun.no]
title [fi]: Pohjoissaamen oikoluku
title [nb]: Nordsamisk stavekontroll
title [se]: Davvisámi čállindárkisteaddji
title [sma]: Noerhtesaemien staeriedimmiedïrregh
title [smj]: Nuorttasáme duollatjállemdárkastus
title [sv]: Nordsamisk rättstavning
description [se]: This is an fst-based speller for Northern Sámi made by
    Divvun/Giellatekno/UiT. It is based
    on the normative subset of the morphological analyser for Northern Sámi.
    The source code can be found at:
    https://victorio.uit.no/langtech/trunk/langs/sme/
    License: GPL3+.
acceptor[default.] [id: acceptor.default.hfst, type: generaltrtype: ]
title [se]: Giellatekno/Divvun/UiT dictionary Northern Sámi
description[se]: Giellatekno/Divvun/UiT dictionary for
    Northern Sámi compiled for HFST.
errmodel[default.] [id: errmodel.default.hfst]
title [se]: Levenshtein edit distance transducer
description[se]: Correction model for keyboard misstrokes, at most 2 per
    word.
type: default
model: errormodel.default.hfst

I tried uninstalling and purging hfst-ospell and then installing it again, and it still shows the correct metadata.

albbas commented 7 years ago

And voikkospell also reports the correct info now:

voikkospell -l
Unknown file in archive ./._acceptor.default.hfst
Unknown file in archive ./._errmodel.default.hfst
Unknown file in archive ./._index.xml
se-x-standard: Giellatekno/Divvun/UiT fst-based speller for Northern Sami

Voikkospell reported und before I began looking into the above issue …

TinoDidriksen commented 7 years ago

The Debian/Ubuntu packages are built entirely without XML support, and thus cannot display any such information. I disabled XML because of https://github.com/hfst/hfst-ospell/issues/21 and https://github.com/hfst/hfst-ospell/issues/22 - there is no supported XML library that'll work on all supported platforms.

And the XML didn't seem meaningful, or it would be a runtime requirement - the zhfst files work perfectly fine without the XML.

If the XML is meaningful, I guess I'll have to write a tiny fallback parser to extract at least the ISO 639 codes.
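
For illustration only, here is a rough sketch of what such a fallback could look like. It is not the hfst-ospell implementation (which would be C++); it is a Python mock-up of the idea. It assumes the metadata lives in an archive member named index.xml and that the language code sits in a <locale>…</locale> element; both are assumptions about the ZHFST layout, and the name read_locale is made up for this example. It returns "und" when nothing is found, matching what the package build printed above.

```python
import io
import re
import zipfile

# Assumption: the language code is stored as <locale>xx</locale> in index.xml.
LOCALE_RE = re.compile(
    r"<locale>\s*([A-Za-z]{2,3}(?:[-_][A-Za-z0-9]+)*)\s*</locale>"
)


def read_locale(zhfst, fallback="und"):
    """Return the locale from a ZHFST archive without a real XML parser.

    `zhfst` may be a path or a file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(zhfst) as archive:
        for name in archive.namelist():
            # Compare the base name only: this tolerates a leading "./" and
            # skips macOS resource-fork entries such as "./._index.xml".
            if name.split("/")[-1] == "index.xml":
                text = archive.read(name).decode("utf-8", errors="replace")
                match = LOCALE_RE.search(text)
                if match:
                    return match.group(1)
    return fallback


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without a real speller.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "index.xml",
            "<hfstspeller><info><locale>se</locale></info></hfstspeller>",
        )
    buf.seek(0)
    print(read_locale(buf))
```

A regex is obviously not a general XML parser; it would miss a locale written with attributes or inside a comment. The point is only that the one field libvoikko needs could be recovered without linking an XML library.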

albbas commented 7 years ago

Of course I had a locally built voikkospell in /usr/local/bin. Using voikkospell provided by the package, I get this report now:

/usr/bin/voikkospell -l
fi-x-standard: suomi (perussanasto)
se-x-standard: Davvisámi čállindárkisteaddji
sma-x-standard: Åarjelsaemien staeriedimmiedïrregh
smj-x-standard: Julevsáme duollatjállemdárkastus

I don't know if the voikkospell executable provided by the package behaved like the locally built one did.

snomos commented 7 years ago

XML support is crucial when integrating with libvoikko. It is the means by which we communicate to libvoikko the languages available for spell checking. As demonstrated in this bug report, hfst-ospell is just broken without it.

I don't know about #21, but #22 should be easy to fix.

TinoDidriksen commented 6 years ago

I think this was fixed ages ago in commit https://github.com/hfst/hfst-ospell/commit/dad7c5c80f318dfc6cb0833b29cc9fa66fc19f97 - is this still a problem?

TinoDidriksen commented 5 years ago

Nobody said anything for a year - reopen if this wasn't actually fixed.