Open heatherleaf opened 3 years ago
I used the old v1.0 binary until it stopped working on macOS, so I rebuilt it from source. With the new build I noticed that in "looked up the date" the word "date" is now tagged as a verb. I then compared the new output against the cached output I had kept from the previous version:
diff --git a/2.txt b/1.txt
index 573cf80..5c3aa7e 100644
--- a/2.txt
+++ b/1.txt
@@ -1,19 +1,20 @@
-( NNS
-remember VBP
+( VBZ
+remember VB
= SYM
-I PRP
+I NNP
looked VBD
-up RP
-the DT
-date NN
-in IN
+up IN
+the VBP
+date VB
+in RP
the DT
logs NNS
-and CC
-checked VBD
+and NNP
+checked VBN
which WDT
-comic NN
+comic JJ
I PRP
-referred VBD
-to TO
+referred VBN
+to JJ
) VB
+
"the date" -- "VBP VB" -- is that correct? Maybe I'm supposed to take some updated model file from somewhere?
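For longer inputs, a plain diff gets noisy. Here is a minimal sketch of a helper that lists only the tokens whose tag changed between two tagger outputs. The function name tag_diff and the file names old.txt and new.txt are hypothetical, and it assumes both files hold one whitespace-separated token/tag pair per line, in the same order:

```shell
# Hypothetical helper: print "token old_tag -> new_tag" for every line
# where both files contain the same token but a different tag.
tag_diff() {
    # paste joins line N of both files; awk's default field splitting
    # ignores repeated or trailing tabs, so a complete pair of lines
    # yields exactly four fields: token, tag, token, tag.
    paste "$1" "$2" |
        awk 'NF == 4 && $1 == $3 && $2 != $4 { print $1, $2, "->", $4 }'
}
```

Usage would be something like: tag_diff old.txt new.txt. Lines that do not line up (for example an extra blank line in one file) are skipped rather than reported.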
Hi all!
Do you know the exact source code corresponding to the old binary? Can you share the model files that you are using?
I have no idea about the source code of the old binary; I just downloaded it from https://code.google.com/archive/p/hunpos/downloads (md5: 4baee5cc5d9d3b0c3c691e375616d2a9).
md5 of en_wsj.model: f666dc61f7cbf3cc69366010a4e1f29f
Maybe the upload date gives a hint about the code version.
The new one was compiled without any issue following the instructions, after brew install ocaml.
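To confirm that someone else has the same files, the checksums above can be verified locally. This is a small sketch, assuming either md5sum (typical on Linux) or md5 (macOS) is available; the function names file_md5 and check_md5 are made up for illustration:

```shell
# Hypothetical helper: print the MD5 hex digest of a file using
# whichever of md5sum (Linux) or md5 (macOS) is installed.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{ print $1 }'
    else
        md5 -q "$1"
    fi
}

# Hypothetical helper: succeed only if the file's digest equals the
# expected one given as the second argument.
check_md5() {
    [ "$(file_md5 "$1")" = "$2" ]
}
```

For the model mentioned above: check_md5 en_wsj.model f666dc61f7cbf3cc69366010a4e1f29f && echo "same model"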
I am able to reproduce the issue with that model. Yesterday I had a quick look at it and remembered issue #21. Maybe we are having a similar issue. I will try to set up an old OCaml environment (e.g., 3.10.x, which was the recommended version to compile Hunpos back when the binaries on Google Code were compiled) as soon as I have some time, and check whether reverting to that environment improves the situation.
Indeed, compiling the current source code with an older OCaml version "solved" the issue (it works up to 3.12.1 and breaks starting from 4.00.0).
I guess that #21 was only partially addressed and further investigation is needed. I do not know when I will have time to investigate the issue.
In the meanwhile you can either retrain the model or compile with OCaml 3.12.1.
Obviously, anyone willing to investigate and solve the issue is welcome. :-)
Oh, cool. I just wonder how to install a specific version of OCaml on macOS. I just used brew install ocaml, and Homebrew does not really provide a way to install old versions of formulas. Is there an OCaml installation manager?
Yes, it is called opam.
From https://opam.ocaml.org/doc/Install.html I can see:
brew install gpatch
brew install opam
Once you have opam installed you can install specific versions following instructions at https://ocaml.org/docs/install.html.
On a clean setup it should be something like:
# environment setup (Required only the first time you use opam):
opam init
eval $(opam env)
# install given version of the compiler
opam switch create 3.12.1
# enable the current opam setup (i.e., the one you selected with the opam switch command; you have to run this in every shell where you want to use this opam environment)
eval $(opam env)
# check you got what you want
which ocaml
ocaml -version
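Since the tagging problem appears with OCaml 4.00.0 and later, it may help to guard the build with a version check. This is only a sketch: the function name ocaml_is_pre4 is hypothetical, and it assumes the version is passed in as a plain string such as 3.12.1:

```shell
# Hypothetical guard: succeed if the given OCaml version string has a
# major version below 4 (the thread reports 3.12.1 works, 4.00.0 breaks).
ocaml_is_pre4() {
    major=${1%%.*}                      # text before the first dot
    case $major in
        ''|*[!0-9]*) return 2 ;;        # not a number: cannot decide
    esac
    [ "$major" -lt 4 ]
}
```

Assuming ocamlc -version prints just the version number, it could be used as: ocaml_is_pre4 "$(ocamlc -version)" || echo "warning: OCaml 4.x or later, tags may be wrong"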
Hmmm, by default v4 is installed; the latest v3 available is 3.12.1, and building it fails:
# cc -I../byterun -DCAML_NAME_SPACE -DNATIVE_CODE -DTARGET_amd64 -DSYS_macosx -O -D_FILE_OFFSET_BITS=64 -D_REENTRANT -c -o startup.o startup.c
# startup.c:161:3: error: implicit declaration of function 'caml_debugger_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
# caml_debugger_init (); /* force debugger.o stub to be linked */
# ^
# 1 error generated.
# make[3]: *** [startup.o] Error 1
# make[2]: *** [makeruntimeopt] Error 2
# make[1]: *** [opt-core] Error 2
# make: *** [world.opt] Error 2
<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 🐫
┌─ The following actions failed
│ λ build ocaml-base-compiler 3.12.1
You are right, I tested 3.12.1 and not 3.12.2. I changed my comments above to reflect this.
As for the error you are experiencing, maybe you can open an issue with either opam or OCaml. I guess it should be possible to compile by setting some C compiler flags so that the build does not fail on this error. Probably removing the -Werror CC flag is enough. Unfortunately I do not know how to do that with opam. 😅
This version of hunpos behaves differently than the original compiled binary:
Original version (downloaded from https://code.google.com/archive/p/hunpos/downloads):
The original version gives the correct output: "och" is the most common Swedish conjunction (KN) and not a foreign word (UO). The language model is available from here: https://github.com/spraakbanken/sparv-models/raw/master/hunpos/suc3_suc-tags_default-setting_utf8.model (beware, the model is 14MB)
I compiled it on both macOS Catalina and Devuan Linux, and it behaves the same on both platforms (i.e., it gives the wrong POS tag for "och").
Note: there are problems with at least the following common Swedish conjunctions:
UO instead of KN
HA instead of KN
PL instead of SN (subjunction)