mivoq / hunpos

Automatically exported from code.google.com/p/hunpos

Tagging errors (not in original Hunpos) #29

Open heatherleaf opened 3 years ago

heatherleaf commented 3 years ago

This version of hunpos behaves differently than the original compiled binary:

$ hunpos-tag suc3_suc-tags_default-setting_utf8.model < example.txt
jag PN.UTR.SIN.DEF.SUB  
och UO  
du  PN.UTR.SIN.DEF.SUB

Original version (downloaded from https://code.google.com/archive/p/hunpos/downloads):

$ hunpos-tag suc3_suc-tags_default-setting_utf8.model < example.txt
jag PN.UTR.SIN.DEF.SUB  
och KN  
du  PN.UTR.SIN.DEF.SUB  

The original version gives the correct output: "och" is the most common Swedish conjunction (KN) and not a foreign word (UO). The language model is available from here: https://github.com/spraakbanken/sparv-models/raw/master/hunpos/suc3_suc-tags_default-setting_utf8.model (beware, the model is 14MB)

I compiled both on Mac OS Catalina, and on Devuan Linux, and it behaves the same on both platforms (i.e., gives the wrong postag for "och").

Note: there are problems with at least the following common Swedish conjunctions:

Nakilon commented 2 years ago

I used the old v1.0 binary until it stopped working on macOS, so I rebuilt it from source and noticed that in "looked up the date" the word "date" is now tagged as a verb. I then compared the new output against a cache I had kept from the previous version:

diff --git a/2.txt b/1.txt
index 573cf80..5c3aa7e 100644
--- a/2.txt
+++ b/1.txt
@@ -1,19 +1,20 @@
-(  NNS 
-remember   VBP 
+(  VBZ 
+remember   VB  
 =  SYM 
-I  PRP 
+I  NNP 
 looked VBD 
-up RP  
-the    DT  
-date   NN  
-in IN  
+up IN  
+the    VBP 
+date   VB  
+in RP  
 the    DT  
 logs   NNS 
-and    CC  
-checked    VBD 
+and    NNP 
+checked    VBN 
 which  WDT 
-comic  NN  
+comic  JJ  
 I  PRP 
-referred   VBD 
-to TO  
+referred   VBN 
+to JJ  
 )  VB  
+

"the date" -- "VBP VB" -- is that correct? Maybe I'm supposed to take some updated model file from somewhere?
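
For a token-level view of differences like the ones in the diff above, here is a minimal sketch. It assumes both files were produced from the same input and hold one token and its tag per line, separated by whitespace (the layout shown in this thread). The function name `tag_mismatches` and the file names are hypothetical and not part of hunpos.

```shell
# Print every token whose tag differs between two tagger outputs.
# $1: reference output (e.g. from the old binary)
# $2: output under test (e.g. from the rebuilt binary)
# Assumption: both files tag the same token sequence, line by line.
tag_mismatches() {
    awk '
        # First file: remember the reference tag for each line number.
        NR == FNR { ref[FNR] = $2; next }
        # Second file: skip blank lines, report lines where the tag changed.
        NF >= 2 && ref[FNR] != $2 {
            printf "%s\t%s\t%s\n", $1, ref[FNR], $2
        }
    ' "$1" "$2"
}
```

Calling `tag_mismatches old.txt new.txt` prints three tab-separated columns per mismatch: the token, the reference tag, and the new tag.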

giuliopaci commented 2 years ago

Hi all!

Do you know the exact source code corresponding to the old binary? Can you share the model files that you are using?

Nakilon commented 2 years ago

I have no idea which source code the old binary corresponds to; I just downloaded it from https://code.google.com/archive/p/hunpos/downloads (md5: 4baee5cc5d9d3b0c3c691e375616d2a9).

md5 en_wsj.model : f666dc61f7cbf3cc69366010a4e1f29f

Maybe the upload date gives a hint about the code version.

The new one was compiled without any issue following the instructions, after brew install ocaml.

giuliopaci commented 2 years ago

I am able to reproduce the issue with that model. Yesterday I had a quick look at it and was reminded of issue #21. Maybe we are seeing a similar problem. As soon as I have some time, I will try to set up an old OCaml environment (e.g., 3.10.x, which was the recommended version for compiling Hunpos back when the binaries on Google Code were built) and check whether reverting to that environment improves the situation.

giuliopaci commented 2 years ago

Indeed, compiling the current source code with an older OCaml version "solved" the issue (it works up to 3.12.1 and breaks starting from 4.00.0).

I guess that #21 was only partially addressed and further investigation is needed. I do not know when I will have time to investigate the issue.

In the meantime you can either retrain the model or compile with OCaml 3.12.1.

Obviously, anyone willing to investigate and solve the issue is welcome. :-)
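
Since builds are reported here to work up to OCaml 3.12.1 and to break starting from 4.00.0, a build script could refuse to run with an affected compiler. This is only a minimal sketch: the function name `ocaml_version_is_pre4` is hypothetical, and it assumes the version string starts with a numeric major component, which is what `ocamlc -version` is expected to print.

```shell
# Succeed (exit status 0) if the given OCaml version is older than 4.00.0,
# fail with status 1 if it is 4.x or newer, and with status 2 if the
# string does not start with a numeric major version.
ocaml_version_is_pre4() {
    # Strip everything from the first dot onwards: "3.12.1" -> "3".
    major=${1%%.*}
    case "$major" in
        '' | *[!0-9]*) return 2 ;;
    esac
    [ "$major" -lt 4 ]
}
```

A guard before building could then look like `ocaml_version_is_pre4 "$(ocamlc -version)" || echo "warning: this OCaml version may produce wrong tags" >&2`.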

Nakilon commented 2 years ago

Oh, cool. I just wonder how to install a specific version of OCaml on macOS. I used brew install ocaml, and Homebrew does not really provide a way to install old versions of formulas. Is there an OCaml installation manager?

giuliopaci commented 2 years ago

Yes, it is called opam.

From https://opam.ocaml.org/doc/Install.html I can see:

brew install gpatch
brew install opam

Once you have opam installed you can install specific versions following instructions at https://ocaml.org/docs/install.html.

On a clean setup it should be something like:

# environment setup (Required only the first time you use opam):
opam init
eval $(opam env)

# install given version of the compiler
opam switch create 3.12.1

# enable the latest opam setup (e.g., the one you configured with the opam switch command; you have to run this command every time you want a shell to use this opam environment)
eval $(opam env)

# check you got what you want
which ocaml
ocaml -version

Nakilon commented 2 years ago

Hmm, by default v4 is installed. The latest v3 is 3.12.1, and building it fails with:

# cc -I../byterun -DCAML_NAME_SPACE -DNATIVE_CODE -DTARGET_amd64 -DSYS_macosx  -O -D_FILE_OFFSET_BITS=64 -D_REENTRANT   -c -o startup.o startup.c
# startup.c:161:3: error: implicit declaration of function 'caml_debugger_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
#   caml_debugger_init (); /* force debugger.o stub to be linked */
#   ^
# 1 error generated.
# make[3]: *** [startup.o] Error 1
# make[2]: *** [makeruntimeopt] Error 2
# make[1]: *** [opt-core] Error 2
# make: *** [world.opt] Error 2

<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫 
┌─ The following actions failed
│ λ build ocaml-base-compiler 3.12.1

giuliopaci commented 2 years ago

You are right, I tested 3.12.1 and not 3.12.2. I changed my comments above to reflect this.

As for the error you are experiencing, maybe you can open an issue with either opam or OCaml. I guess it should be possible to compile by setting some C compiler flags so that the build does not fail on this error. Removing the -Werror CC flag is probably enough. Unfortunately, I do not know how to do that with opam. 😅