ivanhk / fastText_java

Java port of the C++ version of Facebook's fastText

the result of fastText_java is different from Facebook's #21

Open zhanlijun opened 7 years ago

zhanlijun commented 7 years ago

I found that the results of fastText_java differ from those of facebookresearch's fastText. I am using a Chinese data set.

ivanhk commented 7 years ago

Please provide the dataset and parameters needed to reproduce the result.

maogeng commented 7 years ago

I see the same problem: the result is not the same as the original's, with slightly lower precision.

ErikTromp commented 7 years ago

Just as a small remark: we have forked this repo and made fixes so that the Java port's output equals that of the C++ one. However, we also added support for sentence inference and hence had to support the new version of fastText. As such, we might have broken some things here and there (probably not, though), and we definitely do not support all of the current fastText features (in particular quantization).

Check it out at https://github.com/UnderstandLingBV/fastText_java

PS: If you want to keep using this repo instead, but still fix the difference between the FB version and this one, apply our strictfp patch everywhere.
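For readers unfamiliar with the keyword, here is a minimal sketch of what a strictfp-annotated class looks like. The fork's actual patch is not shown in this thread, so the class and method names below are hypothetical. On JVMs before Java 17, `strictfp` forces strict IEEE 754 evaluation of floating-point expressions; from Java 17 onward all floating-point arithmetic is strict and the modifier is redundant.

```java
// Sketch only: StrictDot and dot() are hypothetical names, not code from either repository.
public strictfp class StrictDot {

    /** Dot product of two equal-length vectors, evaluated under strict float semantics. */
    public static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        float sum = 0.0f;
        for (int i = 0; i < a.length; i++) {
            // Every intermediate product and sum is rounded to float precision.
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = {1.0f, 2.0f, 3.0f};
        float[] b = {4.0f, 5.0f, 6.0f};
        System.out.println(dot(a, b)); // prints 32.0
    }
}
```

Whether this alone removes every difference from the C++ output depends on the rest of the port (for example, the order of operations), which this sketch does not cover.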

ali3assi commented 6 years ago

Hello Sir @ErikTromp,

I tried to run the code from the project that you linked above:

1- git clone https://github.com/UnderstandLingBV/fastText_java.git (please correct the command on the project page, because it still mentions the old one)
2- module load Java/1.8.0_45
3- module load Maven/3.3.9
4- I changed the main function of the given Main class as follows:
   Main op = new Main();
   FastText fasttext = new FastText();
   fasttext.loadModel("/gs/project/tws-462-aa/model_a.bin");
   System.out.println("end of the class Main");
5- I ran this class from the command line: mvn compile exec:java -Dexec.mainClass=fasttext.Main

I got the following error and hope you can help me resolve it:

[WARNING] java.lang.IllegalArgumentException: Unknown loss_name enum value :1999999
    at fasttext.Args$loss_name.fromValue(Args.java:51)
    at fasttext.Args.load(Args.java:129)
    at fasttext.FastText.loadModel(FastText.java:165)
    at fasttext.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:745)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.040 s
[INFO] Finished at: 2017-11-11T23:21:15-05:00
[INFO] Final Memory: 26M/1930M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project fasttext: An exception occured while executing the Java class. Unknown loss_name enum value :1999999 -> [Help 1]
[ERROR]

Note that model_a.bin was generated using the Python library called fasttext.

Thank you in advance for your help.

ErikTromp commented 6 years ago

You have to make sure that the same FastText version is used to serialize and to deserialize the model. For example, I added support for deserializing the quantization flag, but I do not support quantization itself. FastText has advanced quite a bit, but I never updated my code. It's safe to say that the current FastText and my Java port based on ivan's code are no longer compatible. You can, however, use an older version of FastText with my Java port if you don't need all the new features.
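As a rough way to check which kind of file you have before loading it, the sketch below inspects the first bytes of a model file. It rests on assumptions that are not confirmed in this thread: that newer upstream fastText writes a 32-bit magic number (793712314, taken from the upstream C++ source as I recall it) followed by a 32-bit format version before the arguments, and that the file was written on a little-endian machine. The class and method names are hypothetical. If a reader that does not expect this header parses such a file, every field is shifted, which would be one plausible cause of an "Unknown loss_name enum value" error.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: ModelHeaderCheck is a hypothetical helper, not part of any fastText port.
public class ModelHeaderCheck {

    // Assumption: value of the magic number in the upstream C++ fastText source.
    static final int ASSUMED_MAGIC = 793712314;

    /** Describes the first 8 bytes of a model file, read as two little-endian ints. */
    public static String describeHeader(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int first = buf.getInt();
        if (first != ASSUMED_MAGIC) {
            return "no magic number: likely an older, unversioned model file";
        }
        int version = buf.getInt();
        return "versioned format, version " + version;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .bin model file.
        byte[] header = new byte[8];
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readFully(header); // throws EOFException if the file is shorter than 8 bytes
        }
        System.out.println(describeHeader(header));
    }
}
```

Run it with the model path as the only argument. A "versioned format" result would suggest the file comes from a newer fastText than a port without header support can read.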

Alternatively, use the JFastText library (https://github.com/vinhkhuc/JFastText), which just acts as a Java wrapper around the C++ version. (Or patch this Java repo's code to make it compatible again, but I have no interest in doing that.)