bethard closed this issue 9 years ago
If build_binary didn't exist, it looks like you needed to run "ant kenlm". build_binary is KenLM's tool for compiling (optionally compressed) ARPA-style LMs into Ken's compiled format.
This was using the pipeline script, so presumably, if it needs to run "ant kenlm" it should have figured that out itself?
But all I'm really asking for here is a better error message. Instead of "If you are on OS X, you need to use ... BerkeleyLM, triggered with ... '--lm-gen berkeleylm'", it should say something like "If you are on OS X, you need to use ... BerkeleyLM, triggered with ... '--lm-gen berkeleylm --lm berkeleylm'."
Or are you saying that I shouldn't need to supply "--lm berkeleylm" and there's something wrong with the pipeline script?
--lm-gen determines what code is used to build the ARPA LM file; it defaults to KenLM's "lmplz" tool, which doesn't build on OS X. BerkeleyLM also has a tool, but I don't recommend its use because of the heuristics it uses for smoothing. Instead, I suggest SRILM, if lmplz is not available.
--lm determines the toolkit used to represent LM state in the decoder. It also defaults to KenLM, which then tries to use the "build_binary" tool to compile the ARPA LM. That should compile on all platforms, so if not, you have a problem (like you didn't type "ant kenlm"). BerkeleyLM has its own tool for compiling LMs. KenLM is recommended because it supports left-state minimization, which results in slightly more efficient search. Apart from that, they are equivalent.
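To make the split between the two options concrete, here is a minimal Python sketch of the kind of hint the pipeline could print. It is not Joshua code: the function name `suggest_fixes` and its two boolean parameters are hypothetical, and only the flag values already mentioned in this thread are used.

```python
from typing import List


def suggest_fixes(have_lmplz: bool, have_build_binary: bool) -> List[str]:
    """Return one hint per missing KenLM tool (hypothetical helper)."""
    hints = []
    if not have_lmplz:
        # --lm-gen picks the tool that builds the ARPA file (default: lmplz).
        hints.append(
            'lmplz is unavailable: build the ARPA file another way, '
            'e.g. "--lm-gen berkeleylm" (SRILM is the suggested alternative)'
        )
    if not have_build_binary:
        # --lm picks the decoder-side LM toolkit (default: KenLM, which
        # needs build_binary to compile the ARPA file).
        hints.append(
            'build_binary is unavailable: run "ant kenlm", '
            'or pass "--lm berkeleylm"'
        )
    return hints
```

The two checks are deliberately independent, since a missing lmplz and a missing build_binary call for different fixes and either can occur without the other.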
If adding "--lm berkeleylm" is not the preferred solution on OS X and "ant kenlm" is the preferred solution, then somewhere, in one of the error messages I posted above, it should direct you to run "ant kenlm" on OS X. I'm really not particular as to the solution; I just want an error message that gives better guidance for solving the problem.
Agreed. We'd very happily accept a pull request with a fix.
So presumably if the fix is "ant kenlm", the place that the error message should be added is:
[compile-kenlm] rebuilding...
dep=lm.gz [CHANGED]
dep=lm.kenlm [NOT FOUND]
cmd=/Users/bethard/Downloads/joshua-v5.0/src/joshua/decoder/ff/lm/kenlm/build_binary lm.gz lm.kenlm
JOB FAILED (return code 127)
Is that right?
If you can point me to roughly where in the code this message is generated, I can probably provide a pull request that improves the error message.
Are you using the development or packaged version of Joshua? If devel, you had to type "ant devel" (which calls "ant kenlm") to compile the support libs. I don't think there's any need to put a warning about that in the pipeline.
I was using the packaged one. Are you saying this is already a non-issue in trunk? If so, feel free to close. If not, feel free to point me to roughly where a fix might belong.
(Sorry, should have been clearer. I'm happy to clone the current repository and create a fix if it's still necessary, and someone can point me in the right direction.)
The packaged one occasionally fails to build the KenLM libraries, despite that being a dependency of the "all" target. That would be the thing to fix: figure out why typing "ant" or "ant all" sometimes fails to build KenLM. That should just be a matter of a small change to build.xml.
You just need to figure out why KenLM didn't build, fix it, and issue a pull request from and against the "release" branch. Or were you asking something else?
Something else. Given that it's possible that KenLM isn't built sometimes, I think the error message in pipeline.pl should indicate that the problem might be a failed KenLM build. For example, if instead of:
[compile-kenlm] rebuilding...
dep=lm.gz [CHANGED]
dep=lm.kenlm [NOT FOUND]
cmd=/Users/bethard/Downloads/joshua-v5.0/src/joshua/decoder/ff/lm/kenlm/build_binary lm.gz lm.kenlm
JOB FAILED (return code 127)
It had said:
[compile-kenlm] rebuilding...
dep=lm.gz [CHANGED]
dep=lm.kenlm [NOT FOUND] (KenLM may not have been built. Try running "ant kenlm".)
cmd=/Users/bethard/Downloads/joshua-v5.0/src/joshua/decoder/ff/lm/kenlm/build_binary lm.gz lm.kenlm
JOB FAILED (return code 127)
Then the solution would have been more obvious.
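As a rough illustration of the idea, here is a Python sketch that checks for the tool before running it and attaches a hint when it is missing. This is not how pipeline.pl is written (it is a Perl script, and the thread does not show where this message is generated); `run_step` and its `hint` parameter are hypothetical names. Return code 127 is the shell's conventional "command not found" status, which is consistent with build_binary never having been built.

```python
import os
import subprocess

KENLM_HINT = 'KenLM may not have been built. Try running "ant kenlm".'


def run_step(cmd, hint=KENLM_HINT):
    """Run one step; return (return_code, hint_or_None).

    cmd is a list such as [path_to_build_binary, "lm.gz", "lm.kenlm"].
    """
    tool = cmd[0]
    if not (os.path.isfile(tool) and os.access(tool, os.X_OK)):
        # Mirror the shell's "command not found" code and explain it.
        return 127, hint
    return subprocess.call(cmd), None
```

A caller could then print the hint next to "JOB FAILED (return code 127)" instead of leaving the user to guess why the command could not be run.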
In general, one of the things we've struggled with in trying to use Joshua is the error messages not giving enough detail to help us figure out what we've done wrong. So this seemed like a place where we could improve the error message.
That's not to say that it wouldn't be useful to dig into any problems with KenLM not building, but my goal here is just to improve error messages.
The build system has been changed a bit, including fixes for compiling the KenLM utils on OS X. I'm going to mark this as fixed in Joshua 6 unless you find that the problem still exists, in which case I'll take a closer look, this time with an eye towards fixing it.
If you run "examples/pipeline/run.sh" on OS X, you'll get the following error and message:

I believe the message about "--lm-gen" is incomplete. At least, when I added just "--lm-gen berkeleylm" to the "run.sh" script, I just got the following error:

I had to add "--lm berkeleylm" in addition to "--lm-gen berkeleylm" to get that script to run to completion.