Closed kermitt2 closed 9 years ago
OK, first guess: mdl->type is not serialized in a portable way. In model.h we have:
int type; // model type
which is serialized in model.c (line 271) with:
fprintf(file, "#mdl#%d#%"PRIu64"\n", mdl->type, nact);
The %d is suspicious as a portable format specifier... If we used uint64_t for the type as well, the correct macros would be SCNu64 (for reading) and PRIu64 (for writing).
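To test this first hypothesis in isolation, here is a minimal sketch of a header round trip. Only the format string comes from the fprintf line quoted above. The function names write_header and read_header are hypothetical and are not part of Wapiti. It pairs %d with the int field and PRIu64 / SCNu64 with the uint64_t field.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a header line into buf using the same
 * format string as the quoted fprintf call.
 * Returns the number of characters written, or a negative value on error. */
int write_header(char *buf, size_t size, int type, uint64_t nact) {
    return snprintf(buf, size, "#mdl#%d#%" PRIu64 "\n", type, nact);
}

/* Hypothetical helper: parse the header back.
 * %d matches the int field; SCNu64 is the scanf counterpart of PRIu64.
 * Returns 1 if both fields were read, 0 otherwise. */
int read_header(const char *buf, int *type, uint64_t *nact) {
    return sscanf(buf, "#mdl#%d#%" SCNu64, type, nact) == 2;
}
```

If the values survive this round trip on the problematic machine, the specifiers are probably not the culprit and the other hypotheses become more likely.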
Other hypotheses to test maybe:
Your second hypothesis is the right one. On my machine, the bug was systematic with the locale fr_FR.UTF-8.
Everything went back to normal after simply running: export LC_ALL=C
Thanks for this suggestion!
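One plausible mechanism, which this thread does not confirm, is that number parsing in the C library depends on the locale's decimal separator. The sketch below is an assumption-laden way to check that on a given machine. parse_under_locale is a hypothetical helper, not Wapiti code, and it resets LC_NUMERIC to "C" afterwards rather than restoring the previous value, to keep the example short.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parse text as a double while LC_NUMERIC is set
 * to locale_name. Returns 1 and stores the value in *out if the locale
 * is installed, 0 if setlocale rejects the name. */
int parse_under_locale(const char *locale_name, const char *text, double *out) {
    if (setlocale(LC_NUMERIC, locale_name) == NULL) {
        return 0; /* locale not available on this machine */
    }
    *out = strtod(text, NULL);    /* uses the locale's decimal separator */
    setlocale(LC_NUMERIC, "C");   /* simplification: reset to "C" */
    return 1;
}
```

Calling it with "fr_FR.UTF-8" and "C" on the same input such as "0.5" and comparing the two results would show whether the separator matters on that system. The outcome depends on the C library and on which locales are installed.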
Great, thanks a lot Romain! Let's now try to find a way to force the locale in Wapiti, so that the library becomes independent of the environment's locale.
The locale is now set to the C locale in our Wapiti trunk, using the C locale.h library, before reading and saving a model. See http://en.wikipedia.org/wiki/C_localization_functions. This does not affect the locale of the environment, which is unchanged.
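The approach described here can be sketched as a wrapper that switches the process to the C locale around an I/O task. This is an illustration under assumptions, not the actual commit: run_in_c_locale, locale_task, parse_job and parse_job_run are hypothetical names. Unlike the description above, this sketch also puts the previous process locale back afterwards.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type standing in for "read a model" or "save a model". */
typedef int (*locale_task)(void *arg);

/* Run task with the process locale forced to "C", then restore the
 * previous locale. Environment variables such as LC_ALL are never touched. */
int run_in_c_locale(locale_task task, void *arg) {
    const char *current = setlocale(LC_ALL, NULL); /* query only */
    char *saved = NULL;
    if (current != NULL) {
        /* Copy the name: later setlocale calls may overwrite the string. */
        saved = malloc(strlen(current) + 1);
        if (saved != NULL) strcpy(saved, current);
    }
    setlocale(LC_ALL, "C");
    int result = task(arg);
    if (saved != NULL) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return result;
}

/* Example task: parse a decimal number, as a model reader might. */
struct parse_job { const char *text; double value; };

int parse_job_run(void *arg) {
    struct parse_job *job = arg;
    char *end;
    job->value = strtod(job->text, &end);
    return end != job->text; /* 1 if something was parsed */
}
```

Note that setlocale changes process-wide state, so a wrapper like this is not safe if other threads format or parse numbers at the same time.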
I tested Grobid after setting the environment locale to fr_FR.UTF-8 and it worked fine, so this should be solved with commit bbdea1c613f59fb97ff6615c4e16b75adfb3109b
It looks like nobody has complained about this problem since the fix, so let's close it ;)
The Wapiti binary models are not recognized on a few Linux machines.
The error is coming from model.c in Wapiti, when the header of the model is parsed via fscanf:
The header of the model looks like this on the problematic machine:
If the model is retrained on the problematic machine, it is working. However, the header format looks the same:
Users having this issue can use CRF++ as the JNI CRF engine instead of Wapiti (a little slower, takes more memory, and uses smaller models because of the GitHub limitation on binary file size, but the results are similar).
In the file grobid-home/config/grobid.properties, simply change:by