PanderMusubi opened this issue 6 years ago.
I am OK with replacing them with +.
Moreover, I think we should consider removing non-word parts from lines altogether, at least optionally, during the build step.
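The optional build-step pass suggested here could keep only the word/flags token of every entry. Below is a minimal sketch, not an existing tool. `strip_morphology` is a hypothetical helper. Entries are assumed to contain no spaces. The `aínda` line is an illustrative reconstruction of the space-slash-space style discussed in this thread.

```python
def strip_morphology(dic_text: str) -> str:
    """Hypothetical build-step helper: drop the morphological fields
    from every entry of a .dic file.

    Assumption: entries contain no spaces, so the first
    whitespace-separated token is either 'word' or 'word/flags'.
    The first line (the entry count) is copied unchanged.
    """
    lines = dic_text.splitlines()
    if not lines:
        return ""
    out = [lines[0]]  # word-count header
    for line in lines[1:]:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        out.append(tokens[0])  # keep only word or word/flags
    return "\n".join(out) + "\n"


# Illustrative input (the first entry is a reconstruction, not copied from gl_ES.dic).
sample = "2\naínda po:adverbio / conxunción\nabafado/10,15 po:participio po:adxectivo\n"
result = strip_morphology(sample)
# result == "2\naínda\nabafado/10,15\n"
```

Dropping the fields sidesteps the slash problem entirely, at the cost of losing morphological analysis for engines that use it. That is why such a pass would only make sense as an option.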
However, if Nuspell is meant to be a drop-in replacement for Hunspell, shouldn't you make your parser support this case? I believe the Hunspell parser accepts our current syntax because of the space after the word: any slash after that space cannot be meant to introduce flags.
If your parser has this extra requirement for a good reason, such as Nuspell allowing multi-word entries in .dic files, it would be great if you could write a short document explaining the syntax differences between Hunspell and Nuspell .dic (and .aff?) files, so that we can later adapt our build system to produce files optimized for each spellchecking engine.
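Here is a minimal sketch of the lenient reading described in this comment: only a slash inside the first whitespace-delimited token separates word from flags, and everything after the first whitespace is treated as morphology. This models the commenter's belief about Hunspell and is not Hunspell's actual code. `split_entry_lenient` is a hypothetical name, entries are assumed to contain no spaces, and escaped slashes are ignored.

```python
from typing import Tuple


def split_entry_lenient(line: str) -> Tuple[str, str, str]:
    """Return (word, flags, morphology) for one .dic entry.

    Assumption: the word/flags token is the first whitespace-delimited
    token, so a slash appearing after the first space is never read
    as a flag separator.
    """
    parts = line.split(maxsplit=1)
    if not parts:
        raise ValueError("empty entry")
    head = parts[0]
    morph = parts[1] if len(parts) > 1 else ""
    word, _, flags = head.partition("/")  # flags == "" when there is no slash
    return word, flags, morph


print(split_entry_lenient("abafado/10,15 po:participio po:adxectivo"))
# ('abafado', '10,15', 'po:participio po:adxectivo')
```

Under this reading, a space-slash-space line such as `aínda po:adverbio / conxunción` yields the word `aínda`, no flags, and the whole remainder as morphology.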
We just discussed this in our team and read up on some more details: using a +
is only a temporary workaround. A much better solution is to use multiple morphological fields with the same key, e.g.:
abafado/10,15 po:participio po:adxectivo
abafar/200,201,220,221,230,231 po:verbo ts:transitiva ts:intransitiva ts:pronominal VOLG:...
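To show what repeated keys buy a consumer of these entries, here is a minimal sketch that groups space-separated `key:value` fields so that every value of a repeated key is kept in order. It assumes fields are separated by whitespace and that values contain no spaces. `parse_morph_fields` is a hypothetical helper, not how Hunspell or Nuspell store fields internally.

```python
from collections import defaultdict
from typing import Dict, List


def parse_morph_fields(morph: str) -> Dict[str, List[str]]:
    """Group 'key:value' morphological fields by key.

    Repeated keys (e.g. two 'po:' fields) keep all their values in
    order. Tokens without a colon are ignored in this sketch.
    """
    fields: Dict[str, List[str]] = defaultdict(list)
    for token in morph.split():
        key, sep, value = token.partition(":")
        if sep:  # only key:value tokens are morphological fields here
            fields[key].append(value)
    return dict(fields)


print(parse_morph_fields("po:verbo ts:transitiva ts:intransitiva ts:pronominal"))
# {'po': ['verbo'], 'ts': ['transitiva', 'intransitiva', 'pronominal']}
```

Each value becomes a separate field, so no separator character (slash or +) needs special treatment.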
Concerning the drop-in replacement: we have implemented a faster but also stricter parser and only came across this issue now. Only gl_ES uses space-slash-space. The Hunspell documentation actually describes the proposed usage of morphological fields; see e.g. fl:X fl:Y
in https://linux.die.net/man/4/hunspell. Hunspell simply did not complain about the current syntax. (Early on, the documentation also states that a slash must be followed by one or more flags.) So, if you follow this example, you will also get better support in Hunspell for the fields you use. Does this answer your questions?
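If the build system were to follow this example, one option is a rewrite pass that turns slash-separated alternatives into repeated fields. Here is a minimal sketch under these assumptions: alternatives are written as a standalone " / " token, each alternative inherits the key of the preceding field, and entries contain no spaces. `expand_alternatives` is hypothetical, and the `aínda` input is an illustrative reconstruction rather than a line copied from gl_ES.dic.

```python
def expand_alternatives(line: str) -> str:
    """Hypothetical rewrite: 'key:a / b / c' -> 'key:a key:b key:c'.

    The first token (word or word/flags) is left untouched. Whitespace
    between fields is normalized to single spaces. A trailing '/' with
    nothing after it is kept as-is.
    """
    parts = line.split()
    if not parts:
        return line
    out = [parts[0]]  # word or word/flags token
    last_key = None   # key of the most recent key:value field
    i = 1
    while i < len(parts):
        tok = parts[i]
        if tok == "/" and last_key is not None and i + 1 < len(parts):
            # Replace "/ value" with "key:value" using the previous key.
            out.append(f"{last_key}:{parts[i + 1]}")
            i += 2
            continue
        key, sep, _ = tok.partition(":")
        if sep:
            last_key = key
        out.append(tok)
        i += 1
    return " ".join(out)


print(expand_alternatives("abafado/10,15 po:participio po:adxectivo"))
# abafado/10,15 po:participio po:adxectivo
```

For the reconstructed `aínda po:adverbio / conxunción`, this produces `aínda po:adverbio po:conxunción`. Lines already in the recommended form pass through unchanged.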
Could multiple morphological fields have an impact on performance?
I can't say for Hunspell; you will have to test that. For Nuspell, this is the only way it will be supported (as we see it now).
Is the solution to this the thing I broke?
Sorry, I don't speak this language; I had to put it into a translation website. Do you have a link to the result so I can review it?
@meixome Yes, it is the solution to this that "you broke", but that was unavoidable: most pending pull requests had to break #264. I will eventually solve the issues and update the pull request accordingly.
Hi all, just a friendly reminder: is there any progress on these two issues, and an upcoming release with a fix in it? Thanks.
I'm one of the developers of Nuspell, and we have come across an issue with the gl_ES.dic dictionary file. Of the 90 dictionaries in Nuspell/Hunspell/MySpell format that we use for regression testing, this is the only one that gives an error. It fails when Nuspell parses the slashes used in the morphological fields. This is the only dictionary that uses slashes there, and for parsing purposes they are not wanted there. The lines triggering this are lines that contain a / but no flags, such as:
Here the parser considers everything before the slash to be the word (aínda po:adverbio) and everything after the slash to be the flags (conxunción). Unfortunately, the slash has a special meaning in .dic files: it must be followed by one or more flags. Hence, the following is not a workaround:
Other lines with a slash in the morphological fields are:
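The failure described here can be reproduced with a simplified model of a strict parser: everything before the first slash is the word, and the next token after it must be a valid flag string. This is a minimal sketch, not Nuspell's actual code. `split_entry_strict` is a hypothetical name, and the flag check assumes the affix file uses `FLAG num`, as the numeric flags in this thread (e.g. 10,15) suggest. The `aínda` input is an illustrative reconstruction of the offending line.

```python
import re
from typing import Tuple

# Assumption: FLAG num, i.e. flags are comma-separated decimal numbers.
NUM_FLAGS = re.compile(r"\d+(,\d+)*")


def split_entry_strict(line: str) -> Tuple[str, str]:
    """Split at the FIRST slash anywhere in the line (strict model)."""
    slash = line.find("/")
    if slash == -1:
        parts = line.split(maxsplit=1)
        return (parts[0] if parts else "", "")
    word = line[:slash].strip()
    rest = line[slash + 1:].split()
    flags = rest[0] if rest else ""
    # A slash must be followed by one or more valid flags.
    if not NUM_FLAGS.fullmatch(flags):
        raise ValueError(f"invalid flags {flags!r} for word {word!r}")
    return word, flags


print(split_entry_strict("abafado/10,15 po:participio po:adxectivo"))
# ('abafado', '10,15')
```

For `aínda po:adverbio / conxunción`, the word becomes `aínda po:adverbio` and the flags `conxunción`, which is not a valid numeric flag list, so the sketch raises `ValueError`.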
There are several ways to solve this, for example:
or
or
We would like to help with your choice, as we need to start working out exactly how morphological fields with multiple POS tags will be processed.