@sassbalint And there you have your pull request. :smile: These changes address all of the differences I have listed in the table, with the following caveats:
- `kind` is provided on a best-effort basis (a token is punctuation if it consists only of punctuation characters)
- the last whitespace in a batch is not added as a token. QunToken adds it, but in ML the text is trimmed, and I didn't want to change that
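
To make the first caveat concrete, here is a minimal sketch of the "only punctuation characters" rule. It is not the code from this pull request: the class and method names are hypothetical, and treating "punctuation character" as "Unicode general category P*" is my assumption for illustration.

```java
// Hypothetical illustration of the best-effort rule; not the actual wrapper code.
final class KindSketch {

    /** True if the token is non-empty and every code point in it is punctuation. */
    static boolean isPunctuation(String token) {
        if (token.isEmpty()) {
            return false;
        }
        return token.codePoints().allMatch(KindSketch::isPunctuationCodePoint);
    }

    /** Assumption: "punctuation" means one of the Unicode P* general categories. */
    static boolean isPunctuationCodePoint(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }
}
```

Under this assumption, symbols such as `+` or `$` are not punctuation, which is one way a best-effort `kind` can differ from what a tokenizer reports itself.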
I also made a few cosmetic changes in the QunToken wrapper.
I have not included the jar (`Lang_Hungarian/hungarian.jar`), in case I have to make changes to this pull request. If you think it can be merged, I will compile and add the jar then.