matanox closed this 6 years ago
You're better off asking on the mailing list; hardly anyone pays attention to this forum. Please subscribe before you post: http://mailman.mit.edu/mailman/listinfo/moses-support
Closing this. Looks like no one is responding on this forum.
Why close it, if the question is essentially still open?
I'll reopen it, but don't be surprised if you get no response.
Don't worry, I'll refer to it on the mailing list you suggested :-)
I don't know anything about the Moses sentence splitter, but give eserix a try. I've used it from time to time.
Thanks @tomekd. Do you know whether it accommodates different languages, or is it mainly useful for English? We're looking for something that covers a wide range of languages (not that the Moses script was necessarily perfect at that).
Hi,
It supports the most popular languages:
Note that it's a really simple tool that uses SRX files.
Well, good to learn about SRX (Segmentation Rules eXchange) now :-) Short of reading the dry spec, may I assume that the implied algorithm is a two-step flow, where a candidate break is first matched by the break=yes rules, and the break is then cancelled if it also matches any of the break=no rules? Are there any notable libraries that execute the rules, or notable rule repositories? I see that version 2.0 of the standard is supposed to be "safer", and that Java lags behind in the regex support it requires.
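To make that question concrete, here is a minimal Python sketch contrasting two ways of applying SRX-style rules. It is not taken from any SRX library; the `Rule` class, the helper names, and the two example rules are hypothetical. Each rule is a before/after regex pair tested at a candidate position. `split_two_pass` implements the two-step flow described above (break where a break=yes rule matches, unless a break=no rule also matches). `split_first_match` implements ordered first-match evaluation, which is our reading of how the SRX spec intends rules to be applied: rules are tried in document order and the first one that matches decides, which is why exceptions are usually listed before the break rules.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical SRX-like rule: it matches a position when `before` matches the
    # text ending at that position and `after` matches the text starting there.
    is_break: bool
    before: str
    after: str


def matches(rule: Rule, text: str, pos: int) -> bool:
    before_ok = re.search("(?:" + rule.before + r")\Z", text[:pos]) is not None
    after_ok = re.match(rule.after, text[pos:]) is not None
    return before_ok and after_ok


def cut(text: str, breaks: list[int]) -> list[str]:
    # Slice the text at the chosen break offsets and trim surrounding whitespace.
    pieces, start = [], 0
    for b in breaks:
        pieces.append(text[start:b].strip())
        start = b
    pieces.append(text[start:].strip())
    return [p for p in pieces if p]


def split_two_pass(text: str, rules: list[Rule]) -> list[str]:
    # Break where some break=yes rule matches, unless any break=no rule also
    # matches; rule order is irrelevant in this reading.
    breaks = [
        pos for pos in range(1, len(text))
        if any(r.is_break and matches(r, text, pos) for r in rules)
        and not any(not r.is_break and matches(r, text, pos) for r in rules)
    ]
    return cut(text, breaks)


def split_first_match(text: str, rules: list[Rule]) -> list[str]:
    # Try the rules in order at each position; the first matching rule decides.
    breaks = []
    for pos in range(1, len(text)):
        for r in rules:
            if matches(r, text, pos):
                if r.is_break:
                    breaks.append(pos)
                break
    return cut(text, breaks)


RULES = [
    Rule(is_break=False, before=r"\bMr\.", after=r"\s"),      # exception: "Mr." then space
    Rule(is_break=True, before=r"[.?!]", after=r"\s+[A-Z]"),  # . ? ! then space and a capital
]

text = "Hello Mr. Smith. How are you?"
print(split_two_pass(text, RULES))                         # ['Hello Mr. Smith.', 'How are you?']
print(split_first_match(text, RULES))                      # ['Hello Mr. Smith.', 'How are you?']
print(split_first_match(text, list(reversed(RULES))))      # ['Hello Mr.', 'Smith.', 'How are you?']
```

With the exception listed first, both readings agree. Reversing the rule order changes only the first-match result, which then also breaks after "Mr.", so under first-match semantics rule order matters.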
The Perl script here essentially follows a similar flow, although it seems to struggle with extra spaces that it introduces and later has to discard, and it is arguably a bit of a hack when it comes to adapting it to special domains or language registers.
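For comparison, a minimal sketch of the non-breaking-prefix idea behind the Moses script, which keeps a per-language list of abbreviations after which a period does not end a sentence. This is not a port of the Perl code: the function name and the tiny prefix set are hypothetical, and real prefix lists are much longer. It works on regex match offsets, so it never has to insert and later remove markers or spaces.

```python
import re

# Hypothetical, tiny stand-in for a per-language non-breaking prefix list.
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Dr", "e.g", "i.e"}


def split_with_prefixes(text: str, prefixes: set[str] = NONBREAKING_PREFIXES) -> list[str]:
    sentences = []
    start = 0
    # Candidate break: sentence-final punctuation followed by whitespace and a capital.
    for m in re.finditer(r"[.?!]+(?=\s+[A-Z])", text):
        words = text[start:m.start()].split()
        token = words[-1] if words else ""
        # Veto a lone period right after a known abbreviation such as "Dr".
        if m.group() == "." and token in prefixes:
            continue
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


print(split_with_prefixes("Dr. Smith arrived. Was he late? No."))
# ['Dr. Smith arrived.', 'Was he late?', 'No.']
```

Adapting this to a new domain or register then mostly means editing the prefix set and the candidate-break pattern, rather than patching the splitting logic itself.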
Looks like the mailing list got you some good responses. Closing now.
Hi,
While fiddling with the sentence-splitting preprocessing utility, we can't seem to get anything actually split. It's probably a usage issue. Here's what we're trying in a UTF-8 terminal:
Any ideas off the top of your head?
Thanks!