moses-smt / mosesdecoder

Moses, the machine translation system
http://www.statmt.org/moses
GNU Lesser General Public License v2.1
1.58k stars 777 forks source link

fix syntax error; credit: https://patchwork.ozlabs.org/patch/735705/ #200

Closed jowagner closed 6 years ago

jowagner commented 6 years ago

Running experiment.perl on the ems-config as provided by http://www.statmt.org/wmt18/parallel-corpus-filtering.html (with adjustments in the general section), I got an error message about a left curly bracket. This PR changes the regular expression to avoid the error message.

I am not a moses or perl expert. Please review the changes in this PR carefully before accepting.

hieuhoang commented 6 years ago

thanks. I'm not a perl expert.

If there's anyone out there who has an opinion on this, please speak up

AmarEly2018 commented 6 years ago

?

[image: Mailtrack] https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& Sender notified by Mailtrack https://mailtrack.io?utm_source=gmail&utm_medium=signature&utm_campaign=signaturevirality5& 24/06/18 à 19:14:19

Le dim. 24 juin 2018 à 19:06, Hieu Hoang notifications@github.com a écrit :

thanks. I'm not a perl expert.

If there's anyone out there who has an opinion on this, please speak up

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/moses-smt/mosesdecoder/pull/200#issuecomment-399775542, or mute the thread https://github.com/notifications/unsubscribe-auth/AizilCxbdlItHElkBA9qtS47xK6jEpjNks5t_9U8gaJpZM4U0CDh .

jowagner commented 6 years ago

fyi: I got the same error when trying master instead of 4.0. I encountered and fixed another regex error https://github.com/jowagner/mosesdecoder/commit/5bbd5ca1607f32e4c3e1841c704c89ebd53c9cb7 . Reading https://www.mail-archive.com/moses-support@mit.edu/msg15226.html , these perl issues and/or whether/how they are reported as errors may depend on the OS. I'm using openSUSE Leap 15.0 with perl v5.26.1.

hieuhoang commented 6 years ago

thanks. I'm inclined to pull them both into master. They look sensible, and they're just one liners so can easily be rolled back if there's a problem.

Will keep an eye out on the moses mailing list for feedback/issues

alvations commented 6 years ago

@jowagner the https://www.mail-archive.com/moses-support@mit.edu/msg15226.html issue was a different one from mteval-v13a.pl. That change was because the recent Perl version starts to throw deprecation warnings, more on https://github.com/moses-smt/mosesdecoder/pull/170


This change to EMS is a separate issue but I think it has to with the Perl version (and maybe different OS too). And the change is good but I think there's a better regex to be use there.

If I'm not wrong, reading the perl line:

if ($CONFIG{$in}[0] =~ /^\[([^:]+):[{](\S+)[}]:(\S+)\]$/) {

It's expecting the $CONFIG name to be in the format, e.g.

[xyz:{abc}:def]

Rather than using the charset matches (i.e. [{]), it might be better to be explicit and escape the curly brackets (i.e. \{) than to put it into a set of chars, see https://regex101.com/r/z64roV/1

^\[([^:]+):\{(\S+)\}:(\S+)\]$

It'll yield the same result as the original regex.

So the line becomes:

if ($CONFIG{$in}[0] =~ /^\[([^:]+):\{(\S+)\}:(\S+)\]$/) {
hieuhoang commented 6 years ago

pulled them both into master. Leaving release 4.0 alone,