knowitall / taggers

Easily identify and label sentence intervals using various taggers.
11 stars 12 forks source link

Eat up comments in parser combinator. #16

Closed jgilme1 closed 11 years ago

jgilme1 commented 11 years ago

Not sure if this is the right approach, please give me your thoughts @schmmd

schmmd commented 11 years ago

@jgilme1 it's the totally right approach. Who chose the awful name valn? Do we want a reluctant regex between // and \n? I'm not sure if it could eat up multiple lines. From your tests I suppose not, but it might be a good idea to make it safer. Also, it looks like you allow anything before //. Should we change it to ((\\s*//.*\\n)|(\\s*\\n))*?

Parser combinators are pretty cool.

As an aside, it try to avoid capture groups in regular expressions. So I would have probably written (?:\\s*//.*\\n)|(?:\\s*\\n)*. I'm not sure if it matters though--but just maybe it prevents a string creation.

jgilme1 commented 11 years ago

Ok fixed, good catch.