Open rickbeeloo opened 4 years ago
Upon inspecting the request header sent to the webserver, I noticed that it adds a \ before the last +, so the pattern should be [{tag:/JJ/}]*[{tag:/NN.*/}]\+
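As a stopgap for client versions affected by this, the escaping described above can be applied by hand before the pattern is sent. The following is a minimal sketch, not part of stanza or CoreNLP: `escape_trailing_plus` is a hypothetical helper, and it only handles the final `+`, since that is the only case reported in this thread.

```python
def escape_trailing_plus(pattern: str) -> str:
    """Insert a backslash before a final, not yet escaped '+' in the pattern.

    Hypothetical helper for illustration only; it mirrors the escaping
    observed in the webserver request and touches nothing else.
    """
    if pattern.endswith("+") and not pattern.endswith("\\+"):
        return pattern[:-1] + "\\+"
    return pattern


pattern = "[{tag:/JJ/}]*[{tag:/NN.*/}]+"
escaped = escape_trailing_plus(pattern)
print(escaped)  # [{tag:/JJ/}]*[{tag:/NN.*/}]\+
```

The helper is idempotent, so an already escaped pattern passes through unchanged. Once a release containing the fix is in use, this manual escaping should presumably be dropped.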
I'm not sure this deserves to be closed. Didn't you have the expectation that the API does the escaping for you? Perhaps we should fix that in the stanza client.
Yes, I did expect to be able to copy a regex from corenlp.run and obtain the same results (and thus the same parsing).
I also noticed that entering the escaped regex, i.e. [{tag:/JJ/}]*[{tag:/NN.*/}]\+, on the webserver throws java.lang.RuntimeException: Error when parsing [{tag:/JJ/}]*[{tag:/NN.*/}]\+
This makes a regex even harder to test: an escaped one seems incorrect when tested on the webserver but correct in code, while a non-escaped one seems correct on the webserver but not in code.
I thought at first this was on the stanza side, but then I discovered it was an issue with the Java code. As a result, although I have fixed it, the fix didn't make it into the 4.1.0 version currently being built. It will be available in the next release or on GitHub, though.
Aaah awesome!
Let's take the following sentence:
organic wastes under variable temperature conditions
and pattern: [{tag:/JJ/}]*[{tag:/NN.*/}]+
When we pass this to http://corenlp.run/:
Then when we do this in Python:
It will print:
{'sentences': [{'0': {'text': 'organic wastes', 'begin': 0, 'end': 2}, '1': {'text': 'variable temperature', 'begin': 3, 'end': 5}, '2': {'text': 'conditions', 'begin': 5, 'end': 6}, 'length': 3}]}
Note that the webserver finds "variable temperature conditions", whereas in Python we only find "variable temperature" and "conditions" as separate matches. I need the same output as the webserver.
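To make the discrepancy concrete, here is a small sketch that takes the Python output printed above and joins matches whose token spans touch (one match's `end` equals the next match's `begin`). `merge_adjacent` is a hypothetical helper written for this illustration, and joining the texts with a single space is an assumption about the original tokenization.

```python
def merge_adjacent(matches):
    """Join matches whose token spans are contiguous (end == next begin)."""
    merged = []
    for m in sorted(matches, key=lambda m: m["begin"]):
        if merged and merged[-1]["end"] == m["begin"]:
            prev = merged[-1]
            prev["text"] = prev["text"] + " " + m["text"]  # assumes single-space separation
            prev["end"] = m["end"]
        else:
            merged.append(dict(m))  # copy so the input is left untouched
    return merged


# Output copied from the Python run shown above.
result = {'sentences': [{'0': {'text': 'organic wastes', 'begin': 0, 'end': 2},
                         '1': {'text': 'variable temperature', 'begin': 3, 'end': 5},
                         '2': {'text': 'conditions', 'begin': 5, 'end': 6},
                         'length': 3}]}

sentence = result['sentences'][0]
matches = [sentence[str(i)] for i in range(sentence['length'])]
merged = merge_adjacent(matches)
print([m['text'] for m in merged])
# ['organic wastes', 'variable temperature conditions']
```

This only shows that the two Python matches cover the same tokens as the single webserver match. It is not a general substitute for the pattern itself: two contiguous matches such as a noun followed by an adjective-noun pair would also be joined, although the pattern would not match them as one span.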