This branch adds basic support for tokensregex/semgrex/tregex. Users can perform these regex queries via methods exposed by the CoreNLPClient object.
For tokensregex and semgrex, users can enable a to_words flag that converts the output from the default sentence-separated format into a flat list of mentions.
(Note: to_words only flattens the top-level matches. For tokensregex queries with nested matches, only the topmost-level matches are flattened; the nested groups are left untouched.)
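To make the flattening behavior concrete, here is a minimal sketch of what to_words does to the result shape. The function name and the exact output shape are assumptions for illustration, not the actual implementation; the server is assumed to return matches grouped per sentence, keyed by match index, with a length field.

```python
# Hypothetical sketch of the to_words flattening (function name and
# output shape are assumptions, not the real implementation).

def flatten_matches(matches):
    """Collect sentence-grouped top-level matches into one flat list.

    Nested capture groups inside each match dict are left untouched.
    """
    words = []
    for sent_idx, sentence in enumerate(matches["sentences"]):
        # Keys "0", "1", ... index the top-level matches in this
        # sentence; "length" says how many there are.
        for match_idx in range(sentence["length"]):
            match = dict(sentence[str(match_idx)])
            match["sentence"] = sent_idx  # remember the source sentence
            words.append(match)
    return words

# Example input in the assumed sentence-separated shape:
matches = {
    "sentences": [
        {"length": 0},
        {"0": {"text": "Bob Ross was a famous painter", "begin": 0, "end": 6},
         "length": 1},
        {"length": 0},
    ]
}
flat = flatten_matches(matches)
print(flat)
```

With to_words disabled you would work with the sentences list directly; with it enabled you get the single flat list, each mention tagged with the index of the sentence it came from.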
Examples
tokensregex Demo
import json

import corenlp

annotators = 'tokenize ssplit ner depparse'.split()
client = corenlp.CoreNLPClient(annotators=annotators)

# Example pattern from: https://nlp.stanford.edu/software/tokensregex.shtml
text = 'Hello. Bob Ross was a famous painter. Goodbye.'
pattern = '([ner: PERSON]+) /was|is/ /an?/ []{0,3} /painter|artist/'
matches = client.tokensregex(text, pattern)
print(json.dumps(matches, indent=2))
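semgrex Demo

A semgrex query works the same way but matches over dependency graphs instead of token sequences, so depparse must be among the annotators. A sketch, assuming the same client setup as above; the pattern here is illustrative (it finds the nominal subject of "painter"), not taken from the branch:

```python
import json

import corenlp

# semgrex matches over dependency parses, so depparse is required.
annotators = 'tokenize ssplit ner depparse'.split()
client = corenlp.CoreNLPClient(annotators=annotators)

text = 'Bob Ross was a famous painter.'
# Illustrative semgrex pattern: a node with an nsubj dependent.
pattern = '{word:painter} >nsubj {}=subject'
matches = client.semgrex(text, pattern)
print(json.dumps(matches, indent=2))
```

As with tokensregex, passing to_words would flatten the per-sentence matches into one list.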
Description
This branch adds basic support for tokensregex/semgrex/tregex. Users can perform these regex queries via methods exposed by the
CoreNLPClient
object.For tokensregex and semgrex, users can enable a
to_words
flag that will convert the output from the default sentence-separated format to a flat list of mentions.(Note:
to_words
only causes the top-level matches to be flattened. For tokensregex queries that have nested matches, only the topmost-level matches are flattened and the nested matches are untouched.)Examples
tokensregex
DemoOutput:
semgrex
DemoOutput:
More demos on the way.