valeriobasile / learningbyreading

Learning by Reading pipeline of NLP and Entity Linking tools
GNU General Public License v2.0

Coreference resolution #5

Open valeriobasile opened 7 years ago

valeriobasile commented 7 years ago

By default, Boxer's "resolve" option is turned off. This means that KNEWS misses a lot of references. For example, in "John is here. He is speaking", it should find that John is the Agent of Speaking. The "resolve" option is on and working in the online version of Boxer, but it doesn't work with the local version. And it should.

roquelopez commented 7 years ago

Hi, I compared the outputs of the online and local versions with the "resolve" option on, and both outputs are the same.

The reference resolution is OK. The "problem" is in the format of the XML generated by Boxer. In your example ("John is here. He is speaking"), Boxer knows that "He" refers to "John" (it keeps an attribute named variable, which has the same value for both tokens), but it still keeps other attributes for the token "He", such as symbol (symbol=male), type (type=n), etc.

I implemented a function that modifies the XML to replace those attributes with the ones of the target token. In the example, "He" ends up with the same attributes as "John".

Below are two examples of Boxer output before and after the call to the new function.

I) Robert is driving the car. He is running in the street.

BEFORE
predicates = [{'token_end': 6, 'token_start': 6, 'symbol': 'male', 'sense': '2', 'variable': 'x1', 'type': 'n'}, ...]
namedentities = [...]

AFTER
predicates = [...]
namedentities = [{'token_end': 6, 'token_start': 6, 'symbol': 'robert', 'variable': 'x1', 'type': 'nam', 'class': 'per'}, ...]

II) The car is old. It was used by Peter.

BEFORE
predicates = [{'token_end': 5, 'token_start': 5, 'symbol': 'thing', 'sense': '12', 'variable': 'x1', 'type': 'n'}, ...]

AFTER
predicates = [{'token_end': 5, 'token_start': 5, 'symbol': 'car', 'sense': '0', 'variable': 'x1', 'type': 'n'}, ...]
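The attribute replacement shown in these examples could be sketched roughly as follows. This is not the function from the repository: it works on the already parsed `predicates` and `namedentities` lists rather than on the XML, and the names `resolve_coreferences` and `PRONOUN_SYMBOLS` are hypothetical. It also assumes that resolved pronouns can be recognised by a generic symbol, and that the antecedent is the earliest non-pronoun entry with the same `variable`.

```python
# Assumption: Boxer gives pronouns a generic symbol. 'male' and 'thing'
# appear in the examples above; 'female' is a guess by analogy.
PRONOUN_SYMBOLS = {"male", "female", "thing"}


def resolve_coreferences(predicates, namedentities):
    """Return new (predicates, namedentities) lists in which every pronoun
    entry carries the attributes of its antecedent (hypothetical sketch)."""
    # Remember which list each entry came from, then order by position.
    tagged = [("pred", e) for e in predicates]
    tagged += [("ne", e) for e in namedentities]
    tagged.sort(key=lambda pair: pair[1]["token_start"])

    # The first non-pronoun mention of a variable is taken as the antecedent.
    antecedents = {}
    for kind, entry in tagged:
        if entry["symbol"] not in PRONOUN_SYMBOLS:
            antecedents.setdefault(entry["variable"], (kind, entry))

    out = {"pred": [], "ne": []}
    for kind, entry in tagged:
        target = antecedents.get(entry["variable"])
        if entry["symbol"] in PRONOUN_SYMBOLS and target is not None:
            # Keep the pronoun's position, take everything else from the
            # antecedent, and file the entry under the antecedent's list.
            kind, source = target
            entry = {**source,
                     "token_start": entry["token_start"],
                     "token_end": entry["token_end"]}
        out[kind].append(entry)
    return out["pred"], out["ne"]
```

Because the entry follows its antecedent's list, the pronoun in example I moves from `predicates` to `namedentities`, while the one in example II stays in `predicates`. A real noun whose symbol happens to be "thing" would be misclassified by this heuristic, so real Boxer output would likely need a stricter test.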

However, this change apparently has no effect on the pipeline, since the other functionalities of KNEWS don't use the modified attributes. They only use the attribute "variable", which works like an ID for each token. The new function could still be useful for future KNEWS features.
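To illustrate why the shared "variable" is already enough for linking mentions, here is a minimal sketch that groups token spans by variable. It is not KNEWS code: the function name `mentions_by_variable` is hypothetical, and the entry layout is taken from the examples above.

```python
from collections import defaultdict


def mentions_by_variable(predicates, namedentities):
    """Map each variable to the sorted (token_start, token_end) spans of
    all entries that share it (hypothetical helper)."""
    groups = defaultdict(list)
    for entry in predicates + namedentities:
        span = (entry["token_start"], entry["token_end"])
        groups[entry["variable"]].append(span)
    return {var: sorted(spans) for var, spans in groups.items()}
```

With the unmodified output of example I, and assuming "Robert" is token 0, both mentions end up under `x1` whether or not the pronoun's symbol was rewritten.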

shankha117 commented 5 years ago

@roquelopez do you know how to use the API to generate Boxer DRS output in XML format from a .txt file?