wolfe-pack / wolfe

Wolfe Language and Engine
https://wolfe-pack.github.io/wolfe
Apache License 2.0

Give EntityMention access to its character offsets #164

Open riedelcastro opened 9 years ago

riedelcastro commented 9 years ago

This would make mentions unique within a document, and useful even with other tokenizations. It would also make implementing NavigableEntityMentions (see NavigableDocument) trivial. Token offsets can then be derived from the character offsets, given a tokenization.
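To illustrate the last point, here is a minimal sketch of deriving token offsets from character offsets. All names here (`OffsetSketch`, `CharOffsets`, `Token`, `EntityMention`, `whitespaceTokens`, `tokenSpan`) are hypothetical stand-ins for illustration, not Wolfe's actual classes, and the whitespace tokenizer is just a placeholder for whatever tokenization the document uses.

```scala
object OffsetSketch {
  // Hypothetical minimal types; end offsets are exclusive.
  final case class CharOffsets(start: Int, end: Int)
  final case class Token(word: String, offsets: CharOffsets)
  final case class EntityMention(label: String, offsets: CharOffsets)

  // Placeholder tokenizer: every run of non-whitespace characters is a token.
  def whitespaceTokens(text: String): IndexedSeq[Token] =
    "\\S+".r.findAllMatchIn(text)
      .map(m => Token(m.matched, CharOffsets(m.start, m.end)))
      .toVector

  // Indices of the first and last token that overlap the mention's
  // character span, or None if no token overlaps it.
  def tokenSpan(mention: EntityMention, tokens: IndexedSeq[Token]): Option[(Int, Int)] = {
    val covered = tokens.indices.filter { i =>
      val t = tokens(i).offsets
      t.start < mention.offsets.end && mention.offsets.start < t.end
    }
    covered.headOption.map(first => (first, covered.last))
  }
}
```

Because the mention only stores character offsets, the same mention can be projected onto any tokenization by passing a different `tokens` sequence.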

narad commented 9 years ago

Is there an advantage to having the EntityMention know its character offsets and deriving token offsets when needed, versus storing token offsets and deriving character offsets when needed? I feel like the majority of the structures one posits as interacting with entities (constituent spans, dependency arcs, coreference links, and relations) are more naturally thought of and operated on at the token level.

riedelcastro commented 9 years ago

A navigable entity mention would still make token offsets easy to get; that is, entity mentions would know their token offsets as well. I would like to make entity mentions navigable because I like to write things like 'mention.text' and 'mention.sentence' via 'import doc.navigable'. This would be super easy if the mention knows its character offsets instead of its token offsets, because the navigable document has a very efficient data structure for mapping from character offsets to everything else. I also like the idea of every object in the document graph being uniquely defined through its grounding in the raw text.
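Roughly what I have in mind, as a sketch only: the types below (`NavigableSketch`, `Doc`, `Sentence`, `sentenceAt`) are made up for illustration and are not the real NavigableDocument, and the `java.util.TreeMap` keyed by sentence start is just one assumed way to get a fast lookup from character offsets.

```scala
import java.util.TreeMap

object NavigableSketch {
  // Hypothetical minimal types; end offsets are exclusive.
  final case class CharOffsets(start: Int, end: Int)
  final case class Sentence(offsets: CharOffsets)
  final case class EntityMention(label: String, offsets: CharOffsets)

  final class Doc(val source: String, val sentences: IndexedSeq[Sentence]) {
    // Sentence start offset -> sentence, so a lookup by character
    // offset costs O(log n) in the number of sentences.
    private val byStart = new TreeMap[Int, Sentence]()
    sentences.foreach(s => byStart.put(s.offsets.start, s))

    // The sentence whose span contains the given character offset, if any.
    def sentenceAt(charOffset: Int): Option[Sentence] =
      Option(byStart.floorEntry(charOffset))
        .map(_.getValue)
        .filter(s => charOffset < s.offsets.end)

    // Importing doc.navigable._ adds navigation methods to mentions.
    object navigable {
      implicit class NavigableMention(m: EntityMention) {
        def text: String = source.substring(m.offsets.start, m.offsets.end)
        def sentence: Option[Sentence] = sentenceAt(m.offsets.start)
      }
    }
  }
}
```

With a `val doc = new Doc(...)` in scope, `import doc.navigable._` is then enough to write `mention.text` and `mention.sentence`. The mention itself stays a plain value grounded in the raw text.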

Alternatively, we can give each entity mention the index of the sentence that contains it. This would also enable navigation. It's not quite as clean to me, but fine.
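For comparison, a sketch of that alternative, again with hypothetical names (`SentenceIndexSketch`, `sentenceOf`, `tokensOf`) and assuming token offsets are relative to the containing sentence:

```scala
object SentenceIndexSketch {
  // Hypothetical minimal types, not Wolfe's actual classes.
  final case class Token(word: String)
  final case class Sentence(tokens: IndexedSeq[Token])

  // start/end are token offsets within the sentence; end is exclusive.
  final case class EntityMention(label: String, sentenceIndex: Int, start: Int, end: Int)

  final case class Doc(sentences: IndexedSeq[Sentence]) {
    // Navigation goes through the stored sentence index.
    def sentenceOf(m: EntityMention): Sentence = sentences(m.sentenceIndex)
    def tokensOf(m: EntityMention): IndexedSeq[Token] =
      sentenceOf(m).tokens.slice(m.start, m.end)
  }
}
```

Navigation works here too, but the mention is only meaningful relative to one sentence split and one tokenization.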