Closed asharkinasuit closed 3 years ago
Good question! I should probably make this more explicit. Although this is not currently checked, the word references should appear in the same order as the words occur in the text; otherwise some unexpected behaviour might occur.
It's indeed best not to create `WordReference` objects yourself; the API will do it for you when you pass `Word` instances. I don't fully understand what's happening in your edit 3 situation yet.
An overridden `insert` method, as you suggest, sounds like a good idea to solve the problem.
As a nastier low-level workaround, you could also clear all data in the entity (`entity.data = []`) and then re-add all the words.
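For illustration, the reset-and-re-add workaround could look roughly like the sketch below. It deliberately uses small stand-in classes (`Word`, `Entity`) and a hypothetical helper `rebuild_in_text_order`, not the library's real classes, and it assumes the sentence is available as a list of words in text order.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Stand-in for a word token (not the library's Word class)."""
    id: str
    text: str


@dataclass
class Entity:
    """Stand-in for a span annotation that keeps its words in a data list."""
    data: list = field(default_factory=list)

    def add(self, word):
        self.data.append(word)

    def text(self):
        return " ".join(w.text for w in self.data)


def rebuild_in_text_order(entity, sentence, extra_words):
    """Hypothetical helper: clear the entity, then re-add old and new words
    in the order in which they occur in the sentence."""
    wanted = {w.id for w in entity.data} | {w.id for w in extra_words}
    entity.data = []  # the low-level reset mentioned above
    for word in sentence:  # sentence: list of Word objects in text order
        if word.id in wanted:
            entity.add(word)
    return entity


sentence = [Word("w1", "the"), Word("w2", "New"), Word("w3", "York"), Word("w4", "Times")]
entity = Entity()
entity.add(sentence[3])  # words were added out of order
entity.add(sentence[1])
rebuild_in_text_order(entity, sentence, [sentence[2]])  # "York" was missing
print(entity.text())
```

Walking the sentence once and keeping only the wanted ids sidesteps the question of where each word should be inserted.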
The situation with edit 3 is that I completed Entities that lacked words occurring in my gold annotations, using `WordReference` objects, but if you then call `text()` on the entity, it only prints the words that are actual `Word` objects, not the rest. The problem with using `add` and letting it do the conversion to `WordReference` is that it doesn't let you specify where the word should be added, the way `insert` does.
My hack right now is to include `WordReference` objects anyway but to manually add a `txt` property that I later read out. I guess resetting the `data` property would be slightly cleaner...
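To make the `add` versus `insert` gap concrete, here is a minimal sketch of what an overridden `insert` might do. The classes are stand-ins written for this example (`BaseElement` and `SpanAnnotation` only mimic the roles of `AbstractElement` and `AbstractSpanAnnotation`), and the way the conversion happens here is an assumption, not the library's actual code.

```python
class Word:
    """Stand-in word token."""
    def __init__(self, id, text):
        self.id = id
        self.text = text


class WordReference:
    """Stand-in reference that carries only the word id."""
    def __init__(self, id):
        self.id = id


class BaseElement:
    """Stand-in for the strict base class: insert accepts only WordReference."""
    def __init__(self):
        self.data = []

    def insert(self, index, child):
        if not isinstance(child, WordReference):
            raise ValueError("unsupported child type: " + type(child).__name__)
        self.data.insert(index, child)


class SpanAnnotation(BaseElement):
    """Stand-in span annotation whose insert converts Word the way add does."""
    def add(self, child):
        # Appending is just inserting at the end.
        self.insert(len(self.data), child)

    def insert(self, index, child):
        if isinstance(child, Word):
            child = WordReference(child.id)  # assumed conversion step
        super().insert(index, child)


span = SpanAnnotation()
span.add(Word("w1", "New"))
span.add(Word("w3", "Times"))
span.insert(1, Word("w2", "York"))  # positional insert, still converted
print([ref.id for ref in span.data])
```

Routing `add` through `insert` keeps the conversion logic in one place, so both entry points accept the same child types.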
It seems the documentation is silent on this, but since the word ids are included, the order technically doesn't matter. Right now, for instance, an Entity prints its words in the order in which they occur in the XML tree as children of the `<entity>` tag, not in the order in which they occur as part of the Sentence.
If the order does not matter, it is easier to modify a given entity: just add the missing words. If it matters, you have to be careful about where each word goes in the tree, and I'm not sure it is possible to control that using just the `add` function.
Edit: I see there's also an `insert` function that should do nicely for my last point 😃
Edit 2: ... except that one is actually inherited from the base class `AbstractElement`, which appears to be stricter about what kind of children it allows in. That is, the "automagical" acceptance provided by the `AbstractSpanAnnotation.add` method is not granted by `AbstractElement.insert`. It would be nice if `insert` were also overridden in `AbstractSpanAnnotation` to support this.
Edit 3: Apparently `WordReference` objects are allowed to be `insert`ed. Doing that seems to work, except that the entity's `text` method doesn't seem to take into account bare `WordReference` objects in its data list. Maybe this is because you're not supposed to construct `WordReference` objects yourself, maybe this is a separate issue... A related question would be: since `WordReference` only seems to offer the id of the word, how does one generally get the word for an id? I would have thought XPath should do, but the tree needed for that is unloaded by default.
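On the id-to-word question, one generic approach that avoids XPath is to keep a plain dictionary from id to word and resolve references through it. The sketch below uses stand-in classes and hypothetical helpers (`build_index`, `span_text`); it does not show how the library itself resolves ids.

```python
class Word:
    """Stand-in word token."""
    def __init__(self, id, text):
        self.id = id
        self.text = text


class WordReference:
    """Stand-in reference that carries only the word id."""
    def __init__(self, id):
        self.id = id


def build_index(words):
    """Hypothetical helper: map each word id to its Word object."""
    return {w.id: w for w in words}


def span_text(data, index):
    """Hypothetical helper: join the text of a mixed list of Word and
    WordReference items, resolving references through the index."""
    parts = []
    for item in data:
        word = index[item.id] if isinstance(item, WordReference) else item
        parts.append(word.text)
    return " ".join(parts)


words = [Word("w1", "New"), Word("w2", "York"), Word("w3", "Times")]
index = build_index(words)
data = [words[0], WordReference("w2"), words[2]]  # one bare reference
print(span_text(data, index))
```

With such an index, a text method no longer needs to skip bare references, as long as every referenced id is present in the index.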