nlplab / brat

brat rapid annotation tool (brat) - for all your textual annotation needs
http://brat.nlplab.org

Prohibit selection across multiple 'sentences' #966

Open astent opened 11 years ago

astent commented 11 years ago

When using a sentence splitter, e.g. the basic newline splitter, the annotation file is divided into 'sentences', its most basic units. Currently, however, brat lets the annotator select from the end of one 'sentence' to the end of another, which leads to errors that brat notices but apparently cannot prevent. The annotator can also select from the middle of one 'sentence' to the middle of another, which is worse. Is it possible to prevent annotators from selecting across these basic units?
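The check being asked for can be sketched roughly as follows, assuming the newline splitter, so that every line of the document text is one 'sentence'. The function names `sentence_starts` and `crosses_sentence` are hypothetical and are not taken from brat's code base.

```python
from bisect import bisect_right


def sentence_starts(text):
    """Return the start offset of each newline-delimited 'sentence'."""
    starts = [0]
    for i, ch in enumerate(text):
        # A new 'sentence' begins right after each newline,
        # unless the newline is the last character of the text.
        if ch == "\n" and i + 1 < len(text):
            starts.append(i + 1)
    return starts


def crosses_sentence(text, start, end):
    """Return True if the span [start, end) touches more than one 'sentence'."""
    if not 0 <= start < end <= len(text):
        raise ValueError("invalid span offsets")
    starts = sentence_starts(text)
    # Index of the 'sentence' containing the first and the last character.
    first = bisect_right(starts, start) - 1
    last = bisect_right(starts, end - 1) - 1
    return first != last
```

A span that stays inside one line is accepted, while a span that starts in one line and ends in the next is flagged, regardless of whether it starts at the end or in the middle of a 'sentence'.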

ghost commented 11 years ago

@astent: I agree that this is an issue, but whether annotating across a sentence break counts as an annotation error depends on the annotation guidelines. Any sentence splitting, even manual splitting, can also introduce errors that would then block a correct annotation. In my opinion, this would have to be an annotation configuration option.

@spyysalo: I remember that we had this protection previously and that it was removed at the request of other users. Did we move it into the annotation configuration at that time? If not, could you claim this one for the server side and then hand it back to @amadanmath to catch the corresponding exception?

spyysalo commented 11 years ago

@ninjin: That's correct. Span annotations crossing sentence boundaries were previously prohibited (both by the client and the server, I think), but this restriction was removed in response to user requests (#786 and possibly others).

Relevant commits include at least 37cb4de18177128771e312ee2efaa59ea33bbd3d and fdd275acf8e836b2629d6b67ead0db6ef9feed37.

I agree that this should be under config control. As a first suggestion, how about also adding an [options] section to annotation.conf (the two other confs already have one) and having something like

Cross-sentence allow:all

vs.

Cross-sentence allow:none

to control the two basic alternatives? This could later be generalized, e.g. to specify that only certain types of span annotations may cross sentence boundaries.
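To make the proposal concrete, here is a minimal sketch of how the server might read such an option. It is not brat's actual configuration parser: the function names `read_cross_sentence_policy` and `span_permitted` are hypothetical, the line format simply mirrors the `Cross-sentence allow:all` example above, and defaulting to `all` when the option is absent is an assumption chosen to match the current behaviour.

```python
def read_cross_sentence_policy(conf_text, default="all"):
    """Return 'all' or 'none' from a hypothetical [options] section.

    Simplified sketch, not brat's real parser. Falls back to `default`
    when no Cross-sentence line is found (an assumption).
    """
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        if section != "options":
            continue  # only the [options] section is of interest here
        parts = line.split()
        if len(parts) == 2 and parts[0] == "Cross-sentence":
            key, _, value = parts[1].partition(":")
            if key == "allow":
                if value not in ("all", "none"):
                    raise ValueError("unknown Cross-sentence value: %r" % value)
                return value
    return default


def span_permitted(policy, crosses_boundary):
    """Decide whether a span may be created under the given policy."""
    return policy == "all" or not crosses_boundary
```

With `allow:none`, the server would reject any span for which its boundary check reports a crossing, and the client could catch the resulting error. Per-type rules, as suggested above, would need a richer value than the two handled here.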