nlplab / brat

brat rapid annotation tool (brat) - for all your textual annotation needs
http://brat.nlplab.org

normalisation annotation fails with tutorial example #1375

Open Shimorina opened 3 years ago

Shimorina commented 3 years ago

Hello!

Thank you for the tool!

I noticed that the brat server crashes when I try to load a normalisation annotation (e.g., N1 in the example file at https://github.com/nlplab/brat/blob/master/example-data/tutorials/news/000-introduction.ann ). I did not modify the file; I only load it as an example.

The error occurs because brat rewrites the file when it reads it, adding an extra tab between the second and third columns:

Initial annotation file: N1\tReference T5 Wikipedia:64488\tCarlos Salinas de Gortari

After modification: N1\tReference T5 Wikipedia:64488\t\tCarlos Salinas de Gortari
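
To make the difference concrete, here is a minimal sketch that splits both lines on tabs. It uses plain Python string handling, not brat's parser, and `split_fields` is a hypothetical helper name introduced only for illustration.

```python
# The two lines quoted above, with "\t" as a real tab character.
original = "N1\tReference T5 Wikipedia:64488\tCarlos Salinas de Gortari"
modified = "N1\tReference T5 Wikipedia:64488\t\tCarlos Salinas de Gortari"


def split_fields(line):
    """Split an annotation line into tab-separated fields (illustrative helper)."""
    return line.split("\t")


# The original line has three fields; the rewritten line has four,
# because the doubled tab creates an empty field before the text.
print(split_fields(original))
print(split_fields(modified))
```

The rewritten line therefore no longer has the same column layout as the file that was read.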

This problem occurs only with normalisation annotations; all other annotation types are fine. When I remove write permissions for the server user, the problem disappears and the normalisation annotations are displayed correctly.

The error log:

.....
  File "/var/www/html/brat/./server/src/annotation.py", line 1255, in save
    time()
  File "/var/www/html/brat/./server/src/annotation.py", line 1194, in __exit__
    self.save()
  File "/var/www/html/brat/./server/src/annotation.py", line 1255, in save
    time()
  File "/var/www/html/brat/./server/src/annotation.py", line 1194, in __exit__
    self.save()
  File "/var/www/html/brat/./server/src/annotation.py", line 1249, in save
    with Annotations(tmp_file.name) as ann:
  File "/var/www/html/brat/./server/src/annotation.py", line 557, in __init__
    self._parse_ann_file(input_files)
  File "/var/www/html/brat/./server/src/annotation.py", line 1084, in _parse_ann_file
    self._parse_ann_lines(ann_lines, input_file_path)
  File "/var/www/html/brat/./server/src/annotation.py", line 1108, in _parse_ann_lines
    if not is_valid_id(id):
  File "/var/www/html/brat/./server/src/annotation.py", line 403, in is_valid_id
    __split_annotation_id(id)[1]
  File "/var/www/html/brat/./server/src/annotation.py", line 377, in __split_annotation_id
    m = re_match(r'^([A-Za-z]+|#[A-Za-z]*)([0-9]+)(.*?)$', id)
  File "/usr/lib/python3.8/re.py", line 191, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python3.8/re.py", line 291, in _compile
    if isinstance(flags, RegexFlag):
RecursionError: maximum recursion depth exceeded while calling a Python object
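
The traceback alternates between the same two frames, `save` and `__exit__`, until Python's recursion limit is reached. The toy below reproduces only that shape of cycle. `ToyAnnotations` and `run` are hypothetical names, and the sketch assumes nothing about the condition under which brat decides to save again, which the traceback does not show.

```python
class ToyAnnotations:
    """Toy stand-in for a context manager that saves on exit (not brat's class)."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the "with" block triggers a save.
        self.save()
        return False

    def save(self):
        # Saving re-opens the written output in a new context manager,
        # whose own __exit__ calls save() again, and so on.
        with ToyAnnotations():
            pass


def run():
    """Return the name of the outcome instead of crashing the interpreter."""
    try:
        ToyAnnotations().save()
    except RecursionError:
        return "RecursionError"
    return "ok"


print(run())
```

In this toy the cycle never terminates because every save unconditionally opens a new instance. In the real code some check presumably stops the cycle for the other annotation types.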

Tested with both standalone.py and a server installation, using the master branch of the repository.

Shimorina commented 3 years ago

Apparently this bug was introduced in the recent commit https://github.com/nlplab/brat/commit/cf786c74a3bc317156a0ea2faba718eb434c325c.

If the following line is modified by removing the second tab, the code works. https://github.com/nlplab/brat/blob/23f8ffb996051f5139280c24cd851f29f2afa273/server/src/annotation.py#L1657

Should be: return u'%s\t%s %s %s:%s%s' % (
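
A small sketch of why removing the tab helps, under two assumptions that are not confirmed by the linked source: that the second tab sits before the last placeholder, and that the last argument already begins with its own tab. The variable names are hypothetical. The values come from the N1 line above.

```python
# Hypothetical field values taken from the N1 example line.
ann_id, ann_type, target = "N1", "Reference", "T5"
refdb, refid = "Wikipedia", "64488"
tail = "\tCarlos Salinas de Gortari"  # assumption: the tail keeps its leading tab

args = (ann_id, ann_type, target, refdb, refid, tail)

# Assumed shape of the current format string (second tab before the tail).
with_second_tab = "%s\t%s %s %s:%s\t%s" % args

# Format string proposed above (second tab removed).
without_second_tab = "%s\t%s %s %s:%s%s" % args

print(repr(with_second_tab))
print(repr(without_second_tab))
```

Under these assumptions the first string contains the doubled tab seen in the modified file, and the second matches the initial annotation file.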