qurator-spk / neat

Named entity annotation tool
Apache License 2.0

neat: named entity annotation tool



Table of contents

1. Introduction

2. User Guide

   2.1 Installation

   2.2 Data format

   2.3 Navigation

   2.4 Saving progress

3. Annotation Guidelines

1. Introduction

neat is a simple, browser-based tool for editing and annotating text with named entities to produce labeled data for training/testing/evaluation. It can be used to add or correct named entity labels and to correct the token text or tokenization (e.g. due to OCR/segmentation errors).

neat is developed at the Berlin State Library for data annotation in the SoNAR-IDH project and the QURATOR project.

2. User Guide

2.1 Installation

neat runs locally as a pure HTML+JavaScript webpage in your web browser. No additional software needs to be installed, but JavaScript has to be enabled in the browser.

Clone the repo using git clone https://github.com/qurator-spk/neat.git or download and extract the ZIP. Make sure you have neat.html and neat.js in the same directory and open neat.html in a browser. Any fairly recent browser should work, but only Chrome and Firefox are tested.

2.2 Data format

The source data we use for annotation are OCR results in PAGE-XML format. We provide a Python tool for the transformation of OCR files in PAGE-XML into the TSV format used by neat.

The internal data format used by neat is based on the format used in the GermEval2014 Named Entity Recognition Shared Task. Text is encoded as one token per line, with name spans in the IOB2 format as tab-separated values:

Example (simple)
No. TOKEN   NE-TAG  NE-EMB
# https://example.url
1   Donnerstag  O   O
2   ,   O   O
3   1   O   O   
4   .   O   O   
5   Januar  O   O   
6   .   O   O       
0       O   O
1   Berliner    B-ORG   B-LOC   
2   Tageblatt   I-ORG   O   
3   .   O   O       
0       O   O
1   Nr  O   O   
2   .   O   O       
3   1   O   O   
4   .   O   O   
0       O   O
1   Seite   O   O
2   3   O   O
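
As a rough illustration of how this format can be read outside of neat, the following Python sketch groups rows into sentences and collects the entity spans from one of the tag columns. It assumes real tab characters between columns, that lines starting with # carry a source URL, and that a row numbered 0 marks a sentence boundary, as the example above suggests. The function names read_sentences and iob2_spans are illustrative and not part of neat.

```python
def read_sentences(tsv_text):
    """Group rows into sentences of (token, ne_tag, ne_emb) tuples."""
    sentences, current = [], []
    for line in tsv_text.splitlines():
        # Skip blank lines, URL comment lines and the header row.
        if not line.strip() or line.startswith("#") or line.startswith("No."):
            continue
        cols = [c.strip() for c in line.split("\t")]
        cols += [""] * (4 - len(cols))  # pad short rows such as a bare "0"
        number, token, ne_tag, ne_emb = cols[:4]
        if number == "0":
            # Assumption: a row numbered 0 closes the running sentence.
            if current:
                sentences.append(current)
            current = []
            if not token:
                continue
        current.append((token, ne_tag or "O", ne_emb or "O"))
    if current:
        sentences.append(current)
    return sentences


def iob2_spans(sentence, column=1):
    """Collect (entity_type, tokens) pairs; column 1 is NE-TAG, 2 is NE-EMB."""
    spans, inside = [], False
    for row in sentence:
        token, tag = row[0], row[column]
        if tag.startswith("B-"):
            spans.append((tag[2:], [token]))
            inside = True
        elif tag.startswith("I-") and inside and spans[-1][0] == tag[2:]:
            spans[-1][1].append(token)
        else:
            # "O" tags and stray I- tags without a matching B- end the span.
            inside = False
    return spans


example = "\n".join([
    "No.\tTOKEN\tNE-TAG\tNE-EMB",
    "# https://example.url",
    "1\tDonnerstag\tO\tO",
    "2\t,\tO\tO",
    "0\t\tO\tO",
    "1\tBerliner\tB-ORG\tB-LOC",
    "2\tTageblatt\tI-ORG\tO",
    "3\t.\tO\tO",
])
sentences = read_sentences(example)
print(len(sentences))
print(iob2_spans(sentences[1], column=1))  # spans from NE-TAG
print(iob2_spans(sentences[1], column=2))  # spans from NE-EMB
```

Reading NE-TAG and NE-EMB with the same function keeps the outer and embedded annotations separate, which matches how the two columns are laid out in the example.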

For our purposes we extend this format with the following optional columns: ID, url_id, and the token coordinates left,right,top,bottom:

Example (full)
No. TOKEN   NE-TAG  NE-EMB  ID  url_id  left,right,top,bottom
# https://example.url/iiif/left,right,top,bottom/full/0/default.jpg
1   Donnerstag  O   O   -   0   174,352,358,390
2   ,   O   O   -   0   174,352,358,390 
3   1   O   O   -   0   367,392,361,381
4   .   O   O   -   0   370,397,352,379
5   Januar  O   O   -   0   406,518,358,386
6   .   O   O   -   0   406,518,358,386 
0
1   Berliner    B-ORG   B-LOC   Q455014 0   816,984,358,388
2   Tageblatt   I-ORG   O   Q455014 0   1005,1208,360,387
3   .   O   O   -   0   1005,1208,360,387
0
1   Nr  O   O   -   0   1237,1288,360,382
2   .   O   O   -   0   1237,1288,360,382
3   1   O   O   -   0   1304,1326,361,381
4   .   O   O   -   0   1304,1326,361,381
0
1   Seite   O   O   -   0   1837,1926,361,392
2   3   O   O   -   0   1939,1967,364,385
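
The optional columns can be read in a similar way. The sketch below parses one row of the full format into a small record. It assumes tab-separated columns in the order shown in the header, that - means no ID has been assigned, and that the last column holds four comma-separated integers. The Token class and parse_full_row are hypothetical names, not part of neat.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Token:
    number: int
    text: str
    ne_tag: str
    ne_emb: str
    entity_id: Optional[str]        # None when the ID column is "-"
    url_id: int
    box: Tuple[int, int, int, int]  # (left, right, top, bottom)

    @property
    def width(self) -> int:
        return self.box[1] - self.box[0]

    @property
    def height(self) -> int:
        return self.box[3] - self.box[2]


def parse_full_row(line: str) -> Token:
    """Parse one data row of the extended format (assumed column order)."""
    cols = [c.strip() for c in line.split("\t")]
    number, text, ne_tag, ne_emb, entity_id, url_id, coords = cols[:7]
    left, right, top, bottom = (int(v) for v in coords.split(","))
    return Token(
        number=int(number),
        text=text,
        ne_tag=ne_tag,
        ne_emb=ne_emb,
        entity_id=None if entity_id == "-" else entity_id,
        url_id=int(url_id),
        box=(left, right, top, bottom),
    )


row = "1\tBerliner\tB-ORG\tB-LOC\tQ455014\t0\t816,984,358,388"
token = parse_full_row(row)
print(token.entity_id, token.width, token.height)
```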

2.3 Navigation

neat can be used with either the keyboard or the mouse, but for ergonomic reasons we strongly recommend using the key combinations below.

Keyboard
Key Combination   Action
Left              Move one cell left
Right             Move one cell right
Up                Move one row up
Down              Move one row down
PageDown          Move page down
PageUp            Move page up
Ctrl+Up           Move entire table one row up
Ctrl+Down         Move entire table one row down
----------        --------------------------------------------
s t               Start new sentence in current row
m e               Merge current row with row above
s p               Create copy of current row
d l               Delete current row
----------        --------------------------------------------
backspace         Set NE-TAG / NE-EMB to O
b p               Set NE-TAG / NE-EMB to B-PER
b l               Set NE-TAG / NE-EMB to B-LOC
b o               Set NE-TAG / NE-EMB to B-ORG
b w               Set NE-TAG / NE-EMB to B-WORK
b c               Set NE-TAG / NE-EMB to B-CONF
b e               Set NE-TAG / NE-EMB to B-EVT
b t               Set NE-TAG / NE-EMB to B-TODO
i p               Set NE-TAG / NE-EMB to I-PER
i l               Set NE-TAG / NE-EMB to I-LOC
i o               Set NE-TAG / NE-EMB to I-ORG
i w               Set NE-TAG / NE-EMB to I-WORK
i c               Set NE-TAG / NE-EMB to I-CONF
i e               Set NE-TAG / NE-EMB to I-EVT
i t               Set NE-TAG / NE-EMB to I-TODO
----------        --------------------------------------------
enter             Edit TOKEN or ID
esc               Close TOKEN or ID edit field without applying changes
----------        --------------------------------------------
l a               Add one display row
l r               Remove one display row (minimum is 5)
----------        --------------------------------------------
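
The tagging shortcuts in the table follow a regular pattern: the first key selects the IOB2 prefix (b or i) and the second key selects the entity type. The Python sketch below only illustrates that pattern. It is not how neat.js implements the shortcuts, and tag_for_keys is a hypothetical helper.

```python
PREFIXES = {"b": "B", "i": "I"}
TYPES = {
    "p": "PER", "l": "LOC", "o": "ORG", "w": "WORK",
    "c": "CONF", "e": "EVT", "t": "TODO",
}


def tag_for_keys(keys):
    """Map a key sequence such as "b p" to a tag such as "B-PER"."""
    if keys == "backspace":
        return "O"
    parts = keys.split()
    if len(parts) != 2 or parts[0] not in PREFIXES or parts[1] not in TYPES:
        raise ValueError(f"not a tagging shortcut: {keys!r}")
    return f"{PREFIXES[parts[0]]}-{TYPES[parts[1]]}"


print(tag_for_keys("b p"))
print(tag_for_keys("i w"))
print(tag_for_keys("backspace"))
```
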
Mouse

2.4 Saving progress

neat runs entirely locally in the browser and therefore cannot automatically save your changes to disk. Use the Save Changes button regularly to save them manually.

3. Annotation Guidelines

Annotation Guidelines