google / sling

SLING - A natural language frame semantics parser
Apache License 2.0

Document annotation pipeline #406

Closed by ringgaard 5 years ago

ringgaard commented 5 years ago

I have added a new Annotator component type for annotating documents. Document annotators can be combined into a pipeline for document processing. The DocumentProcessor task can now be configured with a pipeline of document annotators, which are run before the documents are processed.

There is also a stand-alone pipeline processor (the DocumentAnnotation class) which can be used for processing documents outside the task system. It takes a pipeline spec and sets up a configured document processing pipeline.

I have extended the corpus-browser app so a document processing pipeline can be run before displaying the documents.

I have also made a document analyzer app that takes a document (in LEX format), runs it through the document annotation pipeline, and uses the document viewer to display the result in the web interface. For example, the following script can be used for making a parser demo:

#!/bin/bash

SPEC='{
  annotator: "parser"
  annotator: "mention-name"
  inputs: {
    parser: {
      file: "local/data/corpora/caspar/caspar.flow"
      format: "flow"
    }
  }
  parameters: {
    language: "en"
  }
}'

bazel-bin/sling/nlp/document/analyzer --spec "${SPEC}"
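If the model file or the analyzer binary is missing, the script above only fails once the analyzer starts. A variant with pre-flight checks might look like the sketch below. The spec contents, the model path, the binary path, and the `--spec` flag are copied from the script above. The helper names `build_spec` and `preflight` and the `MODEL` and `ANALYZER` environment overrides are hypothetical, introduced only for illustration.

```shell
#!/bin/bash
# Sketch only: build_spec, preflight, MODEL and ANALYZER are hypothetical
# names; the spec fields and the --spec flag come from the demo script above.

MODEL="${MODEL:-local/data/corpora/caspar/caspar.flow}"
ANALYZER="${ANALYZER:-bazel-bin/sling/nlp/document/analyzer}"

# Print the pipeline spec for a given parser model file.
build_spec() {
  local model="$1"
  cat <<EOF
{
  annotator: "parser"
  annotator: "mention-name"
  inputs: {
    parser: {
      file: "${model}"
      format: "flow"
    }
  }
  parameters: {
    language: "en"
  }
}
EOF
}

# Return non-zero (with a message on stderr) if the model file is missing
# or the analyzer binary is not an executable file.
preflight() {
  local model="$1" analyzer="$2"
  if [[ ! -f "${model}" ]]; then
    echo "missing parser model: ${model}" >&2
    return 1
  fi
  if [[ ! -x "${analyzer}" ]]; then
    echo "analyzer binary not found or not executable: ${analyzer}" >&2
    return 1
  fi
}

# Start the analyzer only if both checks pass.
preflight "${MODEL}" "${ANALYZER}" && "${ANALYZER}" --spec "$(build_spec "${MODEL}")"
```

With this layout a different parser model could be tried by setting `MODEL` before running the script, without editing the spec by hand.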

I have added annotator components for the parser and the NER labeler.

I have deprecated the span indices for documents since these are no longer used.