Open hortongn opened 1 year ago
https://github.com/NatLibFi/Annif-tutorial/tree/master/data-sets
This directory holds two example data sets. The tutorial exercises may be completed using either data set.
yso-nlf is a data set consisting of the trilingual General Finnish Ontology (YSO), a training data set constructed from metadata records from the Finna.fi discovery service, and about 2,000 English-language master's and doctoral theses from the University of Jyväskylä.
stw-zbw contains the STW thesaurus for economics, metadata used in the ZBW retrieval system EconBiz, and full texts of working papers in economics uploaded to EconStor.
Download source files here: annif-docker.zip
docker run -v /tmp/annif-docker:/annif-projects -u $(id -u):$(id -g) -it quay.io/natlibfi/annif bash
$ annif load-vocab uc-vocab ./uc-vocab.tsv --language en
$ annif train uc-en training-data/
-- no wildcard, just point to the directory
In the Annif repo, you can start the web UI with docker-compose and point it at your project:
ANNIF_PROJECTS=/tmp/annif-docker MY_UID=$(id -u) MY_GID=$(id -g) docker-compose up
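When training from a directory, as in the `annif train uc-en training-data/` step above, Annif's full-text corpus format pairs each `.txt` document with a subject file of the same base name (`.tsv` or `.key`). Below is a minimal sketch of a check to run before training, assuming that layout. The function name `check_corpus_dir` is hypothetical and not part of Annif.

```shell
# Hypothetical helper (not an Annif command): report .txt files in a
# corpus directory that have no same-named .tsv or .key subject file.
check_corpus_dir() {
    dir="$1"
    missing=0
    for txt in "$dir"/*.txt; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$txt" ] || continue
        base="${txt%.txt}"
        if [ ! -e "$base.tsv" ] && [ ! -e "$base.key" ]; then
            echo "missing subjects for: $txt"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every document had a subject file.
    [ "$missing" -eq 0 ]
}
```

Usage would be something like `check_corpus_dir training-data && annif train uc-en training-data/`, so that training only starts when every document has its subjects.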
Create a "machine" that will suggest basic metadata from a document.
annif train tfidf-en /path/to/Annif-corpora/training/yso-finna-en.tsv.gz
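The `annif train` commands above refer to projects by id (`uc-en`, `tfidf-en`), which Annif looks up in a `projects.cfg` file in the projects directory. The contents of the file shipped in annif-docker.zip are not shown here, so the following is only a sketch of what a `uc-en` entry might look like. The TF-IDF backend, the snowball analyzer, the limit, and the display name are assumptions, as is the helper name `write_sketch_cfg`; only the project id and the `uc-vocab` vocabulary id come from the commands above.

```shell
# Hypothetical helper: write an example projects.cfg into a directory,
# refusing to overwrite a file that is already there.
write_sketch_cfg() {
    dir="$1"
    cfg="$dir/projects.cfg"
    if [ -e "$cfg" ]; then
        echo "refusing to overwrite $cfg" >&2
        return 1
    fi
    mkdir -p "$dir"
    # Backend, analyzer, limit and name below are assumed values.
    cat > "$cfg" <<'EOF'
[uc-en]
name=UC English (TF-IDF)
language=en
backend=tfidf
analyzer=snowball(english)
limit=100
vocab=uc-vocab
EOF
}
```

Calling `write_sketch_cfg /tmp/annif-docker` would place the file in the directory that the `docker run` command mounts at `/annif-projects`, provided no `projects.cfg` exists there yet.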