This is a dockerised version of NoSketch Engine, the open-source version of the Sketch Engine corpus manager and text analysis software developed by Lexical Computing Limited.
This docker image is based on Debian stable. The NoSketch Engine build and installation process contains some additional hacks for convenient installation and use; see the Dockerfile for details.

1. `git clone https://github.com/ELTE-DH/NoSketch-Engine-Docker`
2. `make pull` – to download the docker image
3. `make compile` – to compile sample corpora
4. `make execute` – to run a CLI query on the `susanne` corpus
5. `make run` – to launch the docker container
6. Open `http://localhost:10070/` to try the WebUI

Sample corpora: `susanne` (original NoSkE sample corpus) and `emagyardemo`.
Further info on how to analyse a plain-text corpus with e-magyar and convert it to the format the system expects.

- Pull the prebuilt image: `make pull` (or `docker pull eltedh/nosketch-engine:latest`)
- Build your own image: `make build IMAGE_NAME=myimage` – be sure to name your image using the `IMAGE_NAME` parameter
    - For password protection, see `conf/000-default.conf` and set user and password in `conf/htpasswd` (e.g. use the `htpasswd -c conf/htpasswd USERNAME` command from the `apache2-utils` package)

1. Put the vertical file(s) of your corpus into the `corpora/CORPUS_NAME/vertical` directory\
   (see examples in the `corpora/susanne/vertical` and `corpora/emagyardemo/vertical` directories)
2. Put the corpus configuration into the `corpora/registry/CORPUS_NAME` file\
   (see examples in `corpora/registry/susanne` and `corpora/registry/emagyardemo`)
3. Compile the corpora listed in the `corpora/registry` directory using the docker image: `make compile`
    - To compile a single corpus: `make execute CMD="compilecorp --no-ske CORPUS_REGISTRY_FILE"`
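
The preparation steps above can be sketched as a small shell helper. This is only a sketch under assumptions: the function name `new_corpus` and the `NOSKE_DIR` variable are hypothetical (they are not part of this repository), and copying the `susanne` registry file is merely a convenient starting point. The copied file still has to be edited by hand to match the new corpus.

```shell
# Hypothetical helper (not part of the repository): prepare the files for a
# new corpus, using the susanne registry file as a template.
# NOSKE_DIR is an assumed variable pointing at the repository checkout.
new_corpus() {
    if [ $# -ne 1 ] || [ -z "$1" ]; then
        echo "usage: new_corpus CORPUS_NAME" >&2
        return 2
    fi
    name=$1
    root=${NOSKE_DIR:-.}

    # 1. Directory that will hold the vertical file(s).
    mkdir -p "$root/corpora/$name/vertical" || return 1

    # 2. Registry (configuration) file; never overwrite an existing one.
    registry="$root/corpora/registry/$name"
    if [ -e "$registry" ]; then
        echo "registry file already exists: $registry" >&2
        return 1
    fi
    cp "$root/corpora/registry/susanne" "$registry" || return 1

    # 3. Compilation is left to the user, after the files are in place.
    echo "Copy your vertical file(s) to $root/corpora/$name/vertical"
    echo "and edit $registry before running: make compile"
}
```

For example, `new_corpus mycorpus` run in the repository root creates `corpora/mycorpus/vertical` and `corpora/registry/mycorpus`.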
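
Run the docker container with `make run`, then open `http://SERVER_NAME:10070/` to use the WebUI.

- To run a CLI command in the docker image, add the command (e.g. `corpinfo -s susanne`) at the end of the command:
    - `make execute CMD="corpinfo -s susanne"`
    - `docker run --rm -it --mount type=bind,src=$(pwd)/corpora,dst=/corpora ${IMAGE_NAME}:latest corpinfo -s susanne` (with `IMAGE_NAME` set in your shell)
- `make connect` – to connect to the running container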
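
As an illustration of the CLI mode, the sketch below runs `corpinfo -s` for every registry file found in `corpora/registry`. The function name `corpinfo_all` and the variables `NOSKE_DIR` and `NOSKE_DRY_RUN` are hypothetical and not provided by this repository. It assumes that the name of a registry file is the corpus name, as with `corpora/registry/susanne` and `corpinfo -s susanne`.

```shell
# Hypothetical helper: run `corpinfo -s` for every registry file found in
# corpora/registry. Set NOSKE_DRY_RUN=1 (an assumed variable) to print the
# make commands instead of running them.
corpinfo_all() {
    root=${NOSKE_DIR:-.}
    for registry in "$root"/corpora/registry/*; do
        [ -f "$registry" ] || continue   # skip if the glob matched nothing
        corpus=${registry##*/}           # registry file name = corpus name
        if [ -n "${NOSKE_DRY_RUN:-}" ]; then
            echo "make execute CMD=\"corpinfo -s $corpus\""
        else
            make -C "$root" execute CMD="corpinfo -s $corpus"
        fi
    done
}
```

Running `NOSKE_DRY_RUN=1 corpinfo_all` first is a cheap way to check which corpora would be queried.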
- `make stop`: stops the container
- `make clean`: stops the container, removes indexed corpora and deletes the docker image – use with caution!

`make` parameters, multiple images and multiple containers

By default, the docker image is named `nosketch-engine`, the container is named `noske`, and the port is `10070`.
If there is a need to change these, `make` commands can be supplemented by `IMAGE_NAME=myimage` and/or `CONTAINER_NAME=mycontainer` and/or `PORT=myport`.
E.g. `make build IMAGE_NAME=myimage` builds an image called `myimage`; and `make run IMAGE_NAME=myimage CONTAINER_NAME=mycontainer PORT=12345` launches the image called `myimage` in a container called `mycontainer`, which will use port `12345`.
In the latter case the system will be available at `http://SERVER_NAME:12345/`.

See the table below on which `make` command accepts which parameter:

| command | `IMAGE_NAME` | `CONTAINER_NAME` | `PORT` |
|---|---|---|---|
| `make pull` | . | . | . |
| `make build` | ✔ | . | . |
| `make compile` | ✔ | . | . |
| `make execute` | ✔ | . | . |
| `make run` | ✔ | ✔ | ✔ |
| `make connect` | . | ✔ | . |
| `make stop` | . | ✔ | . |
| `make clean` | ✔ | ✔ | . |

In the rare case of multiple different docker images, be sure to name them differently (by using `IMAGE_NAME`).\
In the more common case of multiple different docker containers running simultaneously, be sure to name them differently (by using `CONTAINER_NAME`) and also be sure to use a different port for each of them (by using `PORT`).
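
The advice above on running several containers side by side could look like the following sketch. The function name `run_containers` and the `NOSKE_DRY_RUN` variable are hypothetical; only `make run`, `CONTAINER_NAME` and `PORT` come from this repository. Assigning consecutive ports is just one simple way to keep them distinct.

```shell
# Hypothetical helper: start one container per given name, assigning
# consecutive ports starting at BASE_PORT so that no two containers clash.
# Set NOSKE_DRY_RUN=1 (an assumed variable) to only print the commands.
run_containers() {
    if [ $# -lt 2 ]; then
        echo "usage: run_containers BASE_PORT CONTAINER_NAME..." >&2
        return 2
    fi
    port=$1
    shift
    for container in "$@"; do
        if [ -n "${NOSKE_DRY_RUN:-}" ]; then
            echo "make run CONTAINER_NAME=$container PORT=$port"
        else
            make run CONTAINER_NAME="$container" PORT="$port" || return 1
        fi
        port=$((port + 1))   # next container gets the next port
    done
}
```

For example, `run_containers 12345 first second` would launch `first` on port 12345 and `second` on port 12346.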

If you want to build your own docker image, be sure to include the `IMAGE_NAME` parameter in the build command: `make build IMAGE_NAME=myimage`, and also provide `IMAGE_NAME=myimage` for every `make` command which accepts this parameter.
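
To avoid forgetting `IMAGE_NAME` after building a custom image, a thin wrapper along the following lines may help. It is a sketch, not part of this repository: the function name `noske_make` and the variables `NOSKE_IMAGE` and `NOSKE_DRY_RUN` are hypothetical, and the list of commands that accept `IMAGE_NAME` is taken from the table above.

```shell
# Hypothetical wrapper: add IMAGE_NAME only to the make commands that accept
# it according to the table above (build, compile, execute, run, clean).
# NOSKE_IMAGE and NOSKE_DRY_RUN are assumed variable names.
noske_make() {
    if [ $# -lt 1 ]; then
        echo "usage: noske_make COMMAND [PARAM=value...]" >&2
        return 2
    fi
    target=$1
    shift
    case $target in
        build|compile|execute|run|clean)
            if [ -n "${NOSKE_IMAGE:-}" ]; then
                set -- "IMAGE_NAME=$NOSKE_IMAGE" "$@"
            fi
            ;;
    esac
    if [ -n "${NOSKE_DRY_RUN:-}" ]; then
        echo make "$target" "$@"   # print only; quoting is not preserved
    else
        make "$target" "$@"
    fi
}
```

With `NOSKE_IMAGE=myimage` exported, `noske_make run PORT=12345` runs `make run IMAGE_NAME=myimage PORT=12345`, while `noske_make stop` is passed through unchanged.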
The following files in this repository are from https://nlp.fi.muni.cz/trac/noske and have their own license:

- `noske_files/manatee-open-*.tar.gz` (GPLv2+)
- `noske_files/bonito-open-*.tar.gz` (GPLv2+)
- `noske_files/crystal-open-*.tar.gz` (GPLv3)
- `noske_files/gdex-*.tar.gz` (GPLv3)
- `data/corpora/susanne/vertical` and `data/registry/susanne`

The rest of the files are licensed under the GNU Lesser General Public License (LGPL), version 3 or any later version.