sanskrit-lexicon / COLOGNE

Development of http://www.sanskrit-lexicon.uni-koeln.de/

kosh - what is it? #297

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

Background

I recently (ref) became more aware of work being done elsewhere at Cologne that is based in part on the sanskrit-lexicon dictionaries.

Thanks, @vocabulista, for these links 👍 In particular:

'kosh' is described as 'APIs for Dictionaries'. The work is done in Python and further

Somehow, a Docker image is built that combines, in part, Flask and Elasticsearch.

This work seems very interesting and close to ideas we are beginning to think about. Perhaps we can build upon or else learn from these projects.

First question

With a local installation, we could see exactly what has been done.

@YevgenJohn and @drdhaval2785 Do these repositories provide enough information for us to do a local (or Digital Ocean) installation? If so, would you write a recipe for doing so?

YevgenJohn commented 4 years ago

We can build a Docker image and run it as a container locally (or on Digital Ocean), once Docker Engine is installed.

funderburkjim commented 4 years ago

From 'https://docs.docker.com/docker-for-windows/install/', it appears that installing Docker directly on Windows is NOT the best way to go, since it requires features not present in the Home edition of Windows 10.

So the initial steps would be: install VirtualBox, then create an Ubuntu virtual machine in VirtualBox, and finally install Docker within that Ubuntu machine.

Agree? If so, does this look like a good installation recipe? https://wiki.cyverse.org/wiki/display/HDFDE/Installing+VirtualBox,+Ubuntu,+and+Docker

YevgenJohn commented 4 years ago

Docker on Windows 10 is a toy (when it is available at all). On a Mac, Docker has performance issues (I guess the file system can't handle lots of layers efficiently), so it works, but 5-10 times slower than in a Linux VM on the same Mac. A VirtualBox Linux VM with Docker Engine is what I would use for this exercise.

funderburkjim commented 4 years ago

Does that reference (wiki.cyverse...) look like a good recipe, or is there another you would recommend?

YevgenJohn commented 4 years ago

I haven't tried that recipe, because it uses Ubuntu and I mostly use CentOS, but it should be just fine. The installation differs between distros mainly in which package manager the distro uses; the remaining details are minimal and simple (for example, I like to expose the API port, which is optional, and to set up a separate EXT3 volume for /var/lib/docker instead of XFS or whatever the distro defaults to, since I ran into file-system bugs earlier). I usually use this one (https://www.howtoforge.com/tutorial/centos-kubernetes-docker-cluster/), in this case only the part relevant to Docker.

funderburkjim commented 4 years ago

Thanks! I'll give it a try in a few days.

gasyoun commented 4 years ago

uses Flask, uses ElasticSearch

So they already use what you have only mentioned earlier.

funderburkjim commented 4 years ago

I have Docker installed in an Ubuntu image running in VirtualBox.

Installation notes here: readme_docker.txt

Now, how to install https://github.com/cceh/kosh ?

YevgenJohn commented 4 years ago

The kosh repo has a Dockerfile which can be used to build the image. After fetching the code and changing into its folder, run: docker build -t whichever_image_tag . (see for example https://stackoverflow.com/questions/28996907/docker-build-requires-1-argument-see-docker-build-help). This creates an image which can be launched using 'docker run'. A more advanced option is docker compose, for which kosh also has files; it takes care of doing 'docker run' with the proper parameters for networking, volumes, etc. (https://docs.docker.com/compose/gettingstarted/)
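A minimal sketch of those two commands wrapped as shell functions, assuming the kosh checkout is the source directory and that your user may run docker without sudo (for example, by being in the docker group). The function names and the tag kosh-local are hypothetical; port 5000 is the one kosh reports in its startup log later in this thread.

```bash
# Build an image from the Dockerfile in SRC_DIR and give it a readable tag,
# so later commands need not copy image IDs around.
build_kosh_image() {
  local src_dir="$1" tag="$2"
  docker build -t "$tag" "$src_dir"
}

# Start a throwaway container from the tagged image, publishing kosh's
# port 5000 on the host. Options must precede the image name; anything
# after the image is passed to the container as its command.
run_kosh_image() {
  local tag="$1" host_port="${2:-5000}"
  docker run --rm -p "${host_port}:5000" "$tag"
}
```

For example: build_kosh_image ~/Documents/kosh kosh-local, then run_kosh_image kosh-local. Tagging the image means later commands can say kosh-local instead of an ID like ca861a0c42b8.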

funderburkjim commented 4 years ago

@YevgenJohn I tried this before seeing your message.

problem with first try

In Ubuntu,

cd ~/Documents
git clone https://github.com/cceh/kosh.git

cd kosh

try building a docker image from Dockerfile - first attempt

Dockerfile is in the kosh directory.

sudo docker build .

This yielded an error:

Sending build context to Docker daemon  763.4kB
Step 1/4 : FROM alpine:latest
Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Tried several more times. That timeout condition is discussed here, with many complicated, hard-to-understand instructions.

Second build attempt

That 'alpine' reference is a well-known part of the Docker ecosystem; it's a 'small' Linux distro, I think. Tried:

sudo docker pull alpine

and this worked, with messages

Using default tag: latest
latest: Pulling from library/alpine
c9b1b535fdd9: Pull complete 
Digest: sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d
Status: Downloaded newer image for alpine:latest
docker.io/library/alpine:latest

Now, the docker build works:

sudo docker build .

There is very lengthy output, with some warnings, ending with: Successfully built ca861a0c42b8 (full log: readme_kosh_log.txt)
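Since the timeout went away once the base image was pulled on its own, one workaround is to pre-pull it with a few retries before building. This is a hypothetical helper, not part of kosh or Docker; the attempt count and delay are arbitrary defaults, and it assumes docker can be run without sudo.

```bash
# Pull IMAGE, retrying up to ATTEMPTS times with DELAY seconds between
# tries, to ride out transient registry timeouts. Returns 0 on success.
pull_with_retry() {
  local image="$1" attempts="${2:-5}" delay="${3:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    if docker pull "$image"; then
      return 0
    fi
    echo "pull of $image failed (attempt $i/$attempts)" >&2
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

Usage: pull_with_retry alpine:latest && docker build -t kosh-local . (run from the kosh directory).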

Try running the docker container

sudo docker run ca861a0c42b8

This seems to start ok, but runs into an error:

2020-01-18 22:49:02 [INFO] <kosh.kosh> Started kosh with pid 1
2020-01-18 22:49:02 [INFO] <kosh.kosh> Loaded API endpoint modules ['graphql', 'restful']
2020-01-18 22:49:02 [INFO] <kosh.kosh> Starting data sync in /var/lib/kosh
2020-01-18 22:49:02 [INFO] <kosh.kosh> Deploying web server at 0.0.0.0:5000
Exception in thread update:
...
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/kosh'

Not sure how to fix.

YevgenJohn commented 4 years ago

The docker-compose file suggests mapping a directory into the container: volumes: ['../kosh_data:/var/lib/kosh:ro']. That means docker run needs a volume parameter, placed before the image ID, something like: docker run -v /path/to/some_local_folder:/var/lib/kosh ca861a0c42b8
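A sketch of that invocation, assuming a local data directory such as ../kosh_data and docker usable without sudo. The -v option has to come before the image name, and a host path without a leading / is treated as a named volume rather than a directory, so the hypothetical function run_kosh_with_data resolves the path first. This only addresses the missing /var/lib/kosh; kosh also needs Elasticsearch, which the docker-compose setup later in this thread starts alongside it.

```bash
# Run IMAGE with a host data directory mounted read-only at /var/lib/kosh,
# the path kosh syncs from according to its startup log.
run_kosh_with_data() {
  local image="$1" data_dir="$2" abs_dir
  if [ ! -d "$data_dir" ]; then
    echo "data directory not found: $data_dir" >&2
    return 1
  fi
  # A -v source without a leading / would be taken as a named volume,
  # so turn the (possibly relative) path into an absolute one.
  abs_dir="$(cd "$data_dir" && pwd)"
  docker run --rm -p 5000:5000 -v "${abs_dir}:/var/lib/kosh:ro" "$image"
}
```

Usage: run_kosh_with_data ca861a0c42b8 ../kosh_data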

drdhaval2785 commented 4 years ago

I guess you may be missing the data files which are available at https://github.com/cceh/c-salt_sanskrit_data

vocabulista commented 4 years ago

Dear all, you can find deployment instructions for Kosh here: https://cceh.github.io/kosh/ As @drdhaval2785 points out, besides the XML files, two files per dataset are required: a '.kosh' file and a JSON configuration file. You can find examples of both files either at https://github.com/cceh/c-salt_sanskrit_data or at https://github.com/cceh/kosh_data . If you could put all the XML files of the dictionaries in a single GitHub repository (as you did with the TXT files here: https://github.com/sanskrit-lexicon/csl-orig/tree/master/v02), it would be easier for everyone involved to understand how Kosh is deployed. I will be out of the office next week, and thus able to answer your comments/questions actively from the first days of February on. Best
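Before starting kosh, it may help to check that each dataset has the three kinds of file named above. This is a hypothetical sketch: it assumes one subdirectory per dataset with the files at its top level, which may not match the actual layout of c-salt_sanskrit_data or kosh_data (adjust the depth or patterns if XML files sit in nested folders).

```bash
# For each dataset directory under ROOT, report any of the three kinds of
# file (XML entries, a .kosh file, a JSON configuration) that is missing.
# Returns 1 if anything is missing, 0 otherwise.
check_kosh_data() {
  local root="$1" dir ext missing=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    for ext in xml kosh json; do
      # find's -name lets '*' match a leading dot, so '*.kosh' also finds '.kosh'.
      if [ -z "$(find "$dir" -maxdepth 1 -type f -name "*.${ext}" -print -quit)" ]; then
        echo "${dir%/}: no *.${ext} file" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

Usage: check_kosh_data ../kosh_data && echo "all datasets look complete"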

funderburkjim commented 4 years ago

cloned two data sources

Put the directories of these as siblings of the 'kosh' directory.

Deployment documentation

Found at https://cceh.github.io/kosh/

docker cceh/kosh container

cd kosh
sudo docker build -t cceh/kosh .

install docker-compose

This is required for the next step. I followed the recipe in a Stack Overflow article:

sudo curl -L "https://github.com/docker/compose/releases/download/1.22.0/docker-compose-$(uname -s)-$(uname -m)"  -o /usr/local/bin/docker-compose
sudo mv /usr/local/bin/docker-compose /usr/bin/docker-compose
sudo chmod +x /usr/bin/docker-compose
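The URL in that recipe is assembled from the release number and the output of uname -s and uname -m. A hypothetical helper that does the same, plus an install variant that downloads straight to /usr/bin (skipping the mv) and prints the installed version as a check. 1.22.0 is simply the release used above, not necessarily the latest.

```bash
# Build the docker-compose release URL for VERSION, OS and ARCH
# (OS and ARCH default to what uname reports on this machine).
compose_url() {
  local version="$1" os="${2:-$(uname -s)}" arch="${3:-$(uname -m)}"
  echo "https://github.com/docker/compose/releases/download/${version}/docker-compose-${os}-${arch}"
}

# Download the binary directly to /usr/bin, make it executable, and
# print its version to confirm it runs. curl -f fails on HTTP errors.
install_docker_compose() {
  local version="${1:-1.22.0}"
  sudo curl -fL "$(compose_url "$version")" -o /usr/bin/docker-compose &&
    sudo chmod +x /usr/bin/docker-compose &&
    docker-compose --version
}
```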

docker-compose.local.yml

This specifies the data source.

version: '2.3'
services:
  kosh:
    volumes: ['../kosh_data:/var/lib/kosh:ro']

The default setting should work with the installed ../kosh_data directory.

A problem related to /var/lib/kosh:ro will occur later on.

The name of this file (docker-compose.local.yml) is used as a parameter in the next step.
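To have kosh serve the other cloned data source instead, the override file can be regenerated to point at it. A sketch, assuming the layout described above (data repositories as siblings of the kosh directory, so relative paths start with ../); write_local_compose is hypothetical and keeps the :ro flag of the default file.

```bash
# Write KOSH_DIR/docker-compose.local.yml so that the kosh service mounts
# DATA_REL (a path relative to the kosh checkout) at /var/lib/kosh.
write_local_compose() {
  local kosh_dir="$1" data_rel="$2"
  cat > "${kosh_dir}/docker-compose.local.yml" <<EOF
version: '2.3'
services:
  kosh:
    volumes: ['${data_rel}:/var/lib/kosh:ro']
EOF
}
```

Usage: write_local_compose ~/Documents/kosh ../c-salt_sanskrit_data, then rerun the docker-compose command with the same two -f files.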

docker compose step

This installs Elasticsearch, if needed (about another 1 GB of space), and does some other setup:

sudo docker-compose -f docker-compose.yml -f docker-compose.local.yml up -d

The following closing messages indicate success:

Creating network "kosh_network" with driver "bridge"
Creating kosh_elastic_1 ... done
Creating kosh_kosh_1    ... done

show the containers now present

sudo docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS                            PORTS                    NAMES
3fb9e928f3ab        cceh/kosh:latest      "/usr/bin/kosh --dat…"   2 minutes ago       Up 2 minutes (health: starting)   0.0.0.0:5000->5000/tcp   kosh_kosh_1
a53e6b12b704        elasticsearch:7.0.0   "/usr/local/bin/dock…"   3 minutes ago       Up 3 minutes (healthy)            9200/tcp, 9300/tcp       kosh_elastic_1

But how to search?

At this point, searching should be possible, or so I thought. I opened Firefox in Ubuntu at http://localhost:5000/ but got this error:

The connection was reset
The connection to the server was reset while the page was loading.

Have tried various other things, with various errors.
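The docker ps listing shows kosh as "health: starting", so one plausible (unconfirmed) reason for the reset connection is that kosh had not finished starting. A hypothetical helper that waits for Docker's health check to report "healthy" before trying the browser again; it assumes docker can be run without sudo and uses the container name kosh_kosh_1 from the listing.

```bash
# Poll the Docker health status of CONTAINER until it is "healthy",
# giving up after ATTEMPTS checks spaced DELAY seconds apart.
wait_for_healthy() {
  local container="$1" attempts="${2:-60}" delay="${3:-5}" status="" i
  for ((i = 1; i <= attempts; i++)); do
    # Empty if inspect fails, e.g. no such container or no health check.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" || status=""
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    sleep "$delay"
  done
  echo "$container not healthy after $attempts checks (last status: ${status:-unknown})" >&2
  return 1
}
```

Usage: wait_for_healthy kosh_kosh_1 && echo "kosh reports healthy; retry http://localhost:5000/"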

gasyoun commented 4 years ago

At this point, Searching should be possible, or so I thought.

In February we will get an answer, I hope.

drdhaval2785 commented 3 years ago

@vocabulista Can you help us further with this issue?

vocabulista commented 3 years ago

Sure. I did not answer here because I thought it was not necessary. Afterwards I communicated with @funderburkjim in order to create an up-to-date version of the CDSD via Kosh, which has been running for a couple of months now: https://cceh.github.io/kosh/docs/implementations/cdsd.html You can find the software here: https://github.com/vocabulista/csl-kosh If you want to deploy Kosh locally, you can find the instructions here: https://cceh.github.io/kosh/docs/deployment.html If you have any problems, please let me know. We have deployed Kosh on different Linux distros without problems.

We developed a web app for querying these and other Kosh APIs: https://dicts.uni-koeln.de/ It is still being developed, so if you do find bugs, do not be surprised, and please let us know (https://github.com/cceh/kosh_client/issues).

gasyoun commented 3 years ago

It is still being developed, so if you do find bugs, do not surprise

https://github.com/cceh/kosh_client/issues/1