Node.js implementation of search features for the o2r API
Apache License 2.0

o2r-finder

Implementation of search features and the endpoint /api/v1/search for the o2r API.

Architecture

The finder utilizes Elasticsearch to provide means for

The auto-suggest search is not readily available with MongoDB (although MongoDB does offer full-text search).

Since we don't want to worry about keeping things in sync, the finder simply re-indexes the whole database at startup and then subscribes to changes in MongoDB, using node-elasticsearch-sync for both steps.
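The two-step approach described above (full re-index at startup, then following changes) can be illustrated with a minimal sketch. It does not use node-elasticsearch-sync or any real database: the Map objects and the EventEmitter are in-memory stand-ins, and the function name startSync and the event names are assumptions chosen for illustration only.

```javascript
const { EventEmitter } = require('events');

// In-memory stand-ins (assumptions): `database` plays the role of MongoDB,
// `index` plays the role of Elasticsearch, `changes` plays the change feed.
function startSync(database, changes, index) {
  // Step 1: rebuild the index from scratch, so stale entries from a
  // previous run cannot survive a restart.
  index.clear();
  for (const [id, doc] of database) {
    index.set(id, doc);
  }

  // Step 2: follow later changes so the index stays current.
  changes.on('insert', (id, doc) => index.set(id, doc));
  changes.on('update', (id, doc) => index.set(id, doc));
  changes.on('delete', (id) => index.delete(id));
}

const database = new Map([['a1', { compendium_id: 'mQryh' }]]);
const changes = new EventEmitter();
const index = new Map([['stale', {}]]); // leftover from an earlier run

startSync(database, changes, index);
changes.emit('insert', 'b2', { compendium_id: 'Ks1Bc' });
changes.emit('delete', 'a1');

console.log([...index.keys()]);
```

Rebuilding everything at startup trades some startup time for not having to reason about what the index missed while the service was down.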

The /api/v1/search endpoint allows two types of queries:

1) Simple queries via GET: as an Elasticsearch query string

2) Complex queries via POST: using the Elasticsearch Query DSL

For more details and examples see the Search API documentation.
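As a rough illustration of the two query types, the following sketch builds the request for each without sending it. The helper names, the base URL http://localhost and the match_all body are assumptions for a local setup; the exact request shape accepted by the endpoint is described in the Search API documentation.

```javascript
// Path of the search endpoint as given in this README.
const SEARCH_PATH = '/api/v1/search';

// Simple query: the Elasticsearch query string goes into the `q`
// parameter of a GET request and is URL-encoded on the way.
function buildSimpleSearch(baseUrl, queryString) {
  const url = new URL(SEARCH_PATH, baseUrl);
  url.searchParams.set('q', queryString);
  return { method: 'GET', url: url.toString() };
}

// Complex query: a Query DSL object is sent as the JSON body of a POST
// request (assumed body shape, see the lead-in).
function buildComplexSearch(baseUrl, queryDsl) {
  return {
    method: 'POST',
    url: new URL(SEARCH_PATH, baseUrl).toString(),
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(queryDsl)
  };
}

const simple = buildSimpleSearch('http://localhost', '10.1006/jeem.1994.1031');
const complex = buildComplexSearch('http://localhost', { query: { match_all: {} } });

console.log(simple.url);
console.log(complex.method, complex.url, complex.body);
```

The returned objects can be handed to any HTTP client; keeping request construction separate from sending makes the encoding easy to inspect.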

Special characters

The finder supports searching for special characters for these fields:

To support additional fields with special characters, the mapping in config/mapping.js must be updated to copy those fields into the group field _special.

/api/v1/search?q=10.1006%2Fjeem.1994.1031

"query": {
    "bool": {
        "should" : [
            {"query_string": {"default_field": "_all", "query": [...]}},
            {"query_string": {"default_field": "_special", "query": [...]}}
        ]
    }
}
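A query of the shape shown above could be assembled as in the following sketch. The function name buildCombinedQuery is hypothetical, and passing the same unmodified text to both clauses is an assumption: the value elided as [...] in the original is not known, and any escaping of reserved query_string characters is left out here.

```javascript
// Builds a bool query whose two `should` clauses search the `_all`
// field and the `_special` group field.
// Assumption: both clauses receive the same, unescaped query text.
function buildCombinedQuery(queryText) {
  return {
    query: {
      bool: {
        should: [
          { query_string: { default_field: '_all', query: queryText } },
          { query_string: { default_field: '_special', query: queryText } }
        ]
      }
    }
  };
}

const combined = buildCombinedQuery('10.1006/jeem.1994.1031');
console.log(JSON.stringify(combined, null, 4));
```

With should clauses, a document matching either field is returned, and a document matching both scores higher.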

Other possible options to search both fields are:

Indexed information

Compendia

The MongoDB id is stored as the entry id to allow deletion in Elasticsearch when an element is removed from MongoDB.

The "public" ID for the compendium is stored in compendium_id.

Example:

(...)
"hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
        {
            "_score": 1,
            "_source": {
                "user": "0000-0001-6230-4374",
                "metadata": {},
                "jobs": [],
                "created": "2017-08-21T14:31:27.376Z",
                "files": {},
                "compendium_id": "mQryh"
            }
        },
        {
            "_score": 1,
            "_source": {
                "user": "0000-0001-6230-4374",
                "metadata": {},
                "jobs": [],
                "created": "2017-08-21T14:31:47.623Z",
                "files": {},
                "compendium_id": "Ks1Bc"
            }
        }
    ]
    (...)
}
(...)
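A client consuming such a response will often only need the public compendium IDs. The sketch below shows one way to pull them out; the function name compendiumIds is hypothetical, and the sample object is a trimmed copy of the example above rather than real output.

```javascript
// Collects the public compendium IDs from a search response.
// Hits without a string `compendium_id` (e.g. other document types)
// are skipped; a missing `hits` section yields an empty list.
function compendiumIds(response) {
  const hits = (response && response.hits && response.hits.hits) || [];
  return hits
    .map((hit) => hit._source && hit._source.compendium_id)
    .filter((id) => typeof id === 'string');
}

// Trimmed copy of the example response above.
const sample = {
  hits: {
    total: 6,
    max_score: 1,
    hits: [
      { _score: 1, _source: { user: '0000-0001-6230-4374', compendium_id: 'mQryh' } },
      { _score: 1, _source: { user: '0000-0001-6230-4374', compendium_id: 'Ks1Bc' } }
    ]
  }
};

console.log(compendiumIds(sample));
```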

Note: If you change the metadata structure of compendia or jobs and have already indexed them in Elasticsearch, you have to drop the Elasticsearch index o2r with

curl -XDELETE 'http://172.17.0.3:9200/o2r'

Otherwise, new compendia will not be indexed anymore.

Requirements

Dockerfile

This project includes a Dockerfile which can be built and run as follows. This is not a complete configuration; it is useful for testing only.

docker build -t finder .

# start databases in containers (optional)
docker run --name mongodb -d mongo:3.4 mongod --replSet rso2r --smallfiles
docker exec $(docker ps -qf "name=mongodb") bash -c "sleep 5; mongo --verbose --host mongodb --eval 'printjson(rs.initiate()); printjson(rs.conf()); printjson(rs.status()); printjson(rs.slaveOk());'"
docker run --name es -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:5.6.3

docker run -it --link mongodb --link es -e ELASTIC_SEARCH_URL=es:9200 -e FINDER_MONGODB=mongodb://mongodb -e MONGO_OPLOG_URL=mongodb://mongodb/muncher -e MONGO_DATA_URL=mongodb://mongodb/muncher -e DEBUG=finder -p 8084:8084 finder

The image can then be configured via environment variables.

Available environment variables

Development

Start an Elasticsearch instance, exposing the default port on the host:

docker run -it --name elasticsearch -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "xpack.security.enabled=false" -p 9200:9200 docker.elastic.co/elasticsearch/elasticsearch:5.6.3

Important: Starting with Elasticsearch 5, the virtual memory settings of the system (in our case the host) must be adjusted, particularly the vm.max_map_count setting, see https://www.elastic.co/guide/en/elasticsearch/reference/5.0/vm-max-map-count.html

You can then explore the state of Elasticsearch, e.g.

Start finder (potentially adjust Elasticsearch container's IP, see docker inspect elasticsearch)

npm install
DEBUG=finder FINDER_ELASTICSEARCH=localhost:9200 npm start;

You can set DEBUG=* to see MongoDB oplog messages.

Now check out the transferred documents:

Delete the index with

curl -XDELETE 'http://172.17.0.3:9200/o2r/'

Local test proxy

If you run the web service proxy from the project o2r-platform, you can run queries directly at the o2r API:

http://localhost/api/v1/search?q=*

Local container testing

The following code assumes the Docker host is available under IP 172.17.0.1 within the container.

docker run -it -e DEBUG=finder -e FINDER_MONGODB=mongodb://172.17.0.1 -e ELASTIC_SEARCH_URL=http://172.17.0.1:9200 -p 8084:8084 finder

Tests

The tests require running instances of Elasticsearch, MongoDB and the o2r-finder, started as described above.

To run the included tests, execute

npm test

License

o2r-finder is licensed under Apache License, Version 2.0, see file LICENSE.

Copyright (C) 2017 - o2r project.