As a Researcher, I need to find data across my institution and my partners.
As a Researcher, I need to analyze those query results in a time-efficient manner.
As a CCC engineer, in order to confirm our understanding of user spreadsheets and directories, I need a working, demonstrable POC that reads the data and publishes it to a data store.
As a CCC engineer, in order to confirm our understanding of the data model, I need data loaded into Elasticsearch.
As a CCC engineer, in order to confirm our understanding of the user interactions, I need Kibana connected to the data source.
For more, see: https://ohsu.box.com/shared/static/luh1wsefp60uf92wd7w3c27e470behy2.pptx
Run the latest version of the ELK (Elasticsearch, Logstash, Kibana) stack with Docker and Docker Compose.
It lets you analyze any data set using the search and aggregation capabilities of Elasticsearch and the visualization power of Kibana.
Based on the official images:
# start elasticsearch, logstash, kibana, and cromwell
$ docker-compose up
Starting dmses_elasticsearch
Starting dmses_kibana
Starting dmses_logstash
Starting dmses_cromwell
...
# verify that the containers started
$ docker-compose ps
Name                  Command                          State   Ports
-------------------------------------------------------------------------------------------------
dmses_elasticsearch   /docker-entrypoint.sh elas ...   Up      0.0.0.0:9200->9200/tcp, 9300/tcp
dmses_kibana          /docker-entrypoint.sh kibana     Up      0.0.0.0:5601->5601/tcp
dmses_logstash        /docker-entrypoint.sh logstash   Up      0.0.0.0:5000->5000/tcp
dmses_cromwell        /java $JAVA_OPTS ... cromwell    Up      0.0.0.0:8000->8000/tcp
# show cluster state
$ curl "$(docker-machine ip default):9200/_cluster/state/nodes?pretty"
{
"cluster_name" : "ccc-es",
"nodes" : {
"oWFWGdDWQlucmyAX-mEtGw" : {
"name" : "central",
"transport_address" : "172.17.0.2:9300",
"attributes" : { }
}
}
}
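The commands in this README resolve the host with `docker-machine ip default`. On a setup without docker-machine, a small helper can make the host overridable. This is a minimal sketch: the function name `es_url` and the variables `ES_HOST` and `ES_PORT` are assumptions for illustration, not part of this project.

```shell
# Build the Elasticsearch base URL.
# ES_HOST and ES_PORT are hypothetical overrides; when ES_HOST is unset,
# fall back to the docker-machine lookup used elsewhere in this README.
es_url() {
  host="${ES_HOST:-$(docker-machine ip default)}"
  port="${ES_PORT:-9200}"
  printf 'http://%s:%s\n' "$host" "$port"
}

# Example (requires a running stack):
#   curl "$(es_url)/_cluster/state/nodes?pretty"
```

The docker-machine lookup only runs when `ES_HOST` is empty, so setting `ES_HOST=localhost` is enough on hosts where Docker publishes ports locally.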
# restore
cd elasticdump
docker run --rm -ti -v $(pwd)/data:/data elasticdump \
--bulk=true --input=/data/snapshot.js \
--output=http://$(docker-machine ip default):9200/
$ curl "$(docker-machine ip default):9200/_cat/indices?v"
health status index           pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana           1   1          1            0      2.9kb          2.9kb
yellow open   sample-icgc       5   1        850            0    984.7kb        984.7kb
yellow open   specimen-icgc     5   1        658            0        1mb            1mb
yellow open   individual-icgc   5   1        309            0    516.4kb        516.4kb
...
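To check quickly that a restore loaded the expected number of documents, the `docs.count` column of the listing can be totalled. The following is a sketch only; `total_docs` is a hypothetical helper name, and it assumes the listing was requested with `?v` so that a header row is present.

```shell
# Sum the docs.count column of `_cat/indices?v` output read from stdin.
# The column is located by its header name rather than a fixed position.
total_docs() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "docs.count") col = i; next }
    col && $col ~ /^[0-9]+$/ { sum += $col }
    END { print sum + 0 }
  '
}

# Example (requires a running stack):
#   curl -s "$(docker-machine ip default):9200/_cat/indices?v" | total_docs
```

Rows whose `docs.count` field is not a plain integer, such as the elided `...` line, are skipped.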
Kibana status page: http://$(docker-machine ip default):5601/status
curl -XDELETE $(docker-machine ip default):9200/your-index-name-here
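Because a DELETE against `_all` or a wildcard pattern can remove every index, it may be worth guarding the index name before running the command above. This is a sketch under assumptions: `delete_index_cmd` and the `ES_URL` variable are hypothetical names, and the function only prints the command so it can be reviewed before being run.

```shell
# Print (do not execute) the DELETE command for a single named index.
# Refuses empty names, wildcards, comma-separated lists, and _all.
# ES_URL is an assumed variable; it defaults to http://localhost:9200.
delete_index_cmd() {
  case "$1" in
    ""|*"*"*|*","*|_all)
      echo "refusing to delete: '$1'" >&2
      return 1
      ;;
  esac
  printf 'curl -XDELETE %s/%s\n' "${ES_URL:-http://localhost:9200}" "$1"
}

# Example:
#   ES_URL="http://$(docker-machine ip default):9200" delete_index_cmd sample-icgc
```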
Install estab (https://github.com/miku/estab) to export index fields as tab-separated text:
estab -header -host $(docker-machine ip default) -indices "resource-baml" -f "individualId sampleId ccc_did url"
Edit the docker-compose configuration for logstash to change which data is loaded when you run docker-compose up:
volumes:
- ./services/logstash/config/icgc:/data
Alternatively, run:
$ cd logstash/icgc
$ docker run --add-host elasticsearch:$(docker-machine ip default) -v $(pwd):/data -it logstash logstash -f ./data/*.conf -v --verbose
Mapping: Elasticsearch does not have field-level aliases. Alternatives:
UI alternatives: https://github.com/OlegKunitsyn/elasticsearch-browser and http://stackoverflow.com/questions/29602467/can-i-change-top-menu-bar-and-remove-some-options-in-kibana-4
Kibana notes: http://www.codeproject.com/Articles/986186/Reviewing-Kibana-s-client-side-code