nanovms / ops

ops - build and run nanos unikernels
https://ops.city
MIT License

Elasticsearch DB as a package #1333

Closed joshuaquek closed 2 months ago

joshuaquek commented 2 years ago

Hi @eyberg ! Joshua here! I saw on https://repo.ops.city/ that we are able to have databases as images (I saw that MongoDB, MySQL, and Meilisearch are possible on the page). Is it possible to do the same with Elasticsearch?

I am trying to figure this out by using a docker image https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html but haven't had any success.

eyberg commented 2 years ago

yes, i know we've run elastic before but i don't see a pkg for it, so we could create one. your typical ELK stack would be {elastic,logstash,kibana} - both elastic && kibana would be their own packages, while logstash could be a klib and/or package depending on usage

joshuaquek commented 2 years ago

Hi Ian, awesome, yes this would be exactly it. For now I am trying to just do a simple Kibana unikernel and Elastic unikernel to see if this works, but am running into issues. No worries, I will update my progress over here.

eyberg commented 2 years ago

just an update on this: i do have some pkgs that boot but still need further work:

ops pkg load nanovms/elasticsearch:8.2.0 -p 9200 -p 9300
ops pkg load nanovms/kibana:8.2.2 -p 5601

however, the elastic one still needs quite a bit of configuration done to it, as I think the rng and host lookup are off, degrading perf - should definitely be fixable once we look into it more; it's probably just config. it's been a few years since i used kibana but I think you need to configure an es node to use it (if so, you can do that via config) - i put some notes into the readme and you can check the config like so:

➜  ~ cat ~/.ops/packages/nanovms/elasticsearch_8.2.0/package.manifest
{
  "Program":"/jdk/bin/java",
  "Version":"8.2.0",
  "Env": {
    "LD_PRELOAD": "l/l",
    "JAVA_HOME": "/jdk",
    "LIBFFI_TMPDIR": "/tmp"
  },
  "BaseVolumeSz": "3g",
  "RunConfig": {
    "Memory": "3G"
  },
  "Args": ["-Xshare:auto", "-Des.networkaddress.cache.ttl=60", "-Des.networkaddress.cache.negative.ttl=10",
  "-Djava.security.manager=allow", "-XX:+AlwaysPreTouch", "-Xss1m", "-Djava.awt.headless=true",
  "-Dfile.encoding=UTF-8", "-Djna.nosys=true", "-XX:-OmitStackTraceInFastThrow",
  "-XX:+ShowCodeDetailsInExceptionMessages", "-Dio.netty.noUnsafe=true",
  "-Dio.netty.noKeySetOptimization=true", "-Dio.netty.recycler.maxCapacityPerThread=0",
  "-Dlog4j.shutdownHookEnabled=false", "-Dlog4j2.disable.jmx=true", "-Dlog4j2.formatMsgNoLookups=true",
  "-Djava.locale.providers=SPI,COMPAT", "--add-opens=java.base/java.io=ALL-UNNAMED", "-XX:+UseG1GC",
  "-Djava.io.tmpdir=/tmp/elasticsearch-7184952636575940591", "-XX:+HeapDumpOnOutOfMemoryError",
  "-XX:+ExitOnOutOfMemoryError", "-XX:HeapDumpPath=data", "-XX:ErrorFile=/logs/hs_err_pid%p.log",
  "-Xlog:gc*,gc+age=trace,safepoint:file=/logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m",
  "-Xms1g", "-Xmx1g", "-XX:InitiatingHeapOccupancyPercent=30",
  "-XX:G1ReservePercent=25", "-Des.path.home=/",
  "-Dos.name=Linux",
  "-Des.path.conf=/config", "-Des.distribution.flavor=default",
  "-Des.distribution.type=tar", "-Des.bundled_jdk=true", "-cp", "/lib/*",
  "org.elasticsearch.bootstrap.Elasticsearch"
  ]
}

two main things I did w/es itself were to override the root check (we don't have users but some apps still want to check; we return a stub uid of '0' by default, and in this case I just made it 42), and /proc/self/stat is also stubbed

joshuaquek commented 2 years ago

Hi @eyberg , thank you!! I have tried running the ops pkg load nanovms/elasticsearch:8.2.0 -p 9200 -p 9300 command, but it seems that it's throwing an error.


I have tried doing it with the official elasticsearch:8.2.2 docker image as well, since I realise that you guys have the from-docker flag/command - https://nanovms.com/dev/tutorials/converting-docker-containers-to-nanos-unikernels - but to no avail.

However, thank you for getting back so quickly on this matter!! πŸ™πŸΌ

joshuaquek commented 2 years ago

Two weeks ago I tried with a docker container, and after trying again now, I got the same error as I did back then:

[screenshot of the error]

...for Linux I believe that it is bin/elasticsearch, or elasticsearch.bat for Windows (https://www.elastic.co/downloads/elasticsearch#:~:text=2-,Start%20Elasticsearch,-Run%20bin/elasticsearch)

joshuaquek commented 2 years ago

I think the last time I tried this it was on an older version (7.16 or 8.0), but both the old one and the latest elasticsearch v8.2.2 gave the same not a dynamic executable error.

joshuaquek commented 2 years ago

On the bright side, I have tried running Kibana using your suggested command ops pkg load nanovms/kibana:8.2.2 -p 5601 and it seems to start well! πŸ‘πŸΌπŸ‘πŸΌ (screenshot below)

[screenshot of Kibana starting up]

I would then need a kibana.yml file, which Kibana reads its configuration from. Usually this file can be edited at /etc/kibana/kibana.yml after installing Kibana from a tar.gz file. (Reference for the kibana.yml that I am using as a guide: https://github.com/elastic/kibana/blob/main/config/kibana.yml )


For my config.json I have structured it as such:

{
  "Dirs": [],
  "CloudConfig": {
    "Platform": "aws",
    "ProjectID": "kibana-unikernel-demo",
    "Zone": "ap-southeast-1",
    "BucketName": "kibana-unikernel-demo-s3-bucket"
  },
  "Files": ["kibana.yml"],
  "RunConfig": {
    "Ports": ["5601"],
    "Verbose": true
  }
}
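As a rough sketch of how I assume the pieces fit together locally, kibana.yml sits next to config.json in the working directory and the config is passed to ops with the -c flag:

# working directory contains config.json and kibana.yml side by side
# load the Kibana package with this config, exposing the port from RunConfig
ops pkg load nanovms/kibana:8.2.2 -p 5601 -c config.json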

I am still in the midst of experimenting to get it working, and I will update my progress over here. Thanks!

joshuaquek commented 2 years ago

Update:

Kibana running as a unikernel works! πŸŽ‰

I am testing this with an online Cloud-hosted ElasticsearchDB instance for now, as my local unikernel ElasticsearchDB instance is not working as of yet.


Latest config.json :

{
  "Dirs": [],
  "CloudConfig": {
    "Platform": "aws",
    "ProjectID": "kibana-unikernel-demo",
    "Zone": "ap-southeast-1",
    "BucketName": "kibana-unikernel-demo-s3-bucket"
  },
  "Files": ["kibana.yml"],
  "Env": {
    "KBN_PATH_CONF": "/"
  },
  "RunConfig": {
    "Ports": ["5601"],
    "Verbose": true
  }
}

Directory structure for Kibana specifically:

[screenshot of the directory structure]

Here is what my kibana.yml looks like (note that I had to generate the Service Account token using https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-service-token.html, which then goes into the kibana.yml config; a rough token-creation sketch follows the config below):

# For more configuration options see the configuration guide for Kibana in
# https://www.elastic.co/guide/index.html

# =================== System: Kibana Server ===================
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "0.0.0.0"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# Defaults to `false`.
#server.rewriteBasePath: false

# Specifies the public URL at which Kibana is available for end users. If
# `server.basePath` is configured this URL should end with the same basePath.
server.publicBaseUrl: "http://0.0.0.0:5601"

# The maximum payload size in bytes for incoming server requests.
#server.maxPayload: 1048576

# The Kibana server's name. This is used for display purposes.
server.name: "kibana-unikernel"

# =================== System: Kibana Server (Optional) ===================
# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key

# =================== System: Elasticsearch ===================
# The URLs of the Elasticsearch instances to use for all your queries.
elasticsearch.hosts: ["https://sample.my-cluster-address.com:9243"]

# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
# elasticsearch.username: "default"
# elasticsearch.password: "default"

# Kibana can also authenticate to Elasticsearch via "service account tokens".
# Service account tokens are Bearer style tokens that replace the traditional username/password based configuration.
# Use this token instead of a username/password.
elasticsearch.serviceAccountToken: "Please generate via https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-service-token.html. Then paste the generated token here."

# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500

# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000

# The maximum number of sockets that can be used for communications with elasticsearch.
# Defaults to `Infinity`.
#elasticsearch.maxSockets: 1024

# Specifies whether Kibana should use compression for communications with elasticsearch
# Defaults to `false`.
#elasticsearch.compression: false

# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]

# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}

# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000

# =================== System: Elasticsearch (Optional) ===================
# These files are used to verify the identity of Kibana to Elasticsearch and are required when
# xpack.security.http.ssl.client_authentication in Elasticsearch is set to required.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key

# Enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]

# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full

# =================== System: Logging ===================
# Set the value of this setting to off to suppress all logging output, or to debug to log everything. Defaults to 'info'
#logging.root.level: debug

# Enables you to specify a file where Kibana stores log output.
#logging.appenders.default:
#  type: file
#  fileName: /var/logs/kibana.log
#  layout:
#    type: json

# Logs queries sent to Elasticsearch.
#logging.loggers:
#  - name: elasticsearch.query
#    level: debug

# Logs http responses.
#logging.loggers:
#  - name: http.server.response
#    level: debug

# Logs system usage information.
#logging.loggers:
#  - name: metrics.ops
#    level: debug

# =================== System: Other ===================
# The path where Kibana stores persistent data not saved in Elasticsearch. Defaults to data
#path.data: data

# Specifies the path where Kibana creates the process ID file.
#pid.file: /run/kibana/kibana.pid

# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000ms.
#ops.interval: 5000

# Specifies locale to be used for all localizable strings, dates and number formats.
# Supported languages are the following: English (default) "en", Chinese "zh-CN", Japanese "ja-JP", French "fr-FR".
#i18n.locale: "en"

# =================== Frequently used (Optional)===================

# =================== Saved Objects: Migrations ===================
# Saved object migrations run at startup. If you run into migration-related issues, you might need to adjust these settings.

# The number of documents migrated at a time.
# If Kibana can't start up or upgrade due to an Elasticsearch `circuit_breaking_exception`,
# use a smaller batchSize value to reduce the memory pressure. Defaults to 1000 objects per batch.
#migrations.batchSize: 1000

# The maximum payload size for indexing batches of upgraded saved objects.
# To avoid migrations failing due to a 413 Request Entity Too Large response from Elasticsearch.
# This value should be lower than or equal to your Elasticsearch cluster’s `http.max_content_length`
# configuration option. Default: 100mb
#migrations.maxBatchSizeBytes: 100mb

# The number of times to retry temporary migration failures. Increase the setting
# if migrations fail frequently with a message such as `Unable to complete the [...] step after
# 15 attempts, terminating`. Defaults to 15
migrations.retryAttempts: 2

# =================== Search Autocomplete ===================
# Time in milliseconds to wait for autocomplete suggestions from Elasticsearch.
# This value must be a whole number greater than zero. Defaults to 1000ms
#unifiedSearch.autocomplete.valueSuggestions.timeout: 1000

# Maximum number of documents loaded by each shard to generate autocomplete suggestions.
# This value must be a whole number greater than zero. Defaults to 100_000
#unifiedSearch.autocomplete.valueSuggestions.terminateAfter: 100000
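For reference, the service account token mentioned before the config can be created against the Elasticsearch create-service-token API. A rough sketch, reusing the placeholder cluster address from the config above and a made-up token name (authenticate as a user allowed to manage service account tokens):

# create a token for the elastic/kibana service account
curl -X POST -u elastic \
  "https://sample.my-cluster-address.com:9243/_security/service/elastic/kibana/credential/token/kibana-unikernel-token"
# the token.value field of the response is what goes into elasticsearch.serviceAccountToken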

joshuaquek commented 2 years ago

It is getting a bit late on my end here. The next step I'll be doing is trying to get the Elasticsearch database working as a unikernel. I will continue to post my progress over here. Cheers!

eyberg commented 2 years ago

we have a few changes we're making to the pkg we created and will roll that out in the next day or so, also, this pr improves startup speed https://github.com/nanovms/nanos/pull/1733

joshuaquek commented 2 years ago

Awesome! Thanks for the update @eyberg

eyberg commented 2 years ago

for your question on why the 'load from docker' doesn't work for this use-case: bin/elasticsearch is a shell script that does a lot of env pre-population before launching - nothing wrong w/that, but we don't run shell scripts as it implies that you are going to run many different programs - so what we do with this particular package is call the actual program (eg: the jvm), provide its class and associated files, and load up the configuration;

also - something to be aware of - the first time it loads there is a bunch of index creation which slows things down - i don't know what the best workflow here would be, whether you start with a base pkg and build on top of that or what - you can always re-use the existing package and not re-build by passing '-s' or '--skipbuild'
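for example (just a sketch, reusing the elasticsearch pkg from earlier):

# after the first (full) build, re-use the existing image with -s / --skipbuild
ops pkg load nanovms/elasticsearch:8.2.0 -p 9200 -p 9300 -s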

joshuaquek commented 2 years ago

Hey Ian, thanks! Sure, I'll take note of the --skipbuild flag for reusing a package without rebuilding it.

On a side note, I've tried running ops pkg load nanovms/elasticsearch:8.2.0 -p 9200 -p 9300 as per what you mentioned, but so far haven't had any success getting Elasticsearch running as a unikernel.

[screenshots of the console output]

Usually, if Elasticsearch is running, a regular response would look something like this:

[screenshot of a normal Elasticsearch response]

Just wondering: I have downloaded the Linux x86_64 tar.gz compressed file from https://www.elastic.co/downloads/elasticsearch . Is there a way I can run this as a unikernel directly on my local machine?

francescolavra commented 2 years ago

Hi @joshuaquek, the unable to install syscall filter, unable to retrieve max number of threads and unable to retrieve max file size warnings are expected and are due to some Linux kernel features not being implemented in Nanos. I would advise using the Nanos nightly build (which contains the fix from https://github.com/nanovms/nanos/pull/1733) instead of the latest Nanos release; otherwise you would run into an issue that causes a thread to consume 100% CPU. To do that, just add the -n command line flag to the ops pkg load command. Using the Nanos nightly build, I'm able to reach the web server at https://0.0.0.0:9200/. From your screenshot it looks like you didn't wait for startup to complete (when the node is started, you should see a [INFO ][o.e.n.Node ] [] started message on the console).
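Concretely, that would be something like this (same package and ports used earlier in this thread):

# -n boots the image on the Nanos nightly build
ops pkg load nanovms/elasticsearch:8.2.0 -p 9200 -p 9300 -n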

As for the Linux x86_64 tar.gz file, yes, the Elasticsearch package has been created starting from the contents of that file. If you want to start from that file to create your unikernel image, you could create a local package, as described in https://docs.ops.city/ops/packages#creating-a-custom-or-local-dev-packages. Or alternatively you could just edit the contents of the .ops/packages/nanovms/elasticsearch_8.2.0 folder, which is where ops pkg load builds the image from.
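A sketch of the second option (the load command is the same one used earlier in the thread):

# adjust the package contents that the image is built from
vi ~/.ops/packages/nanovms/elasticsearch_8.2.0/package.manifest
# then load the package again (with the nightly build, as suggested above)
ops pkg load nanovms/elasticsearch:8.2.0 -p 9200 -p 9300 -n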

joshuaquek commented 2 years ago

Hi @francescolavra ! Thank you for the quick response. I followed your advice on using the nightly -n flag and it seems to work!


Thank you!

joshuaquek commented 2 years ago

Sadly, my Kibana unikernel does not seem to work with the Elasticsearch unikernel due to a "licensing issue" (this is not related to Nanos/Ops; it is probably more Elasticsearch related).

Browser:

[screenshot]

Command Line (the left side is Kibana, while the right side is Elasticsearch):

[screenshot]

Checking my License after calling http://0.0.0.0:9200/_license shows that I'm running an active basic license:

[screenshot]

I'll probably need to search online why this issue is happening, and will continue to post my findings here.

joshuaquek commented 2 years ago

Have had some time recently to try running a Kibana unikernel on AWS.

It works locally on a laptop (as per what we saw earlier on):

[screenshots]

...but when I try deploying to the cloud, it successfully deploys, and it is reachable πŸ‘πŸΌ However, there seems to be an error:

[screenshots]

When I check the instance screenshot of the logs being output, it seems that this AWS-deployed Kibana unikernel is somehow unable to make an outbound TCP call to connect to my managed Elasticsearch instance, even though my local laptop Kibana unikernel is able to. The screenshot of the Kibana unikernel running on AWS EC2 is below:

[screenshot]

Any thoughts on this?

eyberg commented 2 years ago

that's a dns issue, but if it's working locally with ops to your host at https://temp-cluster.es.ap-southeast-1.aws.found.io:9243/ i'd expect it to work on aws as well, since no other changes are being made

fyi, you can get this console output via 'ops instance logs' as well. if this is the same image as the one you have locally, does a restart of the instance work at all?
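something like this, assuming an aws target and the zone from your config (the instance name is a placeholder):

ops instance list -t aws -z ap-southeast-1
ops instance logs <instance_name> -t aws -z ap-southeast-1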

eyberg commented 2 years ago

i took a quick look at this and I think the pkg is resolving dns fine, as I was able to substitute the kibana node program with a simple resolver test like so ->

eyberg@box:~/j$ ops pkg load -l nanovms/kibana_8.2.2 -c config.json
booting /home/eyberg/.ops/images/node ...
en1: assigned 10.0.2.15
64.62.249.4

eyberg@box:~/j$ cat config.json
{
  "Files": ["hi.js"]
}

eyberg@box:~/j$ cat hi.js
var dns = require('dns');
var w3 = dns.lookup('nanovms.com', function (err, addresses, family) {
  console.log(addresses);
});

by default we place google's dns of 8.8.8.8 in /etc/hosts, but you can sub that out with something else if it makes more sense by creating a local folder named 'etc' and a file named 'hosts' with whatever dns makes more sense, then adjusting your config to:

{
  "Files": ["hi.js"],
  "Dirs": ["etc"]
}
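as a sketch of one way to use that - pinning the managed cluster's hostname directly instead of swapping the resolver - assuming the standard hosts-file format is honored and using a made-up address:

mkdir etc
# hypothetical entry; replace with the real IP of the elasticsearch endpoint
echo "203.0.113.10 temp-cluster.es.ap-southeast-1.aws.found.io" > etc/hosts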

eyberg commented 2 years ago

https://github.com/nanovms/nanos/pull/1759