Approach - GitHub Issues

moxious commented 5 years ago

Some quick feedback. As an initial spike this all looks pretty reasonable.

We have some docs on our site that describe how Cloud VMs generally work for Neo4j here: https://neo4j.com/developer/neo4j-cloud-vms/

The important part of the approach is that I usually maintain a /etc/neo4j/neo4j.template file. In the template, users can specify certain parameters as $parameter_name. Most clouds have some kind of metadata facility that lets you attach metadata to a VM, so it's super useful to be able to pick up cloud-specific metadata and turn it into an env var.

Then, when systemctl start neo4j runs, a pre-neo4j.sh script checks the outer cloud environment, fetches the dynamic IP or whatever else is needed, and populates that into env vars. The neo4j.conf file is then prepared by using envsubst to do what the manual sed replacements are doing in this code spike.
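A minimal Python sketch of that pre-start step, assuming the cloud metadata has already been fetched into a dict and the template/conf paths are supplied by the caller. The defaults and variable names are hypothetical, and the envsubst function here is a stand-in that mimics GNU envsubst rather than calling it: $name and ${name} are replaced, unset names become empty strings, and any other $ is left alone.

```python
import re
import tempfile
from pathlib import Path

# $name or ${name}: the two reference forms envsubst substitutes.
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")

# Hypothetical defaults. Every $var in the template needs one, so a VM started
# with no cloud metadata at all still renders a working single-instance config.
DEFAULTS = {
    "dbms_mode": "SINGLE",
    "default_advertised_address": "localhost",
}


def envsubst(text: str, env: dict) -> str:
    """Like envsubst: replace $name/${name}; unset names become ''; other '$' stay."""
    return _VAR.sub(lambda m: env.get(m.group(1) or m.group(2), ""), text)


def prepare_conf(template: Path, conf: Path, cloud_metadata: dict) -> None:
    """Render neo4j.template into neo4j.conf; cloud metadata overrides defaults."""
    env = {**DEFAULTS, **cloud_metadata}
    conf.write_text(envsubst(template.read_text(), env))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        template = Path(d, "neo4j.template")
        template.write_text(
            "dbms.mode=$dbms_mode\n"
            "dbms.connectors.default_advertised_address=${default_advertised_address}\n"
        )
        conf = Path(d, "neo4j.conf")
        # Stand-in for values read from the cloud's metadata server.
        prepare_conf(template, conf, {"dbms_mode": "CORE"})
        print(conf.read_text())
```

Passing a dict instead of exporting to the process environment keeps the sketch testable; the shell version would export the variables and pipe the template through envsubst.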

If this approach sounds good, I can share some sample code for Google or AWS that shows how all of this is done. Doing it for OCI would be 90% the same; the main difference would be changing the endpoints used to get metadata about the cloud VMs. I also have a shell script I can share that does the "install neo4j" bits and sets up various niceties. I usually build VMs with Packer, so some adaptation would be necessary for the framework here, but it could mostly work the same way, with the benefit that the resulting OCI product would be mostly documented.

cpoczatek commented 5 years ago

(Sorry, I missed this in my github spam)

In general I'm in favor of that approach, and I'm a big proponent of gathering instance info at run time instead of passing in template variables. I looked for neo4j.template and pre-neo4j.sh and (sensibly) they don't come with the basic package install. If you can send them, I'd love to look them over. That said, here are a few things that might or might not change your opinion:

IPs are more static on OCI than on other clouds, e.g. there aren't separate 'basic'/'reserved' (right name?) public IPs like on Azure, where the 'basic' IP changes across a stop/start cycle.
Any scripts/files need to be baked into the TF templates, or at least be wget-able.
OCI has a metadata server [*, see example below]. Info relevant to HA, like availability/fault domains, is available, but IPs currently aren't. There are utilities that ship with OEL images to get the public IP of an instance, but I'm not sure whether other OS images (e.g. Ubuntu/CentOS) include them; I need to check.

[*] example metadata from a random instance:

curl -L http://169.254.169.254/opc/v1/instance/ | jq .
{
  "availabilityDomain": "IYfK:US-ASHBURN-AD-1",
  "faultDomain": "FAULT-DOMAIN-3",
  "compartmentId": "ocid1.compartment.oc1..aaaaaaaauc4ys2qb3h3ysdehgo6pqk3tryse53rhl6jnekypzryn246v6xyq",
  "displayName": "dse-0",
  "id": "ocid1.instance.oc1.iad.abuwcljrkgp7vsrtypbj5a2ngzp7thuegcvvvplb4vmz2it4w75zt2756wxa",
  "image": "ocid1.image.oc1.iad.aaaaaaaa7keb3ok2deynxzsz7k5rondhuc7nt5vw6hf3q5xslyiepnqsi3aq",
  "metadata": {
    "ssh_authorized_keys": "ssh-rsa AAAAB3Nza...cut for brevity",
    "user_data": "IyEvdXNyL2Jpbi9lbnY...cut for brevity"
  },
  "region": "iad",
  "canonicalRegionName": "us-ashburn-1",
  "shape": "VM.Standard2.4",
  "state": "Running",
  "timeCreated": 1555282138873,
  "agentConfig": {
    "monitoringDisabled": false
  }
}
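A minimal Python sketch of reading that document and keeping only the HA-relevant fields. The URL and JSON field names come from the example above; the environment variable names (oci_availability_domain and so on) are hypothetical. The HTTP call only works from inside an OCI instance, so parsing is kept in a separate function that can be exercised with a sample like the one above.

```python
import json
import urllib.request

OCI_INSTANCE_URL = "http://169.254.169.254/opc/v1/instance/"

# Hypothetical env var names -> keys in the OCI instance metadata document.
HA_FIELDS = {
    "oci_availability_domain": "availabilityDomain",
    "oci_fault_domain": "faultDomain",
    "oci_region": "canonicalRegionName",
    "oci_display_name": "displayName",
}


def fetch_instance_metadata(url: str = OCI_INSTANCE_URL, timeout: float = 2.0) -> dict:
    """GET the instance metadata JSON (only reachable from inside an OCI VM)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def ha_env(doc: dict) -> dict:
    """Map whichever HA-relevant fields are present to env-var-style names."""
    return {var: str(doc[key]) for var, key in HA_FIELDS.items() if key in doc}


if __name__ == "__main__":
    sample = {"availabilityDomain": "IYfK:US-ASHBURN-AD-1", "faultDomain": "FAULT-DOMAIN-3"}
    print(ha_env(sample))
```

Applied to the response above, ha_env would yield oci_fault_domain=FAULT-DOMAIN-3, among others, ready to feed a template variable.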

moxious commented 5 years ago

@cpoczatek

(I'm putting these files in public gists because they actually reside in private internal git repos and I can't share directly)

pre-neo4j.sh for AWS: https://gist.github.com/moxious/1a55ba0541baacc5cf790e4b4942093d
neo4j.template: https://gist.github.com/moxious/b057f30c5209fdeef2278710b91db6b3

The neo4j.template file is pretty much a basic neo4j.conf with env var substitutions set up, so you'll see things like dbms.mode=$dbms_mode, providing the hook for envsubst to do its thing.
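One way to derive such a template from a stock neo4j.conf, as a hedged Python sketch. The rule mapping a config key to a variable name (non-identifier characters become underscores, so dbms.mode becomes $dbms_mode) is inferred from that single example and is an assumption, as are the function names. Only the chosen keys are templated; everything else, such as heap settings, passes through unchanged, and the replaced values come back as the defaults pre-neo4j.sh would need.

```python
import re


def template_var(config_key: str) -> str:
    """dbms.mode -> dbms_mode (assumed rule: non-identifier chars become '_')."""
    return re.sub(r"[^A-Za-z0-9_]", "_", config_key)


def templatize(conf_text: str, keys: set) -> tuple:
    """Turn selected key=value lines of a neo4j.conf into key=$var lines.

    Returns (template_text, defaults), where defaults maps each new $var to the
    value it replaced, i.e. what pre-neo4j.sh must fall back to.
    """
    out, defaults = [], {}
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in keys:
            var = template_var(key)
            defaults[var] = value.strip()
            out.append(f"{key}=${var}")
        else:
            out.append(line)  # comments and untemplated settings pass through
    return "\n".join(out) + "\n", defaults


if __name__ == "__main__":
    conf = "# Neo4j\ndbms.mode=SINGLE\ndbms.memory.heap.max_size=512m\n"
    template, defaults = templatize(conf, {"dbms.mode"})
    print(template)
    print(defaults)
```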

If IP addresses on OCI are static, that's quite a nice thing to have, but is it safe in general? I don't see how OCI could guarantee that in the long run. (For example, what if I start 100 VMs, eating up 100 of your static IP addresses, and then leave them stopped for months?)

If the internal metadata server can't provide the IP address, there are other easy ways to get it with curl, such as https://ifconfig.co/, or an equivalent FaaS service that is very easy to deploy.
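If the fallback route is taken, a Python sketch might look like this. It assumes ifconfig.co's /ip path returns the bare address as plain text (worth confirming before relying on it) and validates the body with the standard ipaddress module, so an error page never ends up in neo4j.conf as an address.

```python
import ipaddress
import urllib.request

# Service mentioned above; the /ip path returning a bare address is an assumption.
PUBLIC_IP_URL = "https://ifconfig.co/ip"


def parse_ip(body: str) -> str:
    """Return the address in canonical form; raises ValueError if body isn't one."""
    return str(ipaddress.ip_address(body.strip()))


def public_ip(url: str = PUBLIC_IP_URL, timeout: float = 3.0) -> str:
    """Ask an external echo service which address this VM's traffic comes from."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_ip(resp.read().decode("ascii", errors="replace"))
```

An echo service reports the address traffic leaves from, which is the instance's public IP only when one is attached directly rather than via a NAT gateway.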

As for the internal metadata server: you'll see that the pre-neo4j.sh I'm providing already uses AWS's internal metadata server, so swapping that out should be fairly straightforward. The basic logic is just to fetch whatever keys are there and export them to the environment so that envsubst picks them up. The jq specifics will be different, but as long as OCI's metadata is key/value, in the end it's the same.
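The "fetch the keys and export them" step, sketched in Python for OCI's document shape: user-supplied pairs live under the metadata key, next to OCI's own ssh_authorized_keys and user_data, which are skipped. Rewriting non-identifier characters to underscores is an assumption, made so every name is usable as $name in the template. The environ parameter exists only so the sketch can be tested without touching the real process environment.

```python
import os
import re

# OCI's own entries in the "metadata" map (see the example response); not config.
RESERVED = {"ssh_authorized_keys", "user_data"}


def metadata_to_env(doc: dict, environ=None) -> dict:
    """Export string key/value pairs from doc["metadata"] into the environment.

    Returns what was exported. Non-string values and names that cannot be
    shell variables (e.g. starting with a digit) are skipped.
    """
    environ = os.environ if environ is None else environ
    exported = {}
    for key, value in doc.get("metadata", {}).items():
        if key in RESERVED or not isinstance(value, str):
            continue
        name = re.sub(r"[^A-Za-z0-9_]", "_", key)
        if not re.match(r"[A-Za-z_]", name):
            continue
        environ[name] = value
        exported[name] = value
    return exported
```

Run against the parsed /opc/v1/instance/ response before the template is rendered, this plays roughly the role the jq step plays in the AWS script.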

cpoczatek commented 5 years ago

To answer your question: yes, IPs are "static" and change only on instance/VNIC creation/attachment/destruction. This fits the general pattern OCI follows for committing resources; for example, instance cores (cores, not vCPUs) are pinned to physical cores, and network bandwidth is reserved and not oversubscribed.

I think I understand enough to give this a try. I don't think using capital-T Tags is the best way to go, but I don't think that's a problem. A few questions/comments:

Am I understanding correctly that this exposes any "templated" config param (that isn't overridden in the shell) as a tag, which sets the correct config at deploy time, and that the tag is then updatable, with the change picked up on service restart?
What are the required tags->vars that don't have defaults? (~line 63 in your gist) Is it just dbms.mode?
OCI has a per-resource limit of 10 free-form tags, and they're not in the instance metadata, so you have to make an SDK/CLI call, which brings in groups and roles. But we can add arbitrary string/string or string/JSON KV pairs to either the instance metadata or extended_metadata in the templates, and those are curl-able. This links template variables -> params pretty cleanly, and they're updatable, but not viewable/editable in the console.
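If extended_metadata is used for string/JSON pairs, nested values need flattening before they can serve as $vars. A Python sketch, assuming the instance metadata document exposes extended metadata under an extendedMetadata key (an assumption to verify against a real instance) and that nested keys are joined with underscores (a hypothetical convention).

```python
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """{'neo4j': {'dbms_mode': 'CORE'}} -> {'neo4j_dbms_mode': 'CORE'}"""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif isinstance(value, str):
            flat[name] = value
        else:
            flat[name] = json.dumps(value)  # numbers, booleans, lists as JSON text
    return flat


def template_params(doc: dict) -> dict:
    """Merge plain metadata with flattened extended metadata (extended wins on clashes)."""
    params = {k: v for k, v in doc.get("metadata", {}).items() if isinstance(v, str)}
    params.update(flatten(doc.get("extendedMetadata", {})))
    return params
```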

If this seems sensible I'll code up something mocking this and we can make the final call.

moxious commented 5 years ago

Question 1 - yes, subject to some limitations. Not every neo4j config item is templated as a $var in neo4j.template. The issue is that putting it in as a $var obligates pre-neo4j.sh to pick a default value, and we would not want to do that in cases like heap size, since that would get very complicated very quickly. So the core set of most valuable "configure from the outside" settings are $vars in neo4j.template, but not everything.

Question 2 - none. Specifically: pre-neo4j.sh provides reasonable defaults for everything, so if you set no metadata whatsoever you'll still get a working config. However, I believe the default working config is dbms.mode=SINGLE, i.e. a single-instance deploy. For this reason, to deploy a clustered config you must at a minimum set dbms.mode and initial_discovery_members; otherwise you'll get three separate single instances rather than one cluster.
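The minimum-settings rule above can be checked before Neo4j starts. A hedged Python sketch: dbms_mode and causal_clustering_initial_discovery_members are hypothetical template names for the Neo4j 3.x settings dbms.mode and causal_clustering.initial_discovery_members, and the host:port check is deliberately shallow. Heap size is intentionally absent, since it is not templated.

```python
# Hypothetical template variable names with the defaults pre-neo4j.sh supplies.
DEFAULTS = {
    "dbms_mode": "SINGLE",
    "causal_clustering_initial_discovery_members": "",
}
# Modes that must be able to find the other cluster members at startup.
CLUSTER_MODES = {"CORE", "READ_REPLICA"}


def check_config(env: dict) -> list:
    """Return a list of problems with the effective settings (env overrides defaults)."""
    cfg = {**DEFAULTS, **env}
    mode = cfg["dbms_mode"].upper()
    raw = cfg["causal_clustering_initial_discovery_members"]
    members = [m.strip() for m in raw.split(",") if m.strip()]

    problems = []
    if mode in CLUSTER_MODES and not members:
        problems.append(f"dbms_mode={mode} needs initial discovery members")
    for member in members:
        host, sep, port = member.rpartition(":")
        if not sep or not host or not port.isdigit():
            problems.append(f"bad discovery member {member!r}, expected host:port")
    return problems


if __name__ == "__main__":
    print(check_config({}))                     # no metadata: valid single instance
    print(check_config({"dbms_mode": "CORE"}))  # cluster mode without members
```

With no metadata at all the check passes, which is exactly how three VMs can quietly come up as three standalone instances; setting dbms_mode alone is flagged.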

Question 3 - you know more about OCI metadata than I do. If you understand the intent of configuring via instance metadata and there is a reasonable analogue in OCI to translate that to, that's the way I'd go. If, in your judgment, it's better to do that with KV pairs in the instance metadata in the templates, I can go with that; it then just becomes a documentation item.

oracle-quickstart / oci-neo4j

Approach #2