weaviate / weaviate-infra

BSD 3-Clause "New" or "Revised" License
Add Spark-Analytics (Java) app to Helm chart #59

Open etiennedi opened 5 years ago

etiennedi commented 5 years ago

We're pushing Docker images as part of https://github.com/SeMI-network/janus-spark-analytics/issues/3

We want to have them included in the chart if a spark flag is set somewhere.

The app depends on two configuration files, which should be mounted from a config map:

The app is mostly stateless, a simple Deployment should be sufficient. It'll receive very low traffic, so a single replica should be fine.
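A minimal sketch of what such a template could look like, assuming a Helm values flag named `spark.enabled`, an image value `spark.image`, a ConfigMap called `janus-spark-analytics-config` that holds both configuration files as keys, and a mount path of `/config`. All of these names are hypothetical and not taken from the chart.

```yaml
# Rendered only when the (hypothetical) spark flag is set in values.yaml.
{{- if .Values.spark.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: janus-spark-analytics
spec:
  replicas: 1                      # very low traffic, a single replica is enough
  selector:
    matchLabels:
      app: janus-spark-analytics
  template:
    metadata:
      labels:
        app: janus-spark-analytics
    spec:
      containers:
        - name: analytics
          image: "{{ .Values.spark.image }}"   # assumed values key
          volumeMounts:
            - name: config
              mountPath: /config   # assumed path; must match where the app looks for its files
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: janus-spark-analytics-config  # assumed ConfigMap with both config files as keys
{{- end }}
```

Each key in the ConfigMap shows up as a file under the mount path, so the paths the app reads its configuration from would need to point there.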

idcrosby commented 5 years ago

Regarding the second config, spark-cassandra.properties, this is a janus configuration. Should this override the default janus config we use, or is it additional configuration (i.e. can we pass both in)?

etiennedi commented 5 years ago

It is technically a janus configuration, but it's only consumed by the analytics app. We are using the analytics app to talk to the backends (Cassandra, ES) with Janusgraph libraries (that's one of the reasons why it has to be written in Java).

idcrosby commented 5 years ago

The only place I can find it currently being used is to be mounted in the Janus image within the compose file: https://github.com/semi-technologies/janus-spark-analytics/blob/70a676c5d1f8412de9203b47bfeefdd2e99a70d0/docker-compose.yml#L29

Is this still to be added to the analytics piece?

etiennedi commented 5 years ago

Good point, let me check. It is definitely used by the analytics app. Maybe I'm just copying it while building the Dockerfile and it works since there's only one version. I'll get back to you.

etiennedi commented 5 years ago

Yeah, I think that's the case, it happens to be part of the docker image, that's why it's working. The app loads it from the file system, as configured here: https://github.com/semi-technologies/janus-spark-analytics/blob/70a676c5d1f8412de9203b47bfeefdd2e99a70d0/analytics.yml#L8

etiennedi commented 5 years ago

By the way, the janus container in the docker-compose file (of the analytics app) is only used for testing. I need it to insert data into the graph, which the analytics app can then read back during the test run.

idcrosby commented 5 years ago

Ok, I missed that, thanks. I'll add it in.

idcrosby commented 5 years ago

@etiennedi how will this be used within the cluster? I guess it has an API, so I'll need to expose it with a service. Is there an expected host/port which we want to run it on? (Ideally we would use port 80.)

etiennedi commented 5 years ago

Correct, it has an http API, the port can be configured here: https://github.com/semi-technologies/janus-spark-analytics/blob/70a676c5d1f8412de9203b47bfeefdd2e99a70d0/analytics.yml#L4-L6

Port 80 should be fine (unless this Java stack somehow has an issue with privileged ports?).

idcrosby commented 5 years ago

Ok good. I propose naming the service janus-spark-analytics to be explicit; this would become the domain name for anyone wanting to connect to it. I could also use analytics, as this is what the test app seems to use. Any preference @etiennedi?

etiennedi commented 5 years ago

I think janus-spark-analytics is better, in case we want to introduce other analytics backends in the future.
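For illustration, a Service along these lines might look as follows. This is only a sketch: the pod label `app: janus-spark-analytics` and the named container port `http` are assumptions, and the container port itself is whatever the app's analytics.yml configures.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: janus-spark-analytics      # becomes the in-cluster DNS name
spec:
  selector:
    app: janus-spark-analytics     # assumed label on the analytics pods
  ports:
    - name: http
      port: 80                     # port that clients inside the cluster connect to
      targetPort: http             # assumed named port on the container
```

Since `port` and `targetPort` are independent, the Service can listen on 80 while the Java process binds to an unprivileged port inside the container, which would sidestep the privileged-port question.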