odpi / egeria

Egeria core
https://egeria-project.org
Apache License 2.0

Lab tutorial notebook - default kafka port for local #1419

Closed: planetf1 closed this issue 5 years ago

planetf1 commented 5 years ago

Having deployed the 'lab' helm chart into a cloud deployment (IKS), I ran through the tutorials to note any adjustments/parameterization/defaults we need to add.

Initially this is to capture/document the changes, so that we can then better parameterize/default/document:

Some key info you need about your k8s deployment:

To get started the notebook itself is typically accessed via http://hostname:30888

Managing services

corePlatformURL = "http://localhost:30080"
dataLakePlatformURL = "http://localhost:30081"
devPlatformURL = "http://localhost:30082"

instead of the original values of

corePlatformURL = "http://localhost:8080"
dataLakePlatformURL = "http://localhost:8081"
devPlatformURL = "http://localhost:8082"

it's worth noting (cc: @mandy-chessell ) that

serverConfig=response.json().get('omagserverConfig')
auditTrail=serverConfig.get('auditTrail')

print (" ")
print ("Audit Trail: ")

for x in range(len(auditTrail)): 
    print (auditTrail[x])

will hit an exception, since (when configuring my server at least) no audit log entries were added.
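A defensive tweak along these lines (a sketch only, reusing the response object from the cell above) would avoid the exception when no audit log entries exist:

serverConfig = response.json().get('omagserverConfig', {})
auditTrail = serverConfig.get('auditTrail')

print (" ")
print ("Audit Trail: ")

# Only iterate when audit log entries were actually returned
if auditTrail:
    for entry in auditTrail:
        print (entry)
else:
    print ("No audit log entries recorded")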

Later on in the notebook we also get failures in exercise 3 where we stop both cocoMDS1 and myOldServer. The reason for this (looking back) is that we actually failed to start the servers with errors like:

Starting server cocoMDS1 ...
POST http://egeriadev.eu-gb.containers.appdomain.cloud:30081/open-metadata/admin-services/users/garygeeke/servers/cocoMDS1/instance
Response: 
{
    "class": "SuccessMessageResponse",
    "relatedHTTPCode": 400,
    "exceptionClassName": "org.odpi.openmetadata.adminservices.ffdc.exception.OMAGConfigurationErrorException",
    "exceptionErrorMessage": "OMAG-ADMIN-400-018 OMAG server cocoMDS1 has been called with a configuration document that has no services configured",
    "exceptionSystemAction": "The requested server provides no function.",
    "exceptionUserAction": "Use the administration services to add configuration for OMAG services to the server's configuration document."
}

... since no OMASs were configured.

So the notebook looks OK for cloud, but fails due to the OMAS changes.

cmgrote commented 5 years ago

With #1418 I'm suggesting getting rid of the externally-exposed Egeria ports -- since the objective of the Notebooks is to be able to interact with the environment that way, it presumably limits the attack surface of the chart if we leave those purely cluster-internal (?) We can then also use their k8s service names, making them more self-descriptive in nature (and all using :8080 as well, for further simplicity, without needing to look up or remember long port numbers)...
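Concretely, that would mean notebook settings along these lines (illustrative only; the exact k8s service names depend on the chart and release name, e.g. with the 'lab' release):

corePlatformURL = "http://lab-egeria-core-service:8080"
dataLakePlatformURL = "http://lab-egeria-lake-service:8080"
devPlatformURL = "http://lab-egeria-dev-service:8080"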

planetf1 commented 5 years ago

Configuring Metadata

Similarly - the ports need to be changed just after figure 1 similar to the previous notebook

i.e.

corePlatformURL = "http://hostname:30080"
dataLakePlatformURL = "http://hostname:30081"
devPlatformURL = "http://hostname:30082"
edgePlatformBaseURL = "http://hostname:30083"

Observation - we do not yet have an 'edge' server defined in the k8s chart (will add)

Before 'access services' we have eventBusURLroot = "http://localhost:59092"

which is incorrect: it should be specified without the http:// prefix, as this is a Kafka endpoint.
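In other words, something like this (same value, just dropping the scheme):

eventBusURLroot = "localhost:59092"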

In fact, for a cloud deployment here we likely need to use the INTERNAL Kafka endpoint, since the communication is between the notebook server (where the Python runs) and Kafka.

Let's look at that:

➜  python git:(master) ✗ kubectl describe service/lab-cp-kafka
Name:              lab-cp-kafka
Namespace:         egeria
Labels:            app=cp-kafka
                   chart=cp-kafka-0.1.0
                   heritage=Helm
                   release=lab
Annotations:       <none>
Selector:          app=cp-kafka,release=lab
Type:              ClusterIP
IP:                172.21.6.89
Port:              broker  9092/TCP
TargetPort:        9092/TCP
Endpoints:         172.30.193.176:9092
Session Affinity:  None
Events:            <none>

So here I will set it to: lab-cp-kafka:9092
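In the notebook cell that looks something like this (a sketch, using the variable name from the notebook and the service shown above):

eventBusURLroot = "lab-cp-kafka:9092"   # cluster-internal Kafka service and port, no http:// prefix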

Then going on to ODPi Egeria Server Configuration: adminPlatformURL = "http://localhost:8080"

For a hosted demo I would set to the same as corePlatformURL

When configuring cocoMDS1 and starting our OMASs I did notice

POST http://egeriadev.eu-gb.containers.appdomain.cloud:30080/open-metadata/admin-services/users/garygeeke/servers/cocoMDS1/access-services/community-profile
Response: 
{'relatedHTTPCode': 400, 'exceptionClassName': 'org.odpi.openmetadata.adminservices.ffdc.exception.OMAGConfigurationErrorException', 'exceptionErrorMessage': 'OMAG-ADMIN-400-004 Unable to configure server cocoMDS1 since access service community-profile is not registered in this OMAG Server Platform', 'exceptionSystemAction': 'The system is unable to configure the local server.', 'exceptionUserAction': 'Validate and correct the name of the access service URL marker.'}

I get the same for stewardship-action

For cocoMDS1 we are missing data-privacy, it-infrastructure, project-management and community-profile.

MDS3 is missing community-profile, data-science, stewardship-action and project-management; MDS4 is missing community-profile, data-science, ...

This is work in progress...

planetf1 commented 5 years ago

Yes, agreed on the service ports, though let's leave the change for now as I'm trying to get something sorted for the LF conference. I'll summarise any changes/notes after I've finished testing. I do agree on using the internal names, as it's notebook-to-service communication only, so I will drop the reference to ingress except for the initial web entry point, but not change any of the service definitions at this point.

cmgrote commented 5 years ago

PR #1418 already drops the nodeports -- that's the only change to the service definition (name of each service remains the same: only change to your notes above is to use the servicename:8080 for each URL instead of localhost:nodeport...) In fact, testing-wise it would remain the same even if you happen to have nodeports around, so long as you're using servicename:8080 in the notebook.

planetf1 commented 5 years ago

understanding-cohorts:

As per the above comment, the internal names are easier and more appropriate to use, so we now use:

corePlatformURL = "http://lab-egeria-core-service:8080"
dataLakePlatformURL = "http://lab-egeria-lake-service:8080"
devPlatformURL = "http://lab-egeria-dev-service:8080"

planetf1 commented 5 years ago

I need stability in these charts for San Diego and for use with other (remote) notebooks for the k8s scenario, so I think we need to leave the PR on hold for now. Only a day of prep left.

mandy-chessell commented 5 years ago

All of the OMASs are configured - although some are not available yet. This is deliberate, so that they are already configured as soon as they become available.

There should always be an audit trail if some config is in place. Sounds like there is another problem with the environment as the config commands are failing.

Where the servers fail to start because there is no config, it is because the configuration calls failed - we need to work out why.

There will be 200 edge platforms - not just one - so I suggest we do not add one to the base chart.

mandy-chessell commented 5 years ago

In particular, the comment that this exception is caused because the OMASs are not defined is incorrect:

Starting server cocoMDS1 ...
POST http://egeriadev.eu-gb.containers.appdomain.cloud:30081/open-metadata/admin-services/users/garygeeke/servers/cocoMDS1/instance
Response: 
{
    "class": "SuccessMessageResponse",
    "relatedHTTPCode": 400,
    "exceptionClassName": "org.odpi.openmetadata.adminservices.ffdc.exception.OMAGConfigurationErrorException",
    "exceptionErrorMessage": "OMAG-ADMIN-400-018 OMAG server cocoMDS1 has been called with a configuration document that has no services configured",
    "exceptionSystemAction": "The requested server provides no function.",
    "exceptionUserAction": "Use the administration services to add configuration for OMAG services to the server's configuration document."
}

This message occurs if no services are defined - so if the repository services are defined then the server will start.

mandy-chessell commented 5 years ago

FYI - I have updates to the notebooks in my workspace - particularly the asset management ones.

mandy-chessell commented 5 years ago

If the port names and host names are changed, how are they resolved on a local laptop?

cmgrote commented 5 years ago

If the port names and host names are changed, how are they resolved on a local laptop?

If using the same environment orchestration (k8s) on your local laptop, they'd be resolved in the same way. If you want to start up each instance yourself, running natively on your laptop, this is where we'd need something like an environment variable that the notebook picks up to set the value for these Python variables (in our Helm chart we can feed in the k8s-specific hostnames; for a local environment outside k8s we could default these to localhost if we do not find any values for the environment variable). This approach will rely on some mechanism for an environment variable to be exposed into the Jupyter Notebook, though, likely via Python somehow... (Hence this needs some deeper investigation.)

It looks like the notebook can pick up any environment variables the container itself was run with (good news for us -- we shouldn't have to maintain our own image): they can be retrieved by using %env to see all of them or %env VARNAME to get the value of just one of them. (So you could use myVarName = %env ENV_VAR_NAME to set up a Python variable, so long as the environment variable actually exists... Not sure how to conditionally check if the env var is defined, though?)
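One way to handle that conditional check from plain Python in a notebook cell (a sketch only; the variable name is just illustrative):

import os

# Use the environment variable if it is defined, otherwise fall back to a local default
if 'corePlatformURL' in os.environ:
    corePlatformURL = os.environ['corePlatformURL']
else:
    corePlatformURL = 'http://localhost:8080'

(The os.environ.get() form used elsewhere in this thread collapses this into a single line.)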

mandy-chessell commented 5 years ago

I am not sure we are on the same page - the notebooks need to run against server platforms executing outside of kubernetes. Working against kubernetes is an option.

It would be OK to use symbolic names and set up something like /etc/hosts. However, that would mean that the port numbers for each platform would need to be different so they can run on the same host.

planetf1 commented 5 years ago

As Chris mentioned, the intent is to allow the notebooks to run across the different deployment options (k8s, docker-compose, local) with as little difference/effort as possible.

My thought would be to insert environment variables into the helm/compose configurations, and then use a construct in the script that uses those if set, and if not uses localhost. I see no problem in doing this beyond ensuring the code construct is as simple/clear as possible so it doesn't get in the way of manual overriding. This is of course what Chris wrote above :-)

Whether the ports/services differ depends on the deployment - with k8s we'll typically have the ports the same, as each container is running on a different host (pod). Compose is similar if the notebook is run within the compose network - unique ports are only needed when they get forwarded externally.

It will be interesting to get feedback in San Diego as to which configuration is most likely to be used. Though the conf could be quite k8s focussed and techie so maybe they will be happy with any. If I had to choose I'd love to say k8s, but pragmatically wonder if compose is more popular. At least for first experience?

Having a good 'out of the box' experience is, IMO, vastly improved by getting the initial hostnames etc. correct rather than having the user update them.

planetf1 commented 5 years ago

How about using a fragment like this? It keeps it fairly clear how to override, yet keeps the code compact. It only needs one extra import, and can make use of variable insertion in k8s and compose:

#!/usr/bin/python3

import os

# Use the environment variable if set, otherwise default to localhost:8080
print(os.environ.get('MYENDPOINT', 'localhost:8080'))

➜  ~ ./env.py
localhost:8080
➜  ~ export MYENDPOINT=mybigsuperserver:12345
➜  ~ ./env.py
mybigsuperserver:12345
➜  ~

planetf1 commented 5 years ago

I would be happy to make the change in the notebooks, though if you are working on them it may be easiest if you do.

We could agree on the environment variable names, e.g. corePlatformURL, dataLakePlatformURL, devPlatformURL.

Basically the same names you are using in the notebook - except as env vars. Keeps it simple?

I can update the compose and chart definitions and test.

Note - I've been using EGERIA_ADMIN=EGERIA_BASE effectively in my tests - so do we need an explicit admin URL? (Noted in the ramblings above.)

planetf1 commented 5 years ago

Making the change in the helm chart and testing ...

planetf1 commented 5 years ago

OK, so this works. I will add a PR for the helm chart (and then do the compose version).

This is my proposed notebook change

Then change the hostname settings to values like:

corePlatformURL     = os.environ.get('corePlatformURL','http://localhost:8080') 
dataLakePlatformURL = os.environ.get('dataLakePlatformURL','http://localhost:8081') 
devPlatformURL      = os.environ.get('devPlatformURL','http://localhost:8082')

I then ran the managing servers notebook and can confirm this works :-)

We can add any other environment variables deemed necessary - for example if we decide we want to emulate the edge server (which would also need that server to be created in the charts/compose), or if we need the explicit adminServerURL, which is set to localhost, but I didn't see why.

planetf1 commented 5 years ago

Added PR proposal above

planetf1 commented 5 years ago

Extended the proposed update with a docker-compose implementation too, and tried a quick test with the notebook mod above. Note, though, that currently the docker-compose version does not load the notebooks into the server automatically (will do via a separate PR).

planetf1 commented 5 years ago

I've made some doc updates

This should help these different pieces hook together a little better for the reader, and we can then update our meetings page with a simple link to the tutorials

mandy-chessell commented 5 years ago

I like the environment variable idea - I have a lot of edits to the notebooks in my workspace and so I will make the changes and create PR tonight.

mandy-chessell commented 5 years ago

I have also updated adminPlatformURL to corePlatformURL.

What about the eventBusURL?

mandy-chessell commented 5 years ago

In the notebooks the import statement is also required ...

import os

corePlatformURL     = os.environ.get('corePlatformURL','http://localhost:8080') 
dataLakePlatformURL = os.environ.get('dataLakePlatformURL','http://localhost:8081') 
devPlatformURL      = os.environ.get('devPlatformURL','http://localhost:8082')

planetf1 commented 5 years ago

Yes - I did mention the import ('In the first code fragment add import os (to get functions for handling environment)') - but the long ramblings and formatting made it easy to miss :-)

Good point on the eventBusURL. I will use that env var name and add it (i.e. 'kafka:9092') to both the docker-compose and helm chart. I changed it when testing, but forgot when proposing the change.

mandy-chessell commented 5 years ago

I have changed the configuring servers notebook to be

eventBusURLroot = os.environ.get('eventBusURLroot', 'localhost:59092')

planetf1 commented 5 years ago

OK, I can use eventBusURLroot then - though it probably shouldn't really be described as a URL (as it's just host:port).

cmgrote commented 5 years ago

Any reason to default the Kafka port to 59092? Default for Kafka is usually just 9092...

planetf1 commented 5 years ago

Updated PR with eventBusURLroot - if that changes or you find more env vars let me know. Can merge when ready. I suggest today - can always add another PR if we need more changes

Once the latest notebooks are in as well, plus the docs, I can test again to ensure everything hangs together for anyone following the link to the lab, i.e. during the conference next week.

planetf1 commented 5 years ago

@cmgrote - it comes down to how we document how people use the notebooks locally. The bonus of using the default is that you can just install kafka and zookeeper with 'brew' or by following the standard instructions. The downside is that it may clash with other app usage. Right now I've updated the tutorial docs (in the PR) to refer to the docker/k8s options. I've not tried to document a local setup.

cmgrote commented 5 years ago

Understood -- just thinking that even for a local setup it would probably be best to go with "as default as possible", with the assumption that folks aren't running lots of the components already. If they are, they're probably savvy enough to figure out how to override the non-defaults; the reverse is less likely to be true: those that are not running any of the components will likely have more difficulty setting up non-default configurations.

mandy-chessell commented 5 years ago

We use port 59092 to avoid conflicts.

cmgrote commented 5 years ago

We use port 59092 to avoid conflicts.

I'm confused... In k8s / Docker we can startup Kafka automatically and put it into its own private IP address, so can use whatever port we like (within the cluster) without conflicting... I was assuming this 59092 was only for where users are running their own Kafka instance (?)

If that's the case, we'd need to assume they'd need to start up Kafka themselves (I didn't think we had it embedded somehow in OMAG, but maybe I'm wrong)? Hence my thinking was that it would be easier for them to use the default that Kafka runs with than needing to dig around and figure out how to change Kafka's configuration on their local OS to run on a different port (or even more complex: install multiple co-existing instances of Kafka running on different ports without conflicting).

Even if I have Kafka installed and am using it for something else, it might still be nice to re-use that rather than be forced to run a new instance -- all we should be creating are some new topics, which a user can easily later delete if they like (we're not doing anything "invasive" to a pre-existing Kafka install)...

Just trying to keep it as simple as possible for new users -- hence suggesting defaults wherever possible 😄

planetf1 commented 5 years ago

This issue started out to track changes needed for cloud - I think these are mostly done.

There is a pending discussion around ports....

a) Default Kafka port in the notebooks - I strongly agree with Chris. If I install Kafka with Homebrew or manually, by default I'll use 9092 (and 2181 for ZooKeeper). Whilst the 'easiest' way to get started is to use compose/k8s (IMO!), if a developer wants to experiment with clients and building Egeria, the next step is local, and I think they know what they are doing at this point; defaults just make things easier (even though I accept the clashing risk). In a container the ports can be the same in any case, as Chris says. Right now, if they go with the default, many things seem to work until they get into the cohort functionality, and then things don't...

planetf1 commented 5 years ago

@mandy-chessell I would prefer we went with default ports. I know you are concerned about clashes, but given that on a local dev workstation we have some isolation through topic names, I believe it is easier. Either way I suggest we agree on whether to change or not, then action and close.

mandy-chessell commented 5 years ago

Hello @planetf1 - if default ports are better/easier then we should go with that.

planetf1 commented 5 years ago

PR submitted