snowch / ibm-analytics-engine-python

Python library for IBM Analytics Engine - https://console.bluemix.net/docs/services/AnalyticsEngine/index.html
Apache License 2.0

Provide API method to set SPARK_DIST_CLASSPATH #7

Open snowch opened 6 years ago

snowch commented 6 years ago

Steps to configure:

1. Fetch the current spark2-env configuration from Ambari
2. Append the extra directory to SPARK_DIST_CLASSPATH in the config content
3. Save the updated configuration back to Ambari
4. Stop and start the services so the change takes effect

Example code for working with the Ambari API:

    # use the future package's aliases so urllib.parse works on Python 2 and 3
    from future.standard_library import install_aliases
    install_aliases()
    from urllib.parse import urlparse

    import json

    # load the cluster's service credentials from the saved vcap.json
    vcap = json.load(open('./vcap.json'))

    USER         = vcap['cluster']['user']
    PASSWORD     = vcap['cluster']['password']
    AMBARI_URL   = vcap['cluster']['service_endpoints']['ambari_console']
    CLUSTER_ID   = vcap['cluster']['cluster_id']
    CLUSTER_NAME = 'AnalyticsEngine'

    # split the Ambari console URL into its host/port/scheme components
    url = urlparse(AMBARI_URL)

    HOST = url.hostname
    PORT = url.port
    PROTOCOL = url.scheme
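
For illustration, here is a minimal sketch (assuming the `requests` package; the endpoints are the standard Ambari v1 configuration API) that uses those values to read the current spark2-env config:

    import requests

    BASE = '{0}://{1}:{2}/api/v1/clusters/{3}'.format(PROTOCOL, HOST, PORT, CLUSTER_NAME)
    AUTH = (USER, PASSWORD)

    # look up the tag of the currently active spark2-env configuration
    r = requests.get(BASE + '?fields=Clusters/desired_configs', auth=AUTH, verify=False)
    r.raise_for_status()
    tag = r.json()['Clusters']['desired_configs']['spark2-env']['tag']

    # fetch that configuration version and print its env script content
    r = requests.get(BASE + '/configurations',
                     params={'type': 'spark2-env', 'tag': tag},
                     auth=AUTH, verify=False)
    r.raise_for_status()
    print(r.json()['items'][0]['properties']['content'])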

and the corresponding shell commands:

    echo 'Downloading configs.py ambari script'
    curl https://raw.githubusercontent.com/apache/ambari/9e93c476ddd8d4397f550062fd1645ac5422ed2e/ambari-server/src/main/resources/scripts/configs.py > configs.py

    echo 'Getting the latest spark2-env configuration'
    # grab the latest spark2-env configuration file
    ./configs.py -u ${AMBARI_USER} -p ${AMBARI_PASSWORD} -n ${CLUSTER_NAME} -s https --port ${AMBARI_PORT} -l ${AMBARI_HOST} -a get -c spark2-env -f spark2-env-content.json

    echo 'Current spark2-env configuration:'
    cat spark2-env-content.json

    # append the spark-avro folder to SPARK_DIST_CLASSPATH
    # (configs.py saved the config as JSON above, so update the 'content'
    # property rather than appending raw text to the JSON file)
    python - <<'EOF'
    import json
    with open('spark2-env-content.json') as f:
        cfg = json.load(f)
    cfg['properties']['content'] += '\nexport SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:/home/clsadmin/spark-avro/*'
    with open('spark2-env-content.json', 'w') as f:
        json.dump(cfg, f, indent=2)
    EOF

    # save the changes back 
    ./configs.py -u ${AMBARI_USER} -p ${AMBARI_PASSWORD} -n ${CLUSTER_NAME} -s https --port ${AMBARI_PORT} -l ${AMBARI_HOST} -a set -c spark2-env -f spark2-env-content.json

    echo 'Uploaded new spark2-env configuration:'
    cat spark2-env-content.json

    echo "stop and Start Services"
    curl -k -v --user $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By: ambari" -i -X PUT -d '{"RequestInfo": {"context": "Stop All Services via REST"}, "ServiceInfo": {"state":"INSTALLED"}}' https://$AMBARI_HOST:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services

    curl -k -v --user $AMBARI_USER:$AMBARI_PASSWORD -H "X-Requested-By: ambari" -i -X PUT -d '{"RequestInfo": {"context": "Start All Services via REST"}, "ServiceInfo":{"state":"STARTED"}}' https://$AMBARI_HOST:$AMBARI_PORT/api/v1/clusters/$CLUSTER_NAME/services

# script for verifying the Ambari service status after the stop/start requests:
# https://git.ng.bluemix.net/chris.snow/iae-spark-package-customization-example/raw/master/bootstrap/verify_ambari_service_status.py
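
Along the same lines, a minimal polling sketch (an illustration of the idea, not the linked script) that waits for every service to report the requested state via the standard Ambari services endpoint:

    import time
    import requests

    def wait_for_services(expected_state, timeout_secs=600):
        """Poll Ambari until all services report expected_state
        ('INSTALLED' after a stop request, 'STARTED' after a start request)."""
        url = '{0}://{1}:{2}/api/v1/clusters/{3}/services?fields=ServiceInfo/state'.format(
            PROTOCOL, HOST, PORT, CLUSTER_NAME)
        deadline = time.time() + timeout_secs
        while time.time() < deadline:
            r = requests.get(url, auth=(USER, PASSWORD), verify=False)
            r.raise_for_status()
            states = [item['ServiceInfo']['state'] for item in r.json()['items']]
            if all(s == expected_state for s in states):
                return
            time.sleep(10)
        raise RuntimeError('timed out waiting for services to reach ' + expected_state)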

Example file-based approach for extracting env var definitions:

In [17]: import shlex

In [18]: with open('spark-env.sh', 'r') as f:
    ...:     for line in shlex.split(f.read()):
    ...:         var, eq, value = line.partition('=')
    ...:         if var == 'SPARK_DIST_CLASSPATH' and eq:
    ...:             print(var, value)
    ...:
SPARK_DIST_CLASSPATH /home/common/lib/scala/spark2/*:/home/common/lib/scala/common/*:/home/common/lib/dataconnectorCloudant/*:/home/common/lib/dataconnectorStocator/*:/home/common/lib/dataconnectorDb2/*:/home/common/lib/dataconnectorIdax/*
SPARK_DIST_CLASSPATH $SPARK_DIST_CLASSPATH:/some/otherdir/*\
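
Putting the pieces together, one possible shape for the requested library method (all names here are hypothetical sketches, not the library's current API; it follows the same get/modify/set flow as the shell script above):

    import json
    import time
    import requests

    class AmbariConfig(object):
        """Hypothetical helper, sketching the requested library method."""

        def __init__(self, protocol, host, port, user, password, cluster_name):
            self.base = '{0}://{1}:{2}/api/v1/clusters/{3}'.format(
                protocol, host, port, cluster_name)
            self.auth = (user, password)

        def add_to_spark_dist_classpath(self, path):
            # look up the tag of the active spark2-env configuration
            r = requests.get(self.base + '?fields=Clusters/desired_configs',
                             auth=self.auth, verify=False)
            r.raise_for_status()
            tag = r.json()['Clusters']['desired_configs']['spark2-env']['tag']

            # fetch that configuration version
            r = requests.get(self.base + '/configurations',
                             params={'type': 'spark2-env', 'tag': tag},
                             auth=self.auth, verify=False)
            r.raise_for_status()
            properties = r.json()['items'][0]['properties']

            # append the new entry to the env script content
            properties['content'] += (
                '\nexport SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:' + path)

            # push the change back as a new desired config version
            payload = {'Clusters': {'desired_config': {
                'type': 'spark2-env',
                'tag': 'version{0}'.format(int(time.time() * 1000)),
                'properties': properties,
            }}}
            r = requests.put(self.base, data=json.dumps(payload),
                             headers={'X-Requested-By': 'ambari'},
                             auth=self.auth, verify=False)
            r.raise_for_status()

A caller would then stop and start the services (as in the curl calls above) for the new SPARK_DIST_CLASSPATH to take effect.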