
Qubole Data Service Python SDK

.. image:: https://travis-ci.org/qubole/qds-sdk-py.svg?branch=master
    :target: https://travis-ci.org/qubole/qds-sdk-py
    :alt: Build Status

A Python module that provides the tools you need to authenticate with and use the Qubole Data Service API.

Installation

From PyPI

The SDK is available on `PyPI <https://pypi.python.org/pypi/qds_sdk>`_.

::

    $ pip install qds-sdk

From source

Clone the repository and run ``python setup.py install`` from the checkout. This should place a command line utility ``qds.py`` somewhere in your path

::

    $ which qds.py
    /usr/bin/qds.py

CLI

qds.py allows running Hive, Hadoop, Pig, Presto and Shell commands against QDS. Users can run commands synchronously, or submit a command and check its status later.

::

    $ qds.py -h  # will print detailed usage

Examples:

  1. Run a Hive query and print the results

    ::

    $ qds.py --token 'xxyyzz' hivecmd run --query "show tables"
    $ qds.py --token 'xxyyzz' hivecmd run --script_location /tmp/myquery
    $ qds.py --token 'xxyyzz' hivecmd run --script_location s3://my-qubole-location/myquery

  2. Pass in the API token from a bash environment variable

    ::

    $ export QDS_API_TOKEN=xxyyzz

  3. Run the example Hadoop command

    ::

    $ qds.py hadoopcmd run streaming -files 's3n://paid-qubole/HadoopAPIExamples/WordCountPython/mapper.py,s3n://paid-qubole/HadoopAPIExamples/WordCountPython/reducer.py' -mapper mapper.py -reducer reducer.py -numReduceTasks 1 -input 's3n://paid-qubole/default-datasets/gutenberg' -output 's3n://example.bucket.com/wcout'

  4. Check the status of command 12345678

    ::

    $ qds.py hivecmd check 12345678
    {"status": "done", ... }

  5. If you are hitting an api_url other than api.qubole.com, you can pass it on the command line with --url or set it as an environment variable

    ::

    $ qds.py --token 'xxyyzz' --url https://<env>.qubole.com/api hivecmd ...

    or

    $ export QDS_API_URL=https://<env>.qubole.com/api

SDK API

An example Python application needs to do the following:

  1. Set the api_token and the api_url (if the api_url is not api.qubole.com):

    ::

    from qds_sdk.qubole import Qubole

    Qubole.configure(api_token='ksbdvcwdkjn123423')

    or

    Qubole.configure(api_token='ksbdvcwdkjn123423', api_url='https://<env>.qubole.com/api')

  2. Use the Command classes defined in commands.py to execute commands. To run a Hive command (see the polling sketch just after this list):

    ::

    from qds_sdk.commands import *

    hc = HiveCommand.create(query='show tables')
    print("Id: %s, Status: %s" % (str(hc.id), hc.status))
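
Commands submitted with ``create`` return immediately; a typical application polls until the command reaches a terminal state and then fetches the results. The following is a minimal sketch of that loop, assuming the ``Command.is_done``/``Command.is_success``/``find``/``get_results`` helpers from commands.py and the placeholder API token used above:

::

    import sys
    import time

    from qds_sdk.qubole import Qubole
    from qds_sdk.commands import Command, HiveCommand

    Qubole.configure(api_token='ksbdvcwdkjn123423')

    # Submit the query; create() does not wait for the command to finish.
    hc = HiveCommand.create(query='show tables')

    # Poll until the command reaches a terminal state (done, error or cancelled).
    while not Command.is_done(hc.status):
        time.sleep(Qubole.poll_interval)
        hc = HiveCommand.find(hc.id)

    if Command.is_success(hc.status):
        hc.get_results(sys.stdout)   # stream the result rows to stdout
    else:
        print("Command %s finished with status %s" % (hc.id, hc.status))

``HiveCommand.run(query='show tables')`` wraps the same create-and-poll sequence in a single blocking call.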

``example/mr_1.py`` contains a Hadoop Streaming example.
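
For a Python (rather than CLI) version of the streaming job from example 3, something along the following lines should work; note that the ``sub_command`` and ``sub_command_args`` parameter names are assumptions inferred from the ``hadoopcmd`` CLI arguments, so treat ``example/mr_1.py`` as the canonical reference:

::

    from qds_sdk.qubole import Qubole
    from qds_sdk.commands import HadoopCommand

    Qubole.configure(api_token='ksbdvcwdkjn123423')

    # Same word-count streaming job as CLI example 3 above.
    # sub_command / sub_command_args mirror the hadoopcmd CLI arguments
    # and are assumptions here, not confirmed parameter names.
    cmd = HadoopCommand.create(
        sub_command='streaming',
        sub_command_args="-files 's3n://paid-qubole/HadoopAPIExamples/WordCountPython/mapper.py,"
                         "s3n://paid-qubole/HadoopAPIExamples/WordCountPython/reducer.py' "
                         "-mapper mapper.py -reducer reducer.py -numReduceTasks 1 "
                         "-input 's3n://paid-qubole/default-datasets/gutenberg' "
                         "-output 's3n://example.bucket.com/wcout'")
    print("Id: %s, Status: %s" % (str(cmd.id), cmd.status))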

Reporting Bugs and Contributing Code

Where are the maintainers?

Qubole was acquired, and all the maintainers of this repo have moved on. Some of the former employees founded `ClearFeed <https://clearfeed.ai>`_; others are on big data teams at Microsoft, Amazon and other companies.