skatsuta / athenai

Have fun with Amazon Athena from command line! 🕊
Apache License 2.0

Running for production #53

Closed nadirvardar closed 6 years ago

nadirvardar commented 6 years ago

I have a request: in order to run this in a production environment with a shell script, I have the scenario below.

Run as;

./athenai run --location s3://aws-athena-query-results-nv1-us-east-1/ --database xyz "select * from tmz" --format csv --concurrent 1

In this scenario I only look for the location in S3. There is no need to show any data, just:

Location: s3://aws-athena-query-results-nv1-us-east-1/d59e7620-2c7d-47b7-ab70-2927acb37683.csv

I need this because I'd like to use your tool as a data pipeline step to generate the data; the next step will pick up the S3 location, convert the data to Parquet, and work on that.

Is there a flag for this, or could a solution be produced?

Thank you.

skatsuta commented 6 years ago

Hi @nadirvardar, thank you for using this tool 😄 As far as I can see from your use case, a simple one-liner like the one below should be enough, assuming grep and awk are available in your shell script.

sample.sh

#!/usr/bin/env bash

# Run the query silently and pull the S3 output location out of the
# "Location: ..." line printed on stdout.
LOCATION=$(athenai run --silent 'SHOW DATABASES' | grep 'Location:' | awk '{ print $2 }')
echo "$LOCATION"

Running with the --silent option suppresses the Running query... progress message, which should suit your use case; grep and awk then extract the output S3 location from stdout.

Running the above sample produces:

$ ./sample.sh
s3://aws-athenai-demo/88640913-b6f7-4ec0-8247-e9b522268322.txt
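Since athenai itself may not be installed everywhere, the extraction step can be checked against simulated output (the Location: line format below is taken from the sample output above):

```shell
# Simulate athenai's stdout and extract the S3 location the same way
# sample.sh does: grep for the "Location:" line, take the second field.
output='Location: s3://aws-athenai-demo/88640913-b6f7-4ec0-8247-e9b522268322.txt'
LOCATION=$(printf '%s\n' "$output" | grep 'Location:' | awk '{ print $2 }')
echo "$LOCATION"
```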

So you can use the LOCATION variable in your subsequent pipeline steps, such as picking up the S3 location and converting the data to another format.
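For example, a follow-up step could derive its own output key from LOCATION with plain shell string manipulation (a minimal sketch; the .parquet naming is only illustrative):

```shell
# Hypothetical next pipeline step: swap the result file's extension to
# build the destination key for the converted Parquet data.
LOCATION='s3://aws-athenai-demo/88640913-b6f7-4ec0-8247-e9b522268322.txt'
PARQUET_LOCATION="${LOCATION%.*}.parquet"
echo "$PARQUET_LOCATION"
```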

Does this meet your needs?

nadirvardar commented 6 years ago

This is awesome, thank you so much for taking the time to work on this.