pplu / aws-sdk-perl

A community AWS SDK for Perl Programmers

Add Athena Support? #176

Closed frioux closed 7 years ago

frioux commented 7 years ago

Athena support was recently exposed via the normal API. I am happy to submit a pull request but wasn't sure if you had some initial steps you take when a brand new service is added. If you can give me direction I'll do what I can, or if it's easier for you to just do it that's cool too :)

pplu commented 7 years ago

Hi!

Athena support is here: https://github.com/pplu/aws-sdk-perl/tree/release/0.33

Do you mind trying it out before it hits CPAN? Starting with this release, I open a branch for the development of the next release (0.33), where I merge all the changes that will go into 0.33 before finally merging to master. The definitions are updated there too. Since master is the most visible branch, maybe my plan has backfired and is hiding "what's going on". Has this been the case for you?

Thanks!

frioux commented 7 years ago

yeah totally! I'll get back to you shortly.

frioux commented 7 years ago

And yeah, I did try to see if Athena support was in progress but couldn't tell because I looked at master.

frioux commented 7 years ago

Ok I tried to run this but it errored:

perl -MPaws -MDevel::Dwarn -E'Paws->service("Athena", region => "us-east-1")->StartQueryExecution(QueryString => "SELECT 1", ResultConfiguration => { OutputLocation => "s3://sandbox.ziprecruiter.com/frew-test" })->$Dwarn'
clientRequestToken is null or empty

Trace begun at /home/frew/.plenv/versions/5.26.0/lib/perl5/site_perl/5.26.0/Paws/Net/JsonResponse.pm line 33
Paws::Net::JsonResponse::handle_response('Paws::Athena=HASH(0x5599a02b2e88)', 'Paws::Athena::StartQueryExecution=HASH(0x5599a05e6318)', 400, '{"__type":"InvalidRequestException","AthenaErrorCode":"INVALID_INPUT","ErrorCode":"INVALID_INPUT","Message":"clientRequestToken is null or empty"}', 'HASH(0x5599a0769f20)') called at /home/frew/.plenv/versions/5.26.0/lib/perl5/site_perl/5.26.0/Paws/Net/Caller.pm line 40
Paws::Net::Caller::caller_to_response('Paws::Net::Caller=HASH(0x55999f976950)', 'Paws::Athena=HASH(0x5599a02b2e88)', 'Paws::Athena::StartQueryExecution=HASH(0x5599a05e6318)', 400, '{"__type":"InvalidRequestException","AthenaErrorCode":"INVALID_INPUT","ErrorCode":"INVALID_INPUT","Message":"clientRequestToken is null or empty"}', 'HASH(0x5599a0769f20)') called at /home/frew/.plenv/versions/5.26.0/lib/perl5/site_perl/5.26.0/Paws/Net/RetryCallerRole.pm line 19
Paws::Net::RetryCallerRole::do_call('Paws::Net::Caller=HASH(0x55999f976950)', 'Paws::Athena=HASH(0x5599a02b2e88)', 'Paws::Athena::StartQueryExecution=HASH(0x5599a05e6318)') called at /home/frew/.plenv/versions/5.26.0/lib/perl5/site_perl/5.26.0/Paws/Athena.pm line 65
Paws::Athena::StartQueryExecution('Paws::Athena=HASH(0x5599a02b2e88)', 'QueryString', 'SELECT 1', 'ResultConfiguration', 'HASH(0x55999e775190)') called at -e line 1

Note that this, using the awscli, does work:

aws athena start-query-execution --query-string 'SELECT 1' --result-configuration OutputLocation='s3://sandbox.ziprecruiter.com/frew-test'

frioux commented 7 years ago

Ok I got it to work; basically ClientRequestToken is required, is a Str, and must be 32 or more characters long. I think each token needs to be unique or you'll get the same results back, so 12345678901234567890123456789012 works, once.
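A minimal sketch of generating a fresh token for each request using only core modules. `new_request_token` is a hypothetical helper name, not part of Paws, and the 32-character minimum is taken from the observation above rather than from the API documentation. The result is not a cryptographically strong identifier; a UUID module would serve equally well.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use Time::HiRes qw(time);

# Hypothetical helper (not part of Paws): returns a 64-character hex string,
# above the 32-character minimum that Athena was observed to enforce.
my $counter = 0;
sub new_request_token {
    # Mix the high-resolution time, process id, a per-process counter, and a
    # random number so that successive calls yield different tokens.
    return sha256_hex(join ':', time, $$, $counter++, rand);
}

my $token = new_request_token();
print length($token), "\n";    # prints 64
```

The token would then be passed as `ClientRequestToken => new_request_token()` in the `StartQueryExecution` call.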

frioux commented 7 years ago

Note that the documentation says that the SDK (i.e. Paws) should be generating this token, but I would be fine either way.

pplu commented 7 years ago

Another feature for the TODO list :laughing:. It looks like some APIs signal some parameters with an "idempotencyToken" attribute set to true (others seem to have an "idempotency parameter", but don't signal it). ack-grep idempotencyToken botocore/botocore/data/ reports EC2, SSM, Athena and ServiceCatalog.

I think we can leave the generation to the user for now (unless you want to take a stab at it). I can think of two strategies for generating it from the SDK, both backwards-compatible with callers who already specify their own token in the call:

Please shout out if you want some help!
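The two strategies are not preserved in this thread, so the following is only one possible sketch of a backwards-compatible default, not necessarily either of them. `with_default_token` is a hypothetical helper, not part of Paws: it fills in `ClientRequestToken` only when the caller omitted it.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper (not part of Paws): add a generated ClientRequestToken
# only when the caller did not supply one, so calls that already pass their
# own token behave exactly as before.
sub with_default_token {
    my (%args) = @_;
    $args{ClientRequestToken} //= sha256_hex(join ':', time, $$, rand);
    return %args;
}

my %explicit = with_default_token(
    QueryString        => 'SELECT 1',
    ClientRequestToken => 'x' x 32,
);
my %implicit = with_default_token(QueryString => 'SELECT 1');

print "caller token kept\n" if $explicit{ClientRequestToken} eq 'x' x 32;
print "token generated\n"   if length($implicit{ClientRequestToken}) == 64;
```

A caller could wrap the arguments as `$athena->StartQueryExecution(with_default_token(...))`. A real implementation inside the SDK would presumably key off the "idempotencyToken" attribute in the service definitions instead of hard-coding the parameter name.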

frioux commented 7 years ago

I think I'm good with the support that is exposed as is. I'm building a small CLI tool right now which will exercise more of the Athena API. I'll let you know (and share it) when it's done.

frioux commented 7 years ago

Works well. Feel free to take this example and include it with Paws:

#!/usr/bin/env perl

use 5.26.0;
use warnings;

use experimental 'signatures';

use Data::GUID 'guid_string';
use DateTime;
use Devel::Dwarn;
use Getopt::Long::Descriptive;
use Net::Amazon::S3;
use Paws;

my ($opt, $usage) = describe_options(
  '%c %o',
  [ 'sql=s', "sql to run", { required => 1  } ],
  [ 'database=s', "db to run in (default is adhoc)", { default => 'adhoc'  } ],
  [ 's3-output-location=s',
      "S3 Prefix to store to " .
        "(default is s3://foobar/$ENV{USER}-test)",
      { default  => "s3://foobar/$ENV{USER}-test" }
  ],
  [ 'local-output-location=s',
    "Location to download s3 files to (default is '.')",
    { default  => "." }
  ],
  [],
  [ 'verbose|v',  "print extra stuff"            ],
  [ 'help',       "print usage message and exit", { shortcircuit => 1 } ],
);

print($usage->text), exit if $opt->help;

my $athena = Paws->service('Athena', region => 'us-east-1');

my $query = $athena->StartQueryExecution(
  QueryString => $opt->sql,
  ResultConfiguration => {
    OutputLocation => $opt->s3_output_location,
  },
  QueryExecutionContext => {
    Database => $opt->database,
  },
  ClientRequestToken => guid_string(),
);

my $status;
do {
  $status = $athena->GetQueryExecution(
    QueryExecutionId => $query->QueryExecutionId,
  );
  sleep 1;
} until _is_complete($status);

my $s = $status->QueryExecution->Status;
my $start = DateTime->from_epoch( epoch => $s->SubmissionDateTime );
my $end = DateTime->from_epoch( epoch => $s->CompletionDateTime );
warn sprintf <<'OUT', $s->State, $start, $end if $opt->verbose;
Query %s!
  started at %s
 finished at %s
OUT

if ($s->State eq 'FAILED') {
  warn $s->StateChangeReason . "\n";
  exit 1;
} elsif ($s->State eq 'CANCELLED') {
  warn "query cancelled\n";
  exit 0;
}

warn "results are at " .
  $status->QueryExecution->ResultConfiguration->OutputLocation . "\n"
  if $opt->verbose;

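# Reuse the credentials Paws resolved so Net::Amazon::S3 can fetch the results.
# ($a is reserved for sort blocks, so use a descriptive name instead.)
my $creds = Paws::Credential::ProviderChain->new->selected_provider;

my $s3 = Net::Amazon::S3->new(
  aws_access_key_id     => $creds->access_key,
  aws_secret_access_key => $creds->secret_key,
);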

my ($bucket_name, $key, $file) = parse_s3_url($status->QueryExecution->ResultConfiguration->OutputLocation);

my $bucket = $s3->bucket($bucket_name);
my $local = $opt->local_output_location . '/' . $file;

warn "downloading $key to $local\n" if $opt->verbose;

$bucket->get_key_filename( $key, 'GET', $local );

sub _is_complete ($s) {
  $s->QueryExecution->Status->State =~ m/^(?:succeeded|failed|cancelled)$/i
}

sub parse_s3_url ($url) {
  $url =~ s/^s3:\/\///;

  my ($bucket, $key) = split qr(/), $url, 2;

  my ($file) = ($key =~ m(.*?/?([^/]+)$));

  return ($bucket, $key, $file);
}
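As a worked example of what `parse_s3_url` returns, here is the same logic restated without signatures so it runs on older perls. The URL is a made-up sample, not a real result location.

```perl
use strict;
use warnings;

# Same logic as parse_s3_url in the script, written with @_ instead of
# signatures so that it does not need the experimental feature.
sub parse_s3_url {
    my ($url) = @_;
    $url =~ s{^s3://}{};                          # drop the scheme
    my ($bucket, $key) = split m{/}, $url, 2;     # first segment is the bucket
    my ($file) = $key =~ m{.*?/?([^/]+)$};        # last path segment of the key
    return ($bucket, $key, $file);
}

# Hypothetical sample URL, shaped like an Athena result location.
my ($bucket, $key, $file) = parse_s3_url('s3://foobar/frew-test/abc123.csv');
print "$bucket\n$key\n$file\n";
# prints:
#   foobar
#   frew-test/abc123.csv
#   abc123.csv
```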