This would allow one to query CSV/Parquet files stored in S3 and GCS.
For example:
query a different bucket with the same credentials as provided in the config:
```shell
$ curl -H "Content-Type: application/json" http://127.0.0.1:8080/q -d@- <<EOF
{"query": "create external table test_aws stored as parquet location 's3://seafowl-public/tutorial/trase-supply-chains.parquet'; select * from staging.test_aws limit 1"}
EOF
{"commodity":"CORN","country_of_import":"CANADA","country_of_import_trase_id":"CA","country_of_production":"ARGENTINA","economic_bloc":"CANADA","exporter":"RONALB S R L","exporter_group":"RONALB S R L","exporter_group_id":63685,"exporter_id":39201,"exporter_trase_id":"AR-TRADER-3064104720","flow_id":401692173,"fob":24200.0,"importer":"RONALB S R L","is_domestic":"0","port":"ROSARIO","product_type":"CORN GRAINS","region_production_1_type":"COUNTRY","region_production_2":"SANTA FE","region_production_2_level":2,"region_production_2_trase_id":"AR-82","region_production_2_type":"PROVINCE","row_number":1,"scale":"SUBNATIONAL","version":"0.2.2","volume":50.0,"year":2018.0}
```
query with credentials supplied inline as options (or just in another region):
```shell
$ curl -H "Content-Type: application/json" http://127.0.0.1:8080/q -d@- <<EOF
{"query": "create external table test_aws_options stored as parquet options ('access_key_id' '*******', 'secret_access_key' '*************', 'region' 'eu-west-3') location 's3://splitgraph-athena-test/supply-chains/supply-chains.parquet'; select * from staging.test_aws_options limit 1"}
EOF
{"commodity":"CORN","country_of_import":"CANADA","country_of_import_trase_id":"CA","country_of_production":"ARGENTINA","economic_bloc":"CANADA","exporter":"RONALB S R L","exporter_group":"RONALB S R L","exporter_group_id":63685,"exporter_id":39201,"exporter_trase_id":"AR-TRADER-3064104720","flow_id":401692173,"fob":24200.0,"importer":"RONALB S R L","is_domestic":"0","port":"ROSARIO","product_type":"CORN GRAINS","region_production_1_type":"COUNTRY","region_production_2":"SANTA FE","region_production_2_level":2,"region_production_2_trase_id":"AR-82","region_production_2_type":"PROVINCE","row_number":1,"scale":"SUBNATIONAL","version":"0.2.2","volume":50.0,"year":2018.0}
```
GCS example (tested on a GCP VM, so the credentials were provided automatically):
```shell
$ curl -v -H "Content-Type: application/json" http://127.0.0.1:8080/q -d@- <<EOF
{"query": "create external table test_gcs stored as parquet location 'gs://splitgraph-staging/tweets.parquet'; select * from staging.test_gcs limit 1"}
EOF
{"id":877940643162578944,"link":"https://www.twitter.com/AdamKinzinger/statuses/877940643162578944","screen_name":"AdamKinzinger","source":"Twitter for iPad","text":"Before we blame anyone else for the tone of today's politics, we must look to ourselves. I hope you'll join me. https://t.co/O6WWDYloKl","time":"2017-06-22T13:25:35","user_id":18004222}
```
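Since CSV files are mentioned as supported too, a CSV variant might look like the sketch below. The bucket and object path are hypothetical, and this assumes `stored as csv` behaves analogously to the Parquet examples above:

```shell
# Hypothetical bucket/path, shown only to illustrate the CSV form of the statement.
$ curl -H "Content-Type: application/json" http://127.0.0.1:8080/q -d@- <<EOF
{"query": "create external table test_csv stored as csv location 's3://my-example-bucket/data/example.csv'; select * from staging.test_csv limit 1"}
EOF
```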
Closes #256