mojodna / osm-pds-pipelines

OSM PDS pipeline
https://quay.io/repository/mojodna/osm-pds-pipelines
ISC License
31 stars 4 forks source link

Amazon Athena documentation for OSM #20

Open missinglink opened 6 years ago

missinglink commented 6 years ago

Hi,

I tried to follow the documentation today to set up an Athena table to query the Openstreetmap data without success.

The docs suggest running the query:

CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT>,
  members ARRAY<STRUCT>,
  changeset BIGINT,
  timestamp TIMESTAMP,
  uid BIGINT,
  user STRING,
  version BIGINT
)

STORED AS ORCFILE
LOCATION 's3://osm-pds/planet/';

... however, this syntax seems to be out-of-date (first time Athena user here!), but following through the wizard I was able to generate the following query:

CREATE EXTERNAL TABLE IF NOT EXISTS osm.planet (
  `id` bigint,
  `type` string,
  `tags` map<string,string>,
  `lat` decimal,
  `lon` decimal,
  `nds` array<int>,
  `members` array<int>,
  `changeset` bigint,
  `timestamp` timestamp,
  `uid` bigint,
  `user` string,
  `version` bigint 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://osm-pds/planet/'
TBLPROPERTIES ('has_encrypted_data'='false');

now when I run a query such as:

SELECT * FROM osm.planet LIMIT 1;

.. I get an error:

Your query has the following error(s):

Internal error

This query ran against the "osm" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: c974aa5c-ca4d-41bb-b0b1-ff32d23676b8.

Hopefully, you could point me in the right direction to get this working?

missinglink commented 6 years ago

this table definition also resulted in an error:

CREATE EXTERNAL TABLE IF NOT EXISTS osm.planet (
  `id` bigint,
  `type` string,
  `tags` map<string,string>,
  `lat` decimal,
  `lon` decimal,
  `nds` array<int>,
  `members` array<int>,
  `changeset` bigint,
  `timestamp` timestamp,
  `uid` bigint,
  `user` string,
  `version` bigint 
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet/';
SELECT * FROM osm.planet LIMIT 1;
Your query has the following error(s):

Internal error

This query ran against the "osm" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: a20c964f-e9e3-4a62-9599-467cb36422b4.
missinglink commented 6 years ago

agh, found this gist which contains working examples.

it looks like the docs are just a little out-of-date.

mojodna commented 6 years ago

/cc @jflasher for the next round of updates (it looks like something was treated as an HTML tag).

The correct schema definitions are:

planet (with an additional visible column for compatibility with the history schema; it's always true):

--
-- This will register the "planet" table within your AWS account
--
CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP<STRING,STRING>,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT<ref: BIGINT>>,
  members ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
  changeset BIGINT,
  timestamp TIMESTAMP,
  uid BIGINT,
  user STRING,
  version BIGINT,
  visible BOOLEAN
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet/';

planet_history:

--
-- This will register the "planet_history" table within your AWS account
--
CREATE EXTERNAL TABLE planet_history (
    id BIGINT,
    type STRING,
    tags MAP<STRING,STRING>,
    lat DECIMAL(9,7),
    lon DECIMAL(10,7),
    nds ARRAY<STRUCT<ref: BIGINT>>,
    members ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
    changeset BIGINT,
    timestamp TIMESTAMP,
    uid BIGINT,
    user STRING,
    version BIGINT,
    visible BOOLEAN
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet-history/'
jflasher commented 6 years ago

Got this on my list of updates, thanks!