[Open] teresayanaxs opened this issue 8 years ago
mm... having a look around, for example at "2. My data is in a directory structure in S3. How can I access it?" in the Qubole Data Service 1.0 documentation on docs.qubole.com, it sounds like you should be using s3n:// in the URL rather than s3://.
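If the URL scheme turns out to be the issue, repointing the table should be all that's needed; something along these lines (bucket and path taken from your DDL):
<code>
ALTER TABLE transactions_json SET LOCATION 's3n://ny-bucket/emr/transactions';
</code>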
Hi Roberto,
Thanks for your response, but it looks like s3n and s3a are not supported anymore in EMR 4.0.0:
http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-file-systems.html "Note The s3a protocol is not supported. We suggest you use s3 in place of s3a and s3n URI prefixes."
Do you have any advice?
Thank you. Teresa
Hi,
I am encountering this issue on Amazon S3 (EMR hive-1.0.0) using the json-serde 1.3.6 snapshot jar. I have run ADD JAR in the Hive console.
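For reference, the jar was added along these lines (the local path here is illustrative, not the exact one used):
<code>
ADD JAR /home/hadoop/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
</code>
The external table is defined as: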
<code>
CREATE EXTERNAL TABLE transactions_json (
  now string COMMENT 'from deserializer',
  orders array<struct<email:string,first:string,last:string,ordercreated:string,orderid:string,tickets:array<struct<barcode:string,datedoors:string,datestart:string,datestop:string,eventid:string,eventname:string>>,zip:string>> COMMENT 'from deserializer',
  page string COMMENT 'from deserializer',
  ticketlimit string COMMENT 'from deserializer',
  totalpages string COMMENT 'from deserializer',
  totaltickets string COMMENT 'from deserializer')
PARTITIONED BY (file_date string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://ny-bucket/emr/transactions'
</code>
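For context, a row for this schema is a JSON object shaped roughly like the following (values are made up; only the nesting matters, since the SerDe maps orders to an array<struct<...>> and each order's tickets to a nested array<struct<...>>):
<code>
{"now": "...", "page": "1", "ticketlimit": "4", "totalpages": "10", "totaltickets": "2",
 "orders": [{"email": "a@b.com", "first": "Jane", "last": "Doe", "ordercreated": "...",
             "orderid": "123", "zip": "10001",
             "tickets": [{"barcode": "...", "datedoors": "...", "datestart": "...",
                          "datestop": "...", "eventid": "...", "eventname": "..."}]}]}
</code>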
and then I have a view selecting from that table, as follows:
<code>
CREATE VIEW view_transactions_raw AS
select transactions_json.now,
       transactions_json.page,
       transactions_json.ticketlimit,
       transactions_json.totalpages,
       transactions_json.totaltickets,
       order_table.odr.email as odr_email,
       order_table.odr.ordercreated as odr_ordercreated,
       order_table.odr.orderid as odr_orderid,
       order_table.odr.zip as odr_zip,
       ticket_table.tkt.barcode as tkt_barcode,
       ticket_table.tkt.datestart as tkt_datestart,
       ticket_table.tkt.datestop as tkt_datestop,
       ticket_table.tkt.eventid as tkt_eventid,
       ticket_table.tkt.eventname as tkt_eventname,
       transactions_json.file_date
from default.transactions_json
lateral view explode(transactions_json.orders) order_table as odr
lateral view explode(order_table.odr.tickets) ticket_table as tkt
</code>
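As a sanity check, the same lateral view query can also be run directly against the table, bypassing the view, to see whether the explodes themselves are the problem. A minimal probe, assuming nothing beyond the DDL above:
<code>
select order_table.odr.orderid, ticket_table.tkt.barcode
from default.transactions_json
lateral view explode(transactions_json.orders) order_table as odr
lateral view explode(order_table.odr.tickets) ticket_table as tkt
limit 10;
</code>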
When I do "select * from transactions_json", it works, but when I do "select * from view_transactions_raw", it gives me the following error:
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:134)
at org.apache.hadoop.fs.Path.<init>(Path.java:93)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:202)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:95)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:190)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1602)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1363)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1176)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1003)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:993)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.lang.IllegalArgumentException(Can not create a Path from an empty string)'
However, exactly the same thing works on CDH5 hive-0.13.1. Please advise. Thank you so much.
Teresa