rcongiu / Hive-JSON-Serde

Read - Write JSON SerDe for Apache Hive.

java.lang.IllegalArgumentException: Can not create a Path from an empty string on aws s3 #140

Open teresayanaxs opened 8 years ago

teresayanaxs commented 8 years ago

Hi,

I am encountering this issue on Amazon S3 (EMR, Hive 1.0.0) using the json-serde 1.3.6 snapshot jar. I have run ADD JAR in the Hive console.
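For reference, the jar was added roughly like this (the local path below is illustrative, not my exact one):

```sql
-- Register the SerDe jar for this Hive session (path is illustrative).
ADD JAR /home/hadoop/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
```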

```sql
CREATE EXTERNAL TABLE transactions_json(
  now string COMMENT 'from deserializer',
  orders array<struct<email:string,first:string,last:string,ordercreated:string,orderid:string,tickets:array<struct<barcode:string,datedoors:string,datestart:string,datestop:string,eventid:string,eventname:string>>,zip:string>> COMMENT 'from deserializer',
  page string COMMENT 'from deserializer',
  ticketlimit string COMMENT 'from deserializer',
  totalpages string COMMENT 'from deserializer',
  totaltickets string COMMENT 'from deserializer')
PARTITIONED BY (file_date string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://ny-bucket/emr/transactions';
```

and then I have a view to select from that table as follows

```sql
CREATE VIEW view_transactions_raw AS
SELECT
  transactions_json.now,
  transactions_json.page,
  transactions_json.ticketlimit,
  transactions_json.totalpages,
  transactions_json.totaltickets,
  order_table.odr.email AS odr_email,
  order_table.odr.ordercreated AS odr_ordercreated,
  order_table.odr.orderid AS odr_orderid,
  order_table.odr.zip AS odr_zip,
  ticket_table.tkt.barcode AS tkt_barcode,
  ticket_table.tkt.datestart AS tkt_datestart,
  ticket_table.tkt.datestop AS tkt_datestop,
  ticket_table.tkt.eventid AS tkt_eventid,
  ticket_table.tkt.eventname AS tkt_eventname,
  transactions_json.file_date
FROM default.transactions_json
LATERAL VIEW explode(transactions_json.orders) order_table AS odr
LATERAL VIEW explode(order_table.odr.tickets) ticket_table AS tkt;
```

When I run `select * from transactions_json`, it works, but when I run `select * from view_transactions_raw`, it gives me the following error:

Number of reduce tasks is set to 0 since there's no reduce operator

```
java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
    at org.apache.hadoop.fs.Path.<init>(Path.java:134)
    at org.apache.hadoop.fs.Path.<init>(Path.java:93)
    at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:202)
    at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:128)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:95)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:190)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:429)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1602)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1363)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1176)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1003)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:993)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:201)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:153)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:364)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:712)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:631)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:570)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
```

Job Submission failed with exception 'java.lang.IllegalArgumentException(Can not create a Path from an empty string)'
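From the trace, the failure seems to happen in JobResourceUploader.copyRemoteFiles, i.e. while the job's registered resources (such as the added jar) are copied to the staging directory. As a sanity check (I am not sure this is the cause), the registered resources can be inspected like this:

```sql
-- Sanity check, not a confirmed cause: inspect what Hive will ship with
-- the job; an empty or malformed entry could produce an empty Path.
LIST JARS;
SET hive.aux.jars.path;
```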

However, exactly the same thing works on CDH5 (Hive 0.13.1). Please advise. Thank you so much.

Teresa

rcongiu commented 8 years ago
mm... having a look around, e.g. "2. My data is in a directory structure in S3. How can I access it?" in the Qubole Data Service 1.0 documentation (docs.qubole.com), it sounds like you should be using s3n:// in the URL rather than s3://.
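If you want to try it without recreating the table, something along these lines might work (untested sketch; for a partitioned table the existing partition locations may need updating separately):

```sql
-- Untested sketch: point the existing table at the s3n:// scheme.
ALTER TABLE transactions_json SET LOCATION 's3n://ny-bucket/emr/transactions';
```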
 


teresayanaxs commented 8 years ago

Hi Roberto,

Thanks for your response, but it looks like s3n and s3a are no longer supported in EMR 4.0.0?

http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-file-systems.html

> Note: The s3a protocol is not supported. We suggest you use s3 in place of s3a and s3n URI prefixes.

Do you have any advice?

Thank you.

Teresa