oracle / oci-hdfs-connector

HDFS Connector for Oracle Cloud Infrastructure
https://cloud.oracle.com/cloud-infrastructure

Pyspark py4j.protocol.Py4JJavaError: An error occurred while calling o95.csv. #23

Closed durgaswaroop closed 4 years ago

durgaswaroop commented 4 years ago

I am trying to read files from Object Storage with a PySpark application, and I am getting the following error:

2020-01-21 06:27:53,860 WARN streaming.FileStreamSink: Error while looking for metadata directory
py4j.protocol.Py4JJavaError: An error occurred while calling o95.csv.
: java.io.IOException: Unable to fetch file status for: 0febff2985/companies
        at com.oracle.bmc.hdfs.store.BmcDataStore.getObjectMetadata(BmcDataStore.java:583)
        at com.oracle.bmc.hdfs.store.BmcDataStore.getFileStatus(BmcDataStore.java:508)
        at com.oracle.bmc.hdfs.BmcFilesystem.getFileStatus(BmcFilesystem.java:302)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
        at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:557)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
        at scala.collection.immutable.List.flatMap(List.scala:355)
        at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:545)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:615)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.oracle.bmc.model.BmcException: (-1, null, false) Processing exception while communicating to: https://objectstorage.us-phoenix-1.oraclecloud.com/ (outbound opc-request-id: ASLKASJLAJDLADLAJSDALS)
        at com.oracle.bmc.http.internal.RestClient.convertToBmcException(RestClient.java:578)
        at com.oracle.bmc.http.internal.RestClient.head(RestClient.java:519)
        at com.oracle.bmc.objectstorage.ObjectStorageClient.lambda$null$38(ObjectStorageClient.java:970)
        at com.oracle.bmc.retrier.BmcGenericRetrier.lambda$execute$0(BmcGenericRetrier.java:50)
        at com.oracle.bmc.waiter.GenericWaiter.execute(GenericWaiter.java:54)
        at com.oracle.bmc.retrier.BmcGenericRetrier.execute(BmcGenericRetrier.java:46)
        at com.oracle.bmc.objectstorage.ObjectStorageClient.lambda$headObject$39(ObjectStorageClient.java:966)
        at com.oracle.bmc.retrier.BmcGenericRetrier.lambda$execute$0(BmcGenericRetrier.java:50)
        at com.oracle.bmc.waiter.GenericWaiter.execute(GenericWaiter.java:54)
        at com.oracle.bmc.retrier.BmcGenericRetrier.execute(BmcGenericRetrier.java:46)
        at com.oracle.bmc.objectstorage.ObjectStorageClient.headObject(ObjectStorageClient.java:960)
        at com.oracle.bmc.hdfs.store.BmcDataStore.getObjectMetadata(BmcDataStore.java:553)
        ... 25 more
Caused by: shaded.oracle.javax.ws.rs.ProcessingException: org.apache.hadoop.fs.FsUrlConnection cannot be cast to java.net.HttpURLConnection
        at shaded.oracle.org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:287)
        at shaded.oracle.org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:753)
        at shaded.oracle.org.glassfish.jersey.internal.Errors.process(Errors.java:316)
        at shaded.oracle.org.glassfish.jersey.internal.Errors.process(Errors.java:298)
        at shaded.oracle.org.glassfish.jersey.internal.Errors.process(Errors.java:229)
        at shaded.oracle.org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:414)
        at shaded.oracle.org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:752)
        at shaded.oracle.org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:419)
        at shaded.oracle.org.glassfish.jersey.client.JerseyInvocation$Builder.head(JerseyInvocation.java:383)
        at com.oracle.bmc.http.internal.ForwardingInvocationBuilder.head(ForwardingInvocationBuilder.java:186)
        at com.oracle.bmc.http.internal.RestClient.head(RestClient.java:516)
        ... 35 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.fs.FsUrlConnection cannot be cast to java.net.HttpURLConnection
        at shaded.oracle.org.glassfish.jersey.client.HttpUrlConnectorProvider$DefaultConnectionFactory.getConnection(HttpUrlConnectorProvider.java:300)
        at shaded.oracle.org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:335)
        at shaded.oracle.org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:282)
        at shaded.oracle.org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:278)
        ... 45 more

Please let me know what could be causing this issue.

mricken commented 4 years ago

Hi @durgaswaroop, can you please let us know which version of the OCI HDFS connector you are using? Can you also share some code to help us reproduce this issue?

durgaswaroop commented 4 years ago

Hi. Sorry, that issue was resolved. It was caused by incompatible versions of the oci-hdfs jar: it looks like the connector supports version 2.9, but I had some 3.x version in my classpath. This seems to affect only PySpark, though, as the Scala equivalent worked fine.
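
For anyone hitting the same ClassCastException, a quick way to spot this kind of mismatch is to list the jars on the classpath and look for mixed major versions. The snippet below is only a minimal sketch: the helper names (`group_jar_versions`, `mixed_major_versions`) are hypothetical, it assumes jars are named `<artifact>-<version>.jar`, and the example paths are made up for illustration. The comment above does not say exactly which jars clashed, so the `hadoop-` prefix is an assumption; change it to whatever artifact you suspect.

```python
import re
from collections import defaultdict

# Assumed naming convention: <artifact>-<major>.<minor>[.<patch>...].jar
JAR_PATTERN = re.compile(
    r"^(?P<artifact>[A-Za-z][\w.\-]*?)-(?P<version>\d+(?:\.\d+)+)[\w.\-]*\.jar$"
)


def group_jar_versions(classpath_entries, prefix="hadoop-"):
    """Map each artifact whose name starts with `prefix` to the versions found."""
    versions = defaultdict(set)
    for entry in classpath_entries:
        # Keep only the file name, whatever the path separator is.
        filename = entry.replace("\\", "/").rsplit("/", 1)[-1]
        match = JAR_PATTERN.match(filename)
        if match and match.group("artifact").startswith(prefix):
            versions[match.group("artifact")].add(match.group("version"))
    return dict(versions)


def mixed_major_versions(classpath_entries, prefix="hadoop-"):
    """Return the sorted major versions seen across all matching jars."""
    majors = set()
    for found in group_jar_versions(classpath_entries, prefix).values():
        for version in found:
            majors.add(version.split(".")[0])
    return sorted(majors)


# Hypothetical classpath entries, for illustration only.
example_entries = [
    "/opt/spark/jars/hadoop-common-2.9.2.jar",
    "/opt/spark/jars/hadoop-hdfs-client-3.2.0.jar",
    "/opt/libs/oci-hdfs-full-2.9.2.1.jar",
    "/opt/spark/jars/py4j-0.10.7.jar",
]

majors = mixed_major_versions(example_entries)
if len(majors) > 1:
    print("Mixed major versions on the classpath:", ", ".join(majors))
```

In a real setup the list of entries could come from something like `os.listdir` on the Spark `jars` directory plus any jars passed via `--jars`; more than one major version in the result is a hint that the classpath needs cleaning up.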

jodoglevy commented 4 years ago

Closing this, as it seems the issue is resolved. If I'm misunderstanding, please feel free to reopen.