oracle / oci-hdfs-connector

HDFS Connector for Oracle Cloud Infrastructure
https://cloud.oracle.com/cloud-infrastructure

Closing stream and read fails, possible stale connection on upgrade from 3.2.1.3 to 3.3.0.7.0.1 with Jersey connector #54

Closed dmeibusch closed 3 years ago

dmeibusch commented 3 years ago

I've just upgraded from 3.2.1.3 to 3.3.0.7.0.1.

Apache Spark 3.1.2, Hadoop 2.7.4

I've seen our performance degrade significantly when accessing large files from Spark jobs (~1 GB compressed JSON files). With the default Apache connector, the logs contained many partial-read and retry errors, so I switched back to the Jersey HTTPConnector.

With this connector, the following warnings are in the log:

21/07/18 02:28:47 WARN ObjectStorageClient: getObject returns a stream, please make sure to close the stream to avoid any indefinite hangs
21/07/18 02:28:47 WARN ResponseHelper: Wrapping response stream into auto closeable stream, do disable this, pleaseuse ResponseHelper.shouldAutoCloseResponseInputStream(false)
21/07/18 02:32:54 WARN BmcDirectFSInputStream: Read failed, possibly a stale connection. Will re-attempt.
java.io.IOException: Total bytes processed (950272) does not match content-length (485179266)

y-chandra commented 3 years ago

@dmeibusch - Can you please try the same after switching back to the Jersey HTTPConnector and also disabling auto-close of streams using ResponseHelper.shouldAutoCloseResponseInputStream(false)?

dmeibusch commented 3 years ago

The warning messages above were logged after switching back to the Jersey connector. Should the oci-hdfs-connector code be setting ResponseHelper.shouldAutoCloseResponseInputStream(false)? Or are you suggesting that I set that in my Spark job?

y-chandra commented 3 years ago

Please set ResponseHelper.shouldAutoCloseResponseInputStream(false) in your Spark job.

dmeibusch commented 3 years ago

That change would require me to add oci-hdfs-connector as a compile-time dependency of my Spark job code just to access the ResponseHelper class. I shouldn't have to do that.

y-chandra commented 3 years ago

Please use the workaround for now. I will come up with a fix to disable auto-close using a config property in the next release.
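
One way to apply the workaround without a compile-time dependency is to make the call reflectively. The following is only a minimal sketch, not the connector's or the SDK's documented approach. It assumes the class lives at com.oracle.bmc.http.internal.ResponseHelper (the log above only shows "internal.ResponseHelper") and that shouldAutoCloseResponseInputStream is a public static method taking a boolean. The class name DisableAutoClose is hypothetical.

```java
import java.lang.reflect.Method;

final class DisableAutoClose {
    // Assumption: fully qualified name of the SDK helper class.
    // The thread only shows the abbreviated "internal.ResponseHelper".
    static final String HELPER_CLASS = "com.oracle.bmc.http.internal.ResponseHelper";

    /**
     * Tries to call ResponseHelper.shouldAutoCloseResponseInputStream(false)
     * via reflection. Returns true if the call was made, false if the class
     * or method is not available on the runtime classpath.
     */
    static boolean disableAutoClose() {
        try {
            Class<?> helper = Class.forName(HELPER_CLASS);
            Method method =
                    helper.getMethod("shouldAutoCloseResponseInputStream", boolean.class);
            method.invoke(null, false); // static method, so no receiver object
            return true;
        } catch (ReflectiveOperationException | LinkageError e) {
            // SDK not on the classpath, or the method signature differs.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("auto-close disabled: " + disableAutoClose());
    }
}
```

Because the setting is static state, the call would have to run in every JVM that reads from Object Storage (for Spark, the executors as well as the driver) before the first read.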

xiaoyuyao commented 3 years ago

How does this work with Hive? I saw a similar error on Hive queries when switching to the Jersey connector: WARN internal.ResponseHelper: Wrapping response stream into auto closeable stream, do disable this, pleaseuse ResponseHelper.shouldAutoCloseResponseInputStream(false)

y-chandra commented 3 years ago

@xiaoyuyao - This is a warning that comes from the Java SDK. The hdfs-connector internally uses the Java SDK to make API calls. For operations that return streams, the Java SDK automatically closes the streams to release the connection back to the connection pool. There seems to be a typo in the warning; the correct statement should read:

Wrapping response stream into auto closeable stream, to disable this, please use ResponseHelper.shouldAutoCloseResponseInputStream(false)

You can call ResponseHelper.shouldAutoCloseResponseInputStream(false) from your Hive code to disable the auto-close feature. More info: https://github.com/oracle/oci-java-sdk/blob/master/ApacheConnector-README.md#switching-off-auto-close-of-streams

y-chandra commented 3 years ago

We've added a property in version 3.3.1.0.0.0 that lets you disable auto-close of streams on full read. Please set the property fs.oci.object.autoclose.inputstream to false in core-site.xml. Please let us know if the fix works for you.
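
For reference, a core-site.xml entry for this property would look roughly like the fragment below. The property name is taken from the comment above. The surrounding layout is the standard Hadoop configuration format, and the rest of the file is assumed to exist already.

```xml
<configuration>
  <!-- Disable auto-close of Object Storage response streams (connector 3.3.1.0.0.0 or later) -->
  <property>
    <name>fs.oci.object.autoclose.inputstream</name>
    <value>false</value>
  </property>
</configuration>
```

For Spark jobs, Hadoop properties can generally also be passed with the spark.hadoop. prefix, for example --conf spark.hadoop.fs.oci.object.autoclose.inputstream=false. Whether the connector picks the value up this way has not been confirmed in this thread.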

y-chandra commented 3 years ago

Since we've not received a response from you in a while, we'll close this one. Please feel free to reopen it if you face any issues.

dmeibusch commented 3 years ago

@y-chandra Apologies for not getting back to you. We appreciate the work on the connector and use it heavily. We'll test this change when we next upgrade.