Closed dmeibusch closed 3 years ago
@dmeibusch - Can you please try the same after switching back to the Jersey HTTPConnector and also disabling auto-close of streams using ResponseHelper.shouldAutoCloseResponseInputStream(false)?
The warning messages above were after switching back to the Jersey connector.
Should the oci-hdfs-connector code be setting ResponseHelper.shouldAutoCloseResponseInputStream(false)? Or are you suggesting that I set that in my Spark job?
Please set ResponseHelper.shouldAutoCloseResponseInputStream(false) in your Spark job.
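A minimal sketch of that workaround, assuming the OCI Java SDK is on the Spark driver's classpath and that ResponseHelper lives in the com.oracle.bmc.http.internal package (as the warning message suggests); the bucket path is a placeholder:

```java
import com.oracle.bmc.http.internal.ResponseHelper;
import org.apache.spark.sql.SparkSession;

public class OciReadJob {
    public static void main(String[] args) {
        // Disable the SDK's auto-close wrapping of response streams.
        // This must run before any oci:// filesystem access occurs.
        ResponseHelper.shouldAutoCloseResponseInputStream(false);

        SparkSession spark = SparkSession.builder()
                .appName("oci-read-job")
                .getOrCreate();

        // Reads through the oci-hdfs-connector will no longer wrap
        // streams in the auto-closeable wrapper, avoiding the warning.
        spark.read().json("oci://my-bucket@my-namespace/data/input.json").show();

        spark.stop();
    }
}
```

Note this is exactly the coupling objected to below: the job must see the SDK's ResponseHelper class at compile time.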
That change would assume that I add oci-hdfs-connector as a compile-time dependency of my Spark job code to access the ResponseHelper class. I shouldn't have to do that.
Please use the workaround for now. I will come up with a fix to disable auto-close using a config property in the next release.
How does this work with Hive? I saw a similar error on Hive queries when switching to the Jersey connector: WARN internal.ResponseHelper: Wrapping response stream into auto closeable stream, do disable this, pleaseuse ResponseHelper.shouldAutoCloseResponseInputStream(false)
@xiaoyuyao - This is a warning that comes from the Java SDK. The hdfs-connector internally uses the Java SDK to make API calls. For operations that return streams, the Java SDK automatically closes the streams to release the connection back to the connection pool. There is a typo in the warning; the correct statement should read: "Wrapping response stream into auto closeable stream, to disable this, please use ResponseHelper.shouldAutoCloseResponseInputStream(false)". You can call ResponseHelper.shouldAutoCloseResponseInputStream(false) from your Hive code to disable the auto-close feature.
More info at: https://github.com/oracle/oci-java-sdk/blob/master/ApacheConnector-README.md#switching-off-auto-close-of-streams
We've added a property in version 3.3.1.0.0.0 that lets you disable auto-close of streams on full read. Please set the property fs.oci.object.autoclose.inputstream to false in core-site.xml. Please let us know if the fix works for you.
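For reference, a sketch of the corresponding core-site.xml entry, using the property name and version stated above and standard Hadoop configuration conventions:

```xml
<configuration>
  <property>
    <!-- Disable auto-close of object streams on full read
         (available from oci-hdfs-connector 3.3.1.0.0.0) -->
    <name>fs.oci.object.autoclose.inputstream</name>
    <value>false</value>
  </property>
</configuration>
```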
Since we've not received a response from you in a while, we'll close this one. Please feel free to reopen if you face any issues.
@y-chandra Apologies for not getting back to you. Appreciate the work on the connector, we use it heavily. We'll test this change when we next upgrade.
I've just upgraded from 3.2.1.3 to 3.3.0.7.0.1.
Apache Spark 3.1.2, Hadoop 2.7.4
I've seen our performance degrade significantly when accessing large files (~1 GB compressed JSON files) from Spark jobs. With the default Apache connector, the logs contained many partial-read and retry errors, so I switched back to the Jersey HTTPConnector.
With this connector, the following warnings are in the log: