echeipesh opened this issue 4 years ago (status: Open)
I'm wondering, would it throw on line 75? In other words, is it a combination of `close` and `abort`, or will it still throw even without `abort`?
Earlier we had no problems with this, since everyone had a dependency on the same SDK and the same `org.apache.http` library.
Currently, `RangeReader` implementations will clip read requests to the known maximum file size:
https://github.com/locationtech/geotrellis/blob/514e6fa0aeacd883080fde4df2b53d812013d255/util/src/main/scala/geotrellis/util/RangeReader.scala#L41-L42
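The clipping behaviour can be sketched as follows (a simplified illustration, not the actual GeoTrellis code; `clampedRange` is a hypothetical name). The key point is that clamping a read to the end of the file requires knowing the total length up front:

```scala
// Sketch of the clipping described above. A read that starts past the end
// of the file yields nothing; a read that would run past the end is
// shortened. Either way, totalLength must already be known.
def clampedRange(start: Long, length: Int, totalLength: Long): Option[(Long, Int)] =
  if (start >= totalLength) None // nothing left to read
  else Some((start, math.min(length.toLong, totalLength - start).toInt))
```
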
In order to do this, typically a `HEAD` request needs to be made. This is sometimes problematic on S3 because public requester-pays buckets often only allow `GET` and not `HEAD` requests. In this PR we tried to work around the restriction: https://github.com/locationtech/geotrellis/pull/3025
https://github.com/locationtech/geotrellis/blob/514e6fa0aeacd883080fde4df2b53d812013d255/s3/src/main/scala/geotrellis/store/s3/util/S3RangeReader.scala#L55-L60
by opening a GET object stream and reading the length metadata, followed by abort/close calls to avoid actually pulling down any data.
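The workaround can be sketched with a minimal stand-in for the S3 client (the `ObjectStream` trait and `probeLength` function are illustrative, not the GeoTrellis or AWS SDK API): open a GET stream, read the Content-Length from the response metadata, then abort so no body bytes are transferred:

```scala
// Hypothetical minimal interface standing in for an S3 GET response stream,
// so the length probe can be shown without AWS dependencies.
trait ObjectStream {
  def contentLength: Long // Content-Length from the GET response headers
  def abort(): Unit       // drop the connection without draining the body
  def close(): Unit
}

// Probe the object length with a GET instead of a HEAD: read the length
// metadata, abort before any body bytes are consumed, then close.
def probeLength(open: () => ObjectStream): Long = {
  val stream = open()
  try {
    val len = stream.contentLength
    stream.abort()
    len
  } finally stream.close()
}
```

The exception described below arises precisely from this abort/close sequence: depending on the HTTP client backing the SDK, closing an aborted stream may itself throw.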
However, some combination of the AWS SDK and the Apache HTTP library can produce the following exception:
Issue
This issue is to remove all usage of the `RangeReader.totalLength` method and deprecate it for removal in the next major version. Ultimately we do not need to know the total length of the file.
The `RangeReader` interface is used exclusively to read GeoTiffs (and possibly similar files). The first few KB of such files are a header that contains, or points to, a table of offsets. Consequently, when we read GeoTiff segments we always derive the read ranges from the offset table and never have a chance to walk off the end of the file. The one exception is the initial header read. In theory the file should never be smaller than that request, but we should double-check the behavior of the GeoTiff reader if it is.

As a side benefit, we can remove one extra HEAD/GET request per file and avoid paying the AWS costs for it.
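The offset-table-driven reading described above can be sketched as follows (names are illustrative, not the GeoTrellis GeoTiff reader API): only the fixed-size header read could conceivably extend past the end of the file; every subsequent range comes from offsets stored in the file itself, so no total length is needed:

```scala
// Illustrative segment descriptor, as a header's offset table would supply.
final case class Segment(offset: Long, length: Int)

// Read a fixed-size header, parse the offset table out of it, then read each
// segment at the offsets the file itself declares. Only the header read at
// (0, headerSize) could exceed the file length; segment reads cannot.
def readAllSegments(
  readRange: (Long, Int) => Array[Byte],
  headerSize: Int,
  parseOffsets: Array[Byte] => Seq[Segment]
): Seq[Array[Byte]] = {
  val header = readRange(0L, headerSize)
  parseOffsets(header).map(s => readRange(s.offset, s.length))
}
```
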