Closed by echeipesh 4 years ago
Actually this seems pretty straightforward:

The `s3Client` gets used here when making a `layerWriter`, and is ultimately a field on a class extended by `S3SparkLayerProvider`:
https://github.com/locationtech/geotrellis/blob/3609bf707d811960b44f6f9eed0cafbf7f7099cd/s3/src/main/scala/geotrellis/store/s3/S3CollectionLayerProvider.scala#L34
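For context, a minimal sketch of how a client field on the writer gets dragged into the Spark closure (class names here are illustrative stand-ins, not the actual GeoTrellis code; `S3Client` is the AWS SDK v2 client, which is not `Serializable`):

```scala
import org.apache.spark.SparkContext
import software.amazon.awssdk.services.s3.S3Client

// Illustrative stand-in for a layer writer holding a client field.
class SketchLayerWriter(s3Client: S3Client) {
  def write(sc: SparkContext, keys: Seq[String]): Unit =
    // The closure references `s3Client`, so Spark must serialize the
    // enclosing writer and the client itself -- which fails, because
    // the AWS SDK client is not Serializable.
    sc.parallelize(keys).foreach { key =>
      val _ = s3Client // an s3Client.putObject(...) call would go here
    }
}
```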
The knee-jerk reaction is to make the provider `Serializable`, but it's slightly wrong to keep pulling more classes into the closure. What that field does is ensure that every call to the provider re-uses the same client, which is a good thing. However, looking at it deeper (separate issue really):
... we see that `S3ClientProducer` already does the same thing: it gets used by default, which will cache. But if the user uses `set`, they are providing a function which may or may not have the same caching behavior, depending on its implementation. That's a kind of non-obvious consequence here.
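To illustrate the caching concern, here is a simplified sketch of the producer pattern (not the actual `S3ClientProducer` source):

```scala
import software.amazon.awssdk.services.s3.S3Client

// Simplified sketch of the producer pattern, not the real S3ClientProducer.
object SketchClientProducer {
  // The default producer memoizes a single client per JVM.
  private lazy val cached: S3Client = S3Client.create()
  private var producer: () => S3Client = () => cached

  def set(f: () => S3Client): Unit = producer = f
  def get: () => S3Client = producer
}

// A caller overriding the producer like this silently loses the caching:
// SketchClientProducer.set(() => S3Client.create()) // fresh client per call
```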
This is an issue against master. When trying to write a layer to S3 with:
The following exception will be thrown:
For clarity the serialization stack is:
The first problem is that `s3Client` caused a serialization. I'm unclear how `S3SparkLayerProvider` gets pulled into this serialization chain, however.
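One common way to avoid this class of failure (a sketch of the general pattern, not necessarily the fix that closed this issue) is to capture a serializable client-producing function instead of the client instance, so each executor obtains its own client on demand:

```scala
import org.apache.spark.SparkContext
import software.amazon.awssdk.services.s3.S3Client

// Illustrative class: the () => S3Client function captures no client
// instance, so the closure serializes cleanly.
class SketchWriter(getClient: () => S3Client) extends Serializable {
  def write(sc: SparkContext, keys: Seq[String]): Unit =
    sc.parallelize(keys).foreach { key =>
      val client = getClient() // obtained (or cached) on the executor JVM
      // client.putObject(...) would go here
    }
}
```

Whether the per-executor client is cached then depends on the producer function itself, which is exactly the non-obvious `set` behavior noted above.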