locationtech-labs / geopyspark

GeoTrellis for PySpark

ValueError on tile_to_layout for old SpaceTime data #705

Open gauchm opened 5 years ago

gauchm commented 5 years ago

I create a RasterLayer of type SPACETIME like so:

temporal_projected_extent = gps.TemporalProjectedExtent(extent=extent, proj4=crs, instant=datetime.datetime(1955,1,4))
tile = gps.Tile.from_numpy_array(var_data_at_instant, no_data_value)
tiles = [(temporal_projected_extent, tile)]

rdd = spark_ctx.parallelize(tiles)
raster_layer = gps.RasterLayer.from_numpy_rdd(layer_type=gps.LayerType.SPACETIME, numpy_rdd=rdd)

When running

tiled_raster_layer = raster_layer.tile_to_layout(gps.LocalLayout(y, x))

I get an exception:

2019-03-10 17:05:43 ERROR Executor:91 - Exception in task 2.0 in stage 2.0 (TID 10)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\worker.py", line 376, in main
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\worker.py", line 371, in process
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 142, in dump_stream
    self._write_with_length(obj, stream)
  File "C:\Users\me\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 152, in _write_with_length
    serialized = self.dumps(obj)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufserializer.py", line 75, in dumps
    return self._dumps(obj)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufserializer.py", line 56, in _dumps
    return self.encoding_method(obj)
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufcodecs.py", line 650, in tuple_encoder
    tup.temporalProjectedExtent.CopyFrom(to_pb_temporal_projected_extent(obj[0]))
  File "C:\Users\me\AppData\Local\Programs\Python\Python36\lib\site-packages\geopyspark\geotrellis\protobufcodecs.py", line 553, in to_pb_temporal_projected_extent
    tpex.instant = _convert_to_unix_time(obj.instant)
ValueError: Value out of range: -473126400000
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

The problem seems to be that geopyspark converts the date to milliseconds since the Unix epoch, which for an instant in 1955 gives a negative value (-473126400000), and that negative value is rejected as out of range when the instant is encoded for serialization.
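The negative value in the error message can be reproduced in plain Python. This is a minimal sketch of the epoch arithmetic, not geopyspark's internal `_convert_to_unix_time`:

```python
from datetime import datetime, timezone

def to_unix_ms(dt):
    # Attach UTC so timestamp() is computed arithmetically from the epoch,
    # which also works for pre-1970 (negative) values.
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

print(to_unix_ms(datetime(1955, 1, 4)))  # -473126400000, matching the traceback
```

Any date before 1970-01-01 produces a negative millisecond count, which is what trips the range check during encoding.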

Running the same code on the same RDD but with an instant after 1970 (e.g. 1980) works just fine.
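Until pre-epoch instants serialize correctly, one possible workaround is to shift old instants forward by a fixed offset before building the `TemporalProjectedExtent`, and subtract the same offset when interpreting results. The helpers and offset below are illustrative, not geopyspark API:

```python
from datetime import datetime, timedelta

# Hypothetical fixed shift (~100 years) that moves 1955 past the 1970 epoch.
EPOCH_SHIFT = timedelta(days=36525)

def shift_forward(instant):
    # Apply before constructing the TemporalProjectedExtent.
    return instant + EPOCH_SHIFT

def shift_back(instant):
    # Apply when reading instants back out of the layer.
    return instant - EPOCH_SHIFT
```

The shift must be applied uniformly to every instant in the layer so that temporal ordering and key lookups stay consistent.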