locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

Error loading pipeline model #425

Closed courtney-layman closed 4 years ago

courtney-layman commented 4 years ago

When loading a saved model with model = PipelineModel.load('model/water_decision_tree'), I get the following error. I am using PyRasterFrames version '0.8.4' and PySpark version '2.4.4'.

[Screenshot of the error: Screen Shot 2019-11-18 at 4 04 07 PM]

vpipkt commented 4 years ago

The pipeline that produced the saved model had these stages:

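from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyrasterframes.rf_types import NoDataFilter, TileExploder

# RasterFrames stages: explode tiles into cells, then filter NoData on "ndwi"
exploder = TileExploder()
noDataFilter = NoDataFilter() \
  .setInputCols(["ndwi"])
# Standard Spark ML stages
labelIndexer = StringIndexer() \
  .setInputCol("target") \
  .setOutputCol("indexedTarget")
assembler = VectorAssembler() \
  .setInputCols(["ndwi"]) \
  .setOutputCol("features")
classifier = DecisionTreeClassifier() \
  .setLabelCol("indexedTarget") \
  .setFeaturesCol("features")
pipeline = Pipeline() \
  .setStages([exploder, noDataFilter, labelIndexer, assembler, classifier])

vpipkt commented 4 years ago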

I have a branch with two failing unit tests that confirm the problem: https://github.com/s22s/rasterframes/commit/f55088664e1f235c68aed1d7497d05ff8206b8c8

A possible cause is in PySpark's wrapper.py at the lines linked here. I will research more. https://github.com/apache/spark/blob/master/python/pyspark/ml/wrapper.py#L243-L245
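
To illustrate the suspected mechanism, here is a minimal pure-Python sketch. It assumes the linked lines are the part of JavaParams._from_java that derives the Python class from the Java class name by replacing the "org.apache.spark" prefix with "pyspark". The helper names and the RasterFrames class name below are hypothetical, chosen only for illustration; this is not the Spark source.

```python
import importlib


def java_to_python_class_name(java_class_name):
    """Mimic the assumed prefix swap: 'org.apache.spark' -> 'pyspark'."""
    return java_class_name.replace("org.apache.spark", "pyspark")


def can_resolve(java_class_name):
    """Return True if the mapped dotted path names an importable Python class."""
    py_name = java_to_python_class_name(java_class_name)
    module_name, _, class_name = py_name.rpartition(".")
    try:
        module = importlib.import_module(module_name)
    except (ImportError, ValueError):
        # No such Python module, or the name had no module part at all
        return False
    return hasattr(module, class_name)


# A built-in Spark stage maps onto a real PySpark module path.
spark_stage = "org.apache.spark.ml.feature.StringIndexer"
# Hypothetical name for a stage that lives outside the Spark namespace.
custom_stage = "org.locationtech.rasterframes.ml.TileExploder"

print(java_to_python_class_name(spark_stage))
print(java_to_python_class_name(custom_stage))  # unchanged: prefix does not match
print(can_resolve(custom_stage))
```

If this is what happens, any stage whose Java class sits outside org.apache.spark keeps its Java-style name, and the Python-side import fails when the pipeline is loaded, even though saving worked.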

metasim commented 4 years ago

:-o