maxpumperla / elephas

Distributed Deep learning with Keras & Spark
http://maxpumperla.com/elephas/
MIT License

adapter.py "/usr/bin/anaconda/lib/python2.7/site-packages/elephas/ml/adapter.py" #81

Closed (ankushreddy closed 6 years ago)

ankushreddy commented 6 years ago

Hi Team,

When I try to fit the model, it fails with: TypeError: Cannot convert type <class 'pyspark.ml.linalg.DenseVector'> into Vector

  from pyspark.mllib.evaluation import MulticlassMetrics
  fitted_pipeline = pipeline.fit(final_train) # Fit model to data
  prediction = fitted_pipeline.transform(final_train) # Evaluate on train data.
  # prediction = fitted_pipeline.transform(test_df) # <-- The same code evaluates test data.
  pnl = prediction.select("index_category", "prediction")
  pnl.show(100)

final_train.printSchema() gives:

  root
   |-- category: string (nullable = true)
   |-- features: vector (nullable = true)

Fitting then fails with the error above. The full output is in log.txt.

Sample DataFrame:

The result of indexing and scaling. Each transformation adds new columns to the data frame:

  +--------+--------------------+--------------+--------------------+
  |category|            features|index_category|     scaled_features|
  +--------+--------------------+--------------+--------------------+
  |       3|[238.0,238.0,238....|           1.0|[0.43949125844258...|
  |       1|[29.0,25.0,140.0,...|           0.0|[-4.5706653137731...|
  |       1|[255.0,255.0,255....|           0.0|[0.84701595570415...|
  |      10|[251.0,251.0,251....|           7.0|[0.75112779164260...|
  |       1|[192.0,195.0,196....|           0.0|[-0.6632226282651...|
  |       3|[218.0,218.0,217....|           1.0|[-0.0399495618651...|
  |       3|[255.0,255.0,255....|           1.0|[0.84701595570415...|
  |       4|[255.0,255.0,255....|           8.0|[0.84701595570415...|
  |       1|[221.0,221.0,221....|           0.0|[0.03196656118101...|
  |      10|[225.0,225.0,225....|           7.0|[0.12785472524256...|
  +--------+--------------------+--------------+--------------------+

maxpumperla commented 6 years ago

This issue has been fixed on master. Note that this still worked in Spark 1.x, but needed an adapter for 2.x.