Alxe1 opened this issue 2 years ago.
It would be easier to address this if you could post a runnable code snippet. Would you be able to post one?
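import pandas as pd
from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from petastorm.spark import SparkDatasetConverter, make_spark_converter
conf = SparkConf().setAppName("test")
spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF, 'file://')
df = pd.DataFrame({'x': [0, 1, 2, 3], "y": [6, 2, 5, 7], "z": [0, 0, 1, 1]})
sdf = spark.createDataFrame(df)
# Assemble x and y into a single dense-vector "features" column; z is the label
vector_assembler = VectorAssembler(inputCols=["x", "y"], outputCol="features")
sdf = vector_assembler.transform(sdf)
sdf = sdf.select("features", "z")
sdf.show()
# The error is raised by this call
converter = make_spark_converter(sdf)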
I don't have enough Spark knowledge to give an accurate answer. Perhaps @WeichenXu123 can weigh in?
The documentation for `pyspark.ml.functions.vector_to_array` (https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.ml.functions.vector_to_array.html) marks it as new in Spark 3.0, so I think converting a vector column this way requires Spark 3.0 or later.
I converted a PySpark DataFrame into two columns: a feature column holding a dense vector, and a label column. When I convert it to a TensorFlow dataset using `make_spark_converter`, it raises an error:
Does it not support PySpark < 3.0? The `setup.py` file says it requires 'pyspark>=2.1.0'. How can I solve this problem?
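One possible workaround on PySpark 2.x is to turn the vector column into a plain `array<double>` column yourself before calling `make_spark_converter`. The sketch below is an assumption, not petastorm's own fix: `vector_to_list` and `vectors_to_arrays` are hypothetical helper names, and it uses the native `vector_to_array` when it can be imported (Spark 3.0+), otherwise a Python UDF built with `pyspark.sql.functions.udf`. It has not been tested against petastorm on Spark 2.x; if the traceback shows petastorm importing `vector_to_array` itself regardless of the column types, this will not help and upgrading to Spark 3.0 is the safer route.

```python
# Hypothetical helpers (not part of petastorm): replace Spark ML vector columns
# with plain array<double> columns before calling make_spark_converter.
try:
    # Only available from Spark 3.0 onwards
    from pyspark.ml.functions import vector_to_array
except ImportError:
    # Spark 2.x (or pyspark not installed): fall back to a Python UDF below
    vector_to_array = None


def vector_to_list(v):
    """Convert a DenseVector/SparseVector (anything with toArray()) to a list of floats."""
    if v is None:
        return None
    return [float(x) for x in v.toArray()]


def vectors_to_arrays(sdf, col_names):
    """Return `sdf` with each column in `col_names` replaced by an array<double> column."""
    # Imported lazily so vector_to_list can be used without Spark installed
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DoubleType

    if vector_to_array is not None:
        convert = vector_to_array  # native conversion, no Python UDF overhead
    else:
        convert = F.udf(vector_to_list, ArrayType(DoubleType()))
    for name in col_names:
        sdf = sdf.withColumn(name, convert(F.col(name)))
    return sdf
```

With the snippet from this thread, the call would be `sdf = vectors_to_arrays(sdf, ["features"])` right before `converter = make_spark_converter(sdf)`. The UDF path is slower than the native function because every row round-trips through Python, which is why the native `vector_to_array` is preferred whenever it imports.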