se162xg opened 4 years ago
`VectorAssembler` is a transformer that combines a given list of columns into a single vector column (a feature vector).
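```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(5, 80, 'Alice'), (10, 80, 'Alice')], ['age', 'height', 'name'])
vec_assembler = VectorAssembler(inputCols=['age', 'height'], outputCol='features')  # values are concatenated in this order
new_df = vec_assembler.transform(df)
```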
`new_df` is `DataFrame[age: bigint, height: bigint, name: string, features: vector]`; the assembled column is appended after the original ones.
| age | height | name | features |
|-----|--------|------|----------|
| 5 | 80 | Alice | [5.0,80.0] |
| 10 | 80 | Alice | [10.0,80.0] |
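To make the ordering rule concrete, here is a pure-Python model of what `VectorAssembler` does to each row. It is a sketch for illustration only, not Spark's implementation: `assemble` is a hypothetical helper, rows are plain dicts, and the result is a Python list, whereas Spark produces an ML `Vector` (dense or sparse). Null handling (the `handleInvalid` parameter) is left out.

```python
from typing import Any, List, Mapping, Sequence


def assemble(row: Mapping[str, Any], input_cols: Sequence[str]) -> List[float]:
    """Concatenate the values of input_cols (in that order) into one flat list of floats."""
    out: List[float] = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, str):
            # Mirrors the restriction that string columns are not accepted.
            raise TypeError(f"column {col!r} is a string; convert it to a number first")
        if isinstance(value, (list, tuple)):
            # Vector-valued input: its elements are spliced in place.
            out.extend(float(v) for v in value)
        else:
            # Numeric or boolean input becomes a single float entry.
            out.append(float(value))
    return out


rows = [{"age": 5, "height": 80, "name": "Alice"},
        {"age": 10, "height": 80, "name": "Alice"}]
print([assemble(r, ["age", "height"]) for r in rows])  # [[5.0, 80.0], [10.0, 80.0]]
```

Swapping the order to `["height", "age"]` would give `[80.0, 5.0]` instead, which is why the order of `inputCols` matters.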
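Note: `StringType` input columns (such as `name`) are not supported. `VectorAssembler` accepts only numeric, boolean, and vector columns, so string columns must be converted to numbers first.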
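In Spark the usual conversion is `StringIndexer` from `pyspark.ml.feature`, whose numeric output column can then be listed in `inputCols`. The sketch below models that label-to-index mapping in plain Python so the idea can be run without a cluster. `fit_string_index` is a hypothetical helper, loosely modelled on `StringIndexer`'s default `frequencyDesc` ordering (most frequent label gets `0.0`); ties are broken alphabetically here, which is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fit_string_index(values: Sequence[str]) -> Dict[str, float]:
    """Map each distinct string to a float index, most frequent first, ties alphabetical."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {label: float(i) for i, label in enumerate(ordered)}


names = ["Alice", "Bob", "Alice"]
index = fit_string_index(names)
# The numeric column that could be fed to an assembler instead of the raw strings.
name_index: List[float] = [index[n] for n in names]
print(index)       # {'Alice': 0.0, 'Bob': 1.0}
print(name_index)  # [0.0, 1.0, 0.0]
```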
CSV file on HDFS
CSV file on local disk
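For the two CSV cases above, a minimal sketch, assuming an existing `SparkSession` is passed in as `spark`. The helpers `csv_uri` and `read_people_csv`, the `namenode:8020` address and the file paths are hypothetical. Giving an explicit `hdfs://` or `file://` scheme avoids relying on `fs.defaultFS` to resolve a bare path, and `inferSchema=True` matters here: without it `spark.read.csv` loads every column as `StringType`, which `VectorAssembler` rejects.

```python
def csv_uri(path: str, on_hdfs: bool, namenode: str = "namenode:8020") -> str:
    """Build an explicit URI for an absolute path (namenode address is a placeholder)."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    return f"hdfs://{namenode}{path}" if on_hdfs else f"file://{path}"


def read_people_csv(spark, path: str, on_hdfs: bool):
    """Read a CSV with a header row, inferring numeric types for columns like age/height."""
    return spark.read.csv(csv_uri(path, on_hdfs), header=True, inferSchema=True)
```

When reading a `file://` path on a multi-node cluster, the file has to exist at that path on every node that runs a task, so HDFS is usually the simpler choice there.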