case VectorType => {
val field = row.get(index)
field match {
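      // Note: the sparse vector is densified here before encoding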
case v: SparseVector => FloatListFeatureEncoder.encode(v.toDense.toArray.map(_.toFloat))
case v: DenseVector => FloatListFeatureEncoder.encode(v.toArray.map(_.toFloat))
case _ => throw new RuntimeException(s"Cannot convert $field to vector")
}
}
I found this code in your DefaultTfRecordRowEncoder.scala: it explicitly converts a SparseVector to a DenseVector before encoding.
I have a 1000-dimensional feature vector in my DataFrame with only about 90 non-zero values per row, so densifying turns roughly 90 index/value pairs into 1000 float values. This conversion makes the TFRecord dataset much larger than the same data stored as snappy-compressed Parquet in Spark.
I'm a little confused about why this conversion is necessary.
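For reference, here is a minimal sketch of the kind of sparse-friendly encoding I had in mind, keeping (indices, values) pairs instead of densifying; encodeSparse is just an illustrative helper, not something from the connector:

import org.apache.spark.ml.linalg.{SparseVector, Vectors}

// Hypothetical sparse encoding: emit (indices, values) pairs instead of
// densifying. `encodeSparse` is an illustrative name, not part of the library.
def encodeSparse(v: SparseVector): (Array[Long], Array[Float]) =
  (v.indices.map(_.toLong), v.values.map(_.toFloat))

// Example: a 1000-dimensional vector with 3 non-zero entries.
val v = Vectors.sparse(1000, Array(3, 42, 999), Array(1.0, 2.5, -0.5))
  .asInstanceOf[SparseVector]
val (indices, values) = encodeSparse(v)
// indices -> Array(3L, 42L, 999L), values -> Array(1.0f, 2.5f, -0.5f):
// 3 index/value pairs instead of 1000 floats.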