tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0

Why convert SparseVector to DenseVector in your DefaultTfRecordRowEncoder.scala? #146

Open NeilRon opened 4 years ago

NeilRon commented 4 years ago

      case VectorType => {
        val field = row.get(index)
        field match {
          // A SparseVector is densified before encoding: every dimension,
          // including the zeros, is written out as a float.
          case v: SparseVector => FloatListFeatureEncoder.encode(v.toDense.toArray.map(_.toFloat))
          case v: DenseVector => FloatListFeatureEncoder.encode(v.toArray.map(_.toFloat))
          case _ => throw new RuntimeException(s"Cannot convert $field to vector")
        }
      }
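For context, a sparsity-preserving alternative would be to write the vector's indices and values as two separate features instead of densifying. The sketch below is illustrative only, built on the org.tensorflow.example protobuf classes; encodeSparse and the "indices"/"values" feature names are hypothetical, not part of the connector's API.

    import org.apache.spark.ml.linalg.SparseVector
    import org.tensorflow.example.{Feature, FloatList, Int64List}

    object SparseFeatureSketch {
      // Hypothetical sparsity-preserving encoding: one int64 feature for the
      // non-zero positions, one float feature for their values.
      def encodeSparse(v: SparseVector): Map[String, Feature] = {
        val idx = Int64List.newBuilder()
        v.indices.foreach(i => idx.addValue(i.toLong))
        val vals = FloatList.newBuilder()
        v.values.foreach(d => vals.addValue(d.toFloat))
        Map(
          "indices" -> Feature.newBuilder().setInt64List(idx).build(),
          "values"  -> Feature.newBuilder().setFloatList(vals).build()
        )
      }
    }

With this layout a reader would only store the non-zero entries, at the cost of having to reassemble the sparse tensor when parsing the tf.Example.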

I found this code in your DefaultTfRecordRowEncoder.scala; it explicitly converts a SparseVector to a DenseVector.

I have a 1000-dimensional feature vector in my DataFrame with only about 90 non-zero values per row. This conversion makes the TFRecord dataset much larger than the snappy-compressed Parquet files Spark writes.
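To put rough numbers on that: a dense float_list for such a row stores all 1000 values, while an indices/values pair would store about 90 + 90 entries. A minimal sketch using spark-mllib's Vectors (the non-zero layout below is made up to mirror the question):

    import org.apache.spark.ml.linalg.{SparseVector, Vectors}

    object SparseVsDense {
      def main(args: Array[String]): Unit = {
        // A 1000-dimensional vector with 90 non-zero entries, as in the question.
        val indices = (0 until 90).map(_ * 11).toArray   // arbitrary non-zero positions
        val values  = Array.fill(90)(1.0)
        val sparse  = Vectors.sparse(1000, indices, values).asInstanceOf[SparseVector]

        // What the encoder above writes: a dense float list of every dimension.
        val denseFloats = sparse.toDense.toArray.map(_.toFloat)
        println(s"dense float_list length: ${denseFloats.length}")  // 1000
        println(s"sparse indices + values: ${sparse.indices.length} + ${sparse.values.length}")  // 90 + 90
      }
    }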

I'm a little confused about why this conversion is done.