spotify / spotify-tensorflow

Provides Spotify-specific TensorFlow helpers
Apache License 2.0

Handle sparse data #20

Open ravwojdyla opened 6 years ago

ravwojdyla commented 6 years ago

We have a production use case like this:

  case class TrainingExample(indices: List[Int],
                             data: List[Float],
                             label: Float,
                             weight: Float)

  object TestFeatureSpec {
    val featuresType: TensorFlowType[TrainingExample] = TensorFlowType[TrainingExample]
  }
...

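  def convertToTrainingExample(sv: Seq[SparseVector[Float]]): TrainingExample = {
    // sv(0) carries the label and, optionally, a weight; sv(1) is the sparse feature vector.
    val labelData = sv(0).data
    val label = labelData.head
    // Use the second entry as the weight if present, otherwise defaultWeight (defined elsewhere).
    val weight = if (labelData.length == 2) labelData(1) else defaultWeight
    TrainingExample(
      sv(1).index.toList,
      sv(1).data.toList,
      label,
      weight
    )
  }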

...

    val features = extracted
      .featureValues[SparseVector[Float]]
      .map(sv => (sampler.getPartition(), convertToTrainingExample(sv)))
      .map { case (partition, example) =>
        (partition, TestFeatureSpec.featuresType.toExample(example))
      }
...

I suspect the List fields (indices and data) could be a problem. Can we support this case?
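The List fields need not be a blocker in principle: a tf.train.Example already stores variable-length values per feature name (as an Int64List or FloatList), and TensorFlow can read a sparse vector back from one indices feature plus one values feature (for example via tf.io.SparseFeature). Below is a minimal sketch of that flattening for TrainingExample. It models the Example as a plain Scala Map rather than depending on the protobuf classes; the Feature ADT, SparseEncoding.toFeatureMap, and the feature names are hypothetical and are not part of spotify-tensorflow or featran.

```scala
// Hypothetical, dependency-free stand-in for tf.train.Example's feature map:
// each named feature is either a list of int64 values or a list of floats.
sealed trait Feature
final case class Int64List(values: Seq[Long]) extends Feature
final case class FloatList(values: Seq[Float]) extends Feature

case class TrainingExample(indices: List[Int],
                           data: List[Float],
                           label: Float,
                           weight: Float)

object SparseEncoding {
  // Flattens a TrainingExample into named features. The sparse vector becomes
  // two parallel variable-length lists ("indices" and "data"); scalar fields
  // become single-element lists, as they would in a tf.train.Example.
  def toFeatureMap(ex: TrainingExample): Map[String, Feature] = {
    require(ex.indices.length == ex.data.length,
      s"indices (${ex.indices.length}) and data (${ex.data.length}) must have the same length")
    Map(
      "indices" -> Int64List(ex.indices.map(_.toLong)),
      "data"    -> FloatList(ex.data),
      "label"   -> FloatList(Seq(ex.label)),
      "weight"  -> FloatList(Seq(ex.weight))
    )
  }
}
```

Keeping indices and values as two parallel features (rather than, say, a list of pairs) matches the index_key/value_key layout that tf.io.SparseFeature expects on the Python side.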

yonromai commented 6 years ago

@ravwojdyla If we add code for Sparse -> Example, then we also need to provide code for Example -> Sparse, probably in both Scala and Python.
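For the reverse direction (Example -> Sparse), the indices and values lists are read back and paired up. One detail the features alone do not carry is the vector's dimension; tf.io.SparseFeature likewise takes it as a separate size argument, so the sketch below takes it as a parameter. This is a minimal sketch assuming the "indices"/"data" layout above; SparseVec, SparseDecoding.fromFeatureMap, and the default key names are hypothetical, and a real version would read org.tensorflow.example.Example (with a matching Python decoder) rather than this Map stand-in.

```scala
// Hypothetical, dependency-free stand-in for tf.train.Example's feature map.
sealed trait Feature
final case class Int64List(values: Seq[Long]) extends Feature
final case class FloatList(values: Seq[Float]) extends Feature

// Minimal sparse vector: `size` is the dense dimension; `indices` and `data`
// are parallel sequences of the stored positions and their values.
final case class SparseVec(size: Int, indices: Seq[Int], data: Seq[Float])

object SparseDecoding {
  // Rebuilds a sparse vector from the index/value features. The dense
  // dimension is not stored in the features, so the caller supplies it.
  // Errors are returned as Left so a pipeline can count or skip bad records.
  def fromFeatureMap(features: Map[String, Feature],
                     size: Int,
                     indexKey: String = "indices",
                     valueKey: String = "data"): Either[String, SparseVec] =
    (features.get(indexKey), features.get(valueKey)) match {
      case (Some(Int64List(idx)), Some(FloatList(vals))) =>
        if (idx.length != vals.length)
          Left(s"length mismatch: ${idx.length} indices vs ${vals.length} values")
        else if (idx.exists(i => i < 0 || i >= size))
          Left(s"index out of range for size $size")
        else {
          // Sort by index, since sparse-vector types commonly expect ascending indices.
          val (sortedIdx, sortedVals) = idx.zip(vals).sortBy(_._1).unzip
          Right(SparseVec(size, sortedIdx.map(_.toInt), sortedVals))
        }
      case _ => Left(s"missing or mistyped '$indexKey'/'$valueKey' features")
    }
}
```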