Handle sparse data

Here is a production use case:

  case class TrainingExample(indices: List[Int],
                             data: List[Float],
                             label: Float,
                             weight: Float)

  object TestFeatureSpec {
    val featuresType: TensorFlowType[TrainingExample] = TensorFlowType[TrainingExample]
  }

...

  def convertToTrainingExample(sv: Seq[SparseVector[Float]]): TrainingExample = {
    val labelData = sv(0).data
    val label = labelData.head
    val weight = labelData.length match {
      case 2 => labelData(1)
      case _ => defaultWeight
    }
    TrainingExample(
      sv(1).index.toList,
      sv(1).data.toList,
      label,
      weight
    )
  }

...

    val features = extracted
      .featureValues[SparseVector[Float]]
      .map(sv => (sampler.getPartition(), convertToTrainingExample(sv)))
      .map { case (partition, example) =>
        (partition, TestFeatureSpec.featuresType.toExample(example))
      }
...

I suspect the list-valued fields (indices, data) might be a problem. Can we handle this case?
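
To make the question concrete, here is a minimal sketch of the encoding I would expect, assuming each field becomes one named feature and the two lists become variable-length features. The `Feature`, `Int64List` and `FloatList` types below are hypothetical stand-ins that only mirror the feature kinds in the `tf.Example` proto, and `SparseEncoding.toFeatures` is an illustrative name. None of this is the library's API.

```scala
// Hypothetical stand-ins mirroring the tf.Example feature kinds.
// These are NOT the library's types; they only illustrate the shape.
sealed trait Feature
final case class Int64List(values: List[Long]) extends Feature
final case class FloatList(values: List[Float]) extends Feature

final case class TrainingExample(indices: List[Int],
                                 data: List[Float],
                                 label: Float,
                                 weight: Float)

object SparseEncoding {
  // Encode every field as a named feature.
  // List fields keep their full length; scalars become one-element lists.
  def toFeatures(e: TrainingExample): Map[String, Feature] = {
    // A sparse vector is only meaningful if each index has a value.
    require(e.indices.length == e.data.length, "indices and data must align")
    Map(
      "indices" -> Int64List(e.indices.map(_.toLong)),
      "data"    -> FloatList(e.data),
      "label"   -> FloatList(List(e.label)),
      "weight"  -> FloatList(List(e.weight))
    )
  }
}
```

With this shape, the two list fields would differ in length from example to example, so the reader would need to treat them as variable-length features rather than fixed-size ones.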

spotify / spotify-tensorflow

Handle sparse data #20