linkedin / spark-tfrecord

Read and write Tensorflow TFRecord data from Apache Spark.
BSD 2-Clause "Simplified" License
290 stars, 57 forks

Do we support converting a Spark DataFrame directly to a tensorflow-java Tensor such as TFloat32, TFloat64, or Operand[T <: TNumber]? #53

Open mullerhai opened 2 years ago

mullerhai commented 2 years ago

Hi: spark-tfrecord is a great project, but so far I only know how to use it to read or write TFRecord files with a DataFrame. In our work we also need to convert a DataFrame directly to tensorflow-java tensors such as TFloat32, TFloat64, or Operand[T <: TNumber], so that we can generate tensor data as training input for a TensorFlow model, much as Spark does with org.apache.spark.ml.linalg.Vector.

junshi15 commented 2 years ago

I don't understand your use case. Can you elaborate on how you plan to use Spark-TFRecord with Tensorflow-Java?

Spark-TFRecord is designed as a Spark data source, i.e. it handles format conversion between TFRecord and Spark DataFrame, which happens during read/write operations. Once you have read in the data, you can process it as a regular DataFrame. If I understand correctly, your request has nothing to do with TFRecord: you could read in a dataset in Avro, Parquet, or CSV format, and you then want to convert the DataFrame to tensorflow-java format in memory (instead of storing it in TFRecord format on disk)? This is out of scope for Spark-TFRecord.
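
As a rough illustration of the in-memory step described above (not something Spark-TFRecord provides), the sketch below flattens feature rows into a row-major float buffer plus a shape, which is the form a dense tensor is usually built from. `RowsToTensorData` and `toRowMajor` are hypothetical names, and the hard-coded rows stand in for values collected from a DataFrame on the driver. Handing the buffer to tensorflow-java is left out on purpose, since that call lies outside this project and would be an assumption here.

```scala
// Hypothetical helper for illustration only; not part of Spark-TFRecord or tensorflow-java.
object RowsToTensorData {

  /** Flattens equally sized feature rows into a row-major buffer and its [rows, cols] shape. */
  def toRowMajor(rows: Seq[Seq[Float]]): (Array[Long], Array[Float]) = {
    val cols = rows.headOption.map(_.length).getOrElse(0)
    require(rows.forall(_.length == cols), "all rows must have the same number of features")
    (Array(rows.length.toLong, cols.toLong), rows.flatten.toArray)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: these rows stand in for feature values collected from a DataFrame.
    val rows = Seq(Seq(1.0f, 2.0f, 3.0f), Seq(4.0f, 5.0f, 6.0f))
    val (shape, data) = toRowMajor(rows)
    println(s"shape=${shape.mkString("[", ", ", "]")} data=${data.mkString("[", ", ", "]")}")
  }
}
```

Collecting rows to the driver only works for data that fits in memory there. For larger datasets the same flattening could be applied per partition, which again is independent of the TFRecord data source.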