tensorflow / java

Java bindings for TensorFlow
Apache License 2.0
788 stars 195 forks

How to use TfRecordDataset, DatasetToTfRecord, tf.io.tfRecordReader #452

Open mullerhai opened 2 years ago

mullerhai commented 2 years ago

tensorflow-java 0.4, Spark 3.1, Java 11

Hi,
I'm using tensorflow-java to read a TFRecord file, but I can't get the data out, and there are no examples for it. The TfRecordDataset, DatasetToTfRecord, and tf.io.tfRecordReader Java classes don't have the same API as in Python. Could you give some examples of how to use them? Thanks.


    import org.tensorflow.{Operand, Session, EagerSession}
    import org.tensorflow.op.Ops
    import org.tensorflow.op.data.{DatasetToTfRecord, TfRecordDataset}
    val session = EagerSession.create
    val tf = Ops.create(session)
    val scope = tf.scope()
    // val fileName = tf.constant("/Users/zhanghaining/Downloads/tfrecord-kk2-test/")
    val fileName = tf.constant("/Users/zhanghaining/Downloads/BigDL/spark/dl/src/test/resources/tf/mnist_train.tfrecord")
    val compress = tf.constant("")
    val bufferSize = tf.constant(0L)
    val recordDataSet = TfRecordDataset.create(scope, fileName, compress, bufferSize)

    val record = DatasetToTfRecord.create(scope, recordDataSet, fileName, compress)

    val reader = tf.io.tfRecordReader()

    println(record.op().name())
    println(record.op().`type`())
    println(recordDataSet.op().numOutputs())
    println(recordDataSet.asOutput().dataType())
mullerhai commented 2 years ago

C++ API demo:

    std::unique_ptr<tensorflow::RandomAccessFile> file;
    auto tf_status = tensorflow::Env::Default()->NewRandomAccessFile(
        cc->InputSidePackets().Tag(kTFRecordPath).Get<std::string>(), &file);
    RET_CHECK(tf_status.ok())
        << "Failed to open tfrecord file: " << tf_status.ToString();
    tensorflow::io::RecordReader reader(file.get(),
                                        tensorflow::io::RecordReaderOptions());
    // Records are then read with reader.ReadRecord(&offset, &record) until it returns an OutOfRange status.
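For comparison with the C++ `tensorflow::io::RecordReader` excerpt, here is a minimal sketch of what such a reader does for *uncompressed* TFRecord files. It uses only the Java standard library. It is not a TensorFlow Java API: the class `TfRecordFraming` and its methods are hypothetical names, and it assumes Java 9+ for `java.util.zip.CRC32C`. It follows the TFRecord framing, in which each record is stored as four little-endian fields:

1. a uint64 length,
2. a masked CRC32C of that length,
3. the payload bytes,
4. a masked CRC32C of the payload.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical stdlib-only reader for uncompressed TFRecord files (not a TensorFlow Java API).
final class TfRecordFraming {
  private static final int MASK_DELTA = 0xa282ead8;

  // TFRecord stores CRC32C checksums "masked": rotated right by 15 bits, plus a constant.
  static int maskedCrc32c(byte[] data, int offset, int length) {
    CRC32C crc = new CRC32C();
    crc.update(data, offset, length);
    int c = (int) crc.getValue();
    return ((c >>> 15) | (c << 17)) + MASK_DELTA;
  }

  // Splits a whole file into record payloads, verifying both checksums of every record.
  static List<byte[]> parse(byte[] file) {
    ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
    List<byte[]> records = new ArrayList<>();
    while (buf.hasRemaining()) {
      if (buf.remaining() < 12) {
        throw new IllegalStateException("truncated header at offset " + buf.position());
      }
      int headerPos = buf.position();
      long length = buf.getLong();
      if (buf.getInt() != maskedCrc32c(file, headerPos, 8)) {
        throw new IllegalStateException("corrupt length at offset " + headerPos);
      }
      if (length < 0 || length > buf.remaining() - 4) {
        throw new IllegalStateException("truncated record at offset " + headerPos);
      }
      byte[] data = new byte[(int) length];
      buf.get(data);
      if (buf.getInt() != maskedCrc32c(data, 0, data.length)) {
        throw new IllegalStateException("corrupt data in record at offset " + headerPos);
      }
      records.add(data);
    }
    return records;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java TfRecordFraming path/to/file.tfrecord
    List<byte[]> records = parse(Files.readAllBytes(Paths.get(args[0])));
    System.out.println("read " + records.size() + " records");
  }
}
```

Each returned `byte[]` is one serialized record, typically an `Example` proto, and it still has to be parsed before it is useful for training. GZIP/ZLIB-compressed files would need to be decompressed first.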
karllessard commented 2 years ago

Hi @mullerhai ,

Is your goal to iterate through that dataset? If so, you need to create an iterator (e.g. by calling tf.data.makeIterator). Also, in your example the DatasetToTfRecord op writes to the same file the dataset was loaded from, so I'm not sure what the expected behavior is here; you should try writing to a different file.
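To illustrate the point about the output path, here is a hypothetical Java sketch that writes records in the same uncompressed TFRecord framing. It uses only the standard library and refuses to overwrite the file the records came from. `TfRecordWriterSketch`, `encode`, and `writeFile` are illustrative names, not part of TensorFlow Java. Java 9+ is assumed for `java.util.zip.CRC32C`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical helper: writes records using uncompressed TFRecord framing (not a TensorFlow Java API).
final class TfRecordWriterSketch {
  private static final int MASK_DELTA = 0xa282ead8;

  // TFRecord checksums are CRC32C values rotated right by 15 bits, plus a constant.
  static int maskedCrc32c(byte[] data, int offset, int length) {
    CRC32C crc = new CRC32C();
    crc.update(data, offset, length);
    int c = (int) crc.getValue();
    return ((c >>> 15) | (c << 17)) + MASK_DELTA;
  }

  // Frames each record as: uint64 length | masked CRC of length | data | masked CRC of data.
  static byte[] encode(List<byte[]> records) {
    int total = 0;
    for (byte[] r : records) total = Math.addExact(total, 16 + r.length);
    ByteBuffer buf = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
    for (byte[] r : records) {
      int headerPos = buf.position();
      buf.putLong(r.length);
      buf.putInt(maskedCrc32c(buf.array(), headerPos, 8));
      buf.put(r);
      buf.putInt(maskedCrc32c(r, 0, r.length));
    }
    return buf.array();
  }

  // Refuses to write over the file the records were read from.
  static void writeFile(Path source, Path destination, List<byte[]> records) throws IOException {
    Path src = source.toAbsolutePath().normalize();
    Path dst = destination.toAbsolutePath().normalize();
    if (src.equals(dst)) {
      throw new IllegalArgumentException("destination must differ from source: " + dst);
    }
    Files.write(dst, encode(records));
  }

  public static void main(String[] args) throws IOException {
    // Usage: java TfRecordWriterSketch <source.tfrecord> <destination.tfrecord>
    List<byte[]> records = Arrays.asList(
        "first".getBytes(StandardCharsets.UTF_8), "second".getBytes(StandardCharsets.UTF_8));
    writeFile(Paths.get(args[0]), Paths.get(args[1]), records);
  }
}
```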

If you don't mind adding org.tensorflow:tensorflow-framework to your dependencies, we do have utilities to simplify the usage of datasets; take a look at this one. You can then iterate through the elements of the dataset in eager mode like this:

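    Dataset dataset = Dataset.tfRecordDataset(tf, "yourfile.tfrecord", "", 0L).batch(10);
    for (List<Operand<?>> components : dataset) {
         // Assumes elements already parsed into (features, label); a raw TFRecord
         // dataset yields a single component holding the serialized records.
         Operand<?> featureBatch = components.get(0);
         Operand<?> labelBatch = components.get(1);

         // ... operate on the batches directly
    }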

Eager mode tends to be slow, though, so if you can give more details about your specific use case, maybe we can give you better examples of how to do it.

mullerhai commented 2 years ago

Great, thanks! But I also want to know how to convert a Dataset to a ByteNdArray, or a TFRecord to a ByteNdArray, or a Dataset to Examples (org.tensorflow.example.example.{Example, SequenceExample}), because I need code in this style:

NdArrays.wrap(Shape.of(dimSizes: _*), DataBuffers.of(bytes, true, false))

to create tensors for model training.
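`NdArrays.wrap(Shape.of(...), DataBuffers.of(bytes, true, false))` expects one rectangular byte buffer, but TFRecord payloads usually have different lengths. The sketch below shows one way to bridge that gap in plain Java: pad every record to the length of the longest one. `RecordPacking` and `padToMatrix` are hypothetical names. The result would then be passed as `Shape.of(p.shape)` and `DataBuffers.of(p.bytes, true, false)`. Note that this keeps the raw serialized bytes. To train on individual features you would normally parse the `Example` protos first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: packs variable-length byte records into one row-major
// [numRecords, maxLength] array, padding shorter rows with padValue.
final class RecordPacking {
  static final class Packed {
    final byte[] bytes; // row-major data, length == shape[0] * shape[1]
    final long[] shape; // {numRecords, maxLength}

    Packed(byte[] bytes, long[] shape) {
      this.bytes = bytes;
      this.shape = shape;
    }
  }

  static Packed padToMatrix(List<byte[]> records, byte padValue) {
    int cols = 0;
    for (byte[] r : records) cols = Math.max(cols, r.length);
    byte[] flat = new byte[Math.multiplyExact(records.size(), cols)];
    Arrays.fill(flat, padValue);
    for (int i = 0; i < records.size(); i++) {
      byte[] r = records.get(i);
      System.arraycopy(r, 0, flat, i * cols, r.length);
    }
    return new Packed(flat, new long[] {records.size(), cols});
  }

  public static void main(String[] args) {
    Packed p = padToMatrix(Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4}), (byte) 0);
    // prints "[2, 3] [1, 2, 3, 4, 0, 0]"
    System.out.println(Arrays.toString(p.shape) + " " + Arrays.toString(p.bytes));
  }
}
```

Padding is only one choice. Variable-length data can also be kept as strings and parsed inside the graph, which avoids copying padded bytes.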

karllessard commented 2 years ago

Maybe you can do this via parseExampleDataset? There are also a bunch of utilities for parsing examples in the IO package, like this one.

mullerhai commented 2 years ago

In tensorflow-java 0.5.0-SNAPSHOT, in EagerSession mode, when I iterate over the elements of the dataset I find that the element class type is OptionalGetValue or a similar type. I want to print the real value, but it fails.

mullerhai commented 2 years ago

parseExampleDataset

    val fp = tf.constant("/Volumes/Pink4T/transfer/code/github/stanford-tensorflow-tutorials/2017/data/friday.tfrecord")
    val compress = tf.constant("")
    val bufferSize = tf.constant(0L)
    val datazs = tf.data.tfRecordDataset(fp, compress, bufferSize)
    println(datazs.asTensor())

I get the error: No tensor type has been registered for data type DT_VARIANT

karllessard commented 2 years ago

We don't map DT_VARIANT tensors to the Java space (yet). Can you please provide the full stack trace? I want to see where such a tensor is being accessed from the JVM.

albertoandreottiATgmail commented 2 years ago

Hello, is there any way in which you could run this code outside eager mode? I need to access the binary representation of the example to hit a ParseExample node within a graph.

thanks!

mullerhai commented 2 years ago

No, I haven't gotten it to work.

karllessard commented 1 year ago

Sure, that will work in Graph mode as well; you just need to make sure that the tf instance you pass to Dataset.tfRecordDataset is executing in a graph environment, i.e. var tf = Ops.create(graph);

You won't be able to use a Java for loop, though, so you'll need to rely on other TF ops and on the methods exposed by the datasets/iterators to iterate through the examples within your graph.