tensorflow / java

How to save NdArray object to local disk? #463

mullerhai opened 2 years ago

mullerhai commented 2 years ago

Hi, when I generate an NdArray object from a Spark DataFrame, I want to save it for the next model training run, but I don't know how to save it.

mullerhai commented 2 years ago

java.io.NotSerializableException: org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray
  at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
  at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
  at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
  at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
  at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
  at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)

karllessard commented 2 years ago

Unfortunately there is no endpoint out-of-the-box for serializing/deserializing an NdArray. It would be a great addition though. You probably want to save the array type, shape and data instead of an object?
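As a rough illustration of the "type, shape and data" idea, here is a minimal sketch that uses only the Java standard library. The class name `DoubleArrayCodec`, the type tag value and the byte layout are all assumptions made up for this example; none of it is an API of the NdArray library. It also assumes the values have already been copied out of the NdArray into a flat, row-major `double[]`.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: stores a double array as [type tag][rank][dims...][values...]. */
final class DoubleArrayCodec {

    /** Arbitrary tag chosen for this sketch to mark 64-bit float payloads. */
    static final byte TYPE_FLOAT64 = 1;

    /** Shape and flattened (row-major) values read back from a stream. */
    static final class Decoded {
        final long[] shape;
        final double[] data;

        Decoded(long[] shape, double[] data) {
            this.shape = shape;
            this.data = data;
        }
    }

    private DoubleArrayCodec() {}

    /** Number of elements implied by a shape; fails if it does not fit in an int. */
    static int elementCount(long[] shape) {
        long count = 1;
        for (long dim : shape) {
            if (dim < 0) throw new IllegalArgumentException("negative dimension: " + dim);
            count = Math.multiplyExact(count, dim);
        }
        return Math.toIntExact(count);
    }

    /** Writes the type tag, the rank, each dimension, then every value. */
    static void write(long[] shape, double[] data, OutputStream out) {
        if (elementCount(shape) != data.length) {
            throw new IllegalArgumentException("shape does not match data length");
        }
        try {
            DataOutputStream dos = new DataOutputStream(out);
            dos.writeByte(TYPE_FLOAT64);
            dos.writeInt(shape.length);
            for (long dim : shape) dos.writeLong(dim);
            for (double v : data) dos.writeDouble(v);
            dos.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back what write() produced, validating the type tag and rank. */
    static Decoded read(InputStream in) {
        try {
            DataInputStream dis = new DataInputStream(in);
            byte type = dis.readByte();
            if (type != TYPE_FLOAT64) {
                throw new IllegalArgumentException("unsupported type tag: " + type);
            }
            int rank = dis.readInt();
            if (rank < 0) throw new IllegalArgumentException("negative rank: " + rank);
            long[] shape = new long[rank];
            for (int i = 0; i < rank; i++) shape[i] = dis.readLong();
            double[] data = new double[elementCount(shape)];
            for (int i = 0; i < data.length; i++) data[i] = dis.readDouble();
            return new Decoded(shape, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing a `FileOutputStream` or `FileInputStream` would put the data on local disk. The decoded shape and values could then presumably be wrapped again along the lines of the `NdArrays.wrap(Shape.of(...), DataBuffers.of(...))` call that appears later in this thread.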

Can you show me how you initialize your NdArray? That might give me a clue on how we can tackle this.

mullerhai commented 2 years ago

Thanks, let me see. Right now we have several ways to generate an NdArray:

  1. From a plain Java or Scala array:
    val testMatrix1 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1, 2, 3, 4, 45), Array(2, 4, 6, 8, 10), Array(3, 6, 9, 12, 15), Array(4, 8, 12, 16, 20)))
    val testMatrix2 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1), Array(0), Array(1), Array(1)))
  2. From a file on disk (such as a CSV or zip file), an InputStream, or NIO DataBuffers:

    import java.io.DataInputStream
    import java.util.zip.GZIPInputStream
    import org.tensorflow.ndarray.{NdArrays, Shape}
    import org.tensorflow.ndarray.buffer.DataBuffers

    // TYPE_UBYTE is assumed to be defined elsewhere in the enclosing class.
    private def readArchive(archiveName: String) = {
      // Loading via classOf[MnistDataset].getClassLoader.getResourceAsStream(archiveName)
      // threw a NullPointerException, so the archive is opened from the file system instead.
      val dataset = new java.io.FileInputStream(archiveName)
      val gzipInputStream = new GZIPInputStream(dataset)
      val archiveStream = new DataInputStream(gzipInputStream)
      archiveStream.readShort // first two bytes are always 0
      val magic = archiveStream.readByte
      if (magic != TYPE_UBYTE) throw new IllegalArgumentException("\"" + archiveName + "\" is not a valid archive")
      val numDims = archiveStream.readByte
      val dimSizes = new Array[Long](numDims)
      var size = 1 // for simplicity, we assume that the total size does not exceed Integer.MAX_VALUE
      for (i <- 0 until dimSizes.length) {
        dimSizes(i) = archiveStream.readInt
        size = size * dimSizes(i).toInt
      }
      println(s"size  ${size}")
      val bytes = new Array[Byte](size)
      // Read the payload into the buffer; without this step the array stays all zeros.
      archiveStream.readFully(bytes)
      NdArrays.wrap(Shape.of(dimSizes: _*), DataBuffers.of(bytes, true, false))
    }

  3. From a Spark DataFrame VectorUDT column.

  This needs three packages to be imported and some complex conversions. I think it might be possible to convert the NdArray back into a Spark DataFrame and write that out as a CSV or Parquet file.

  4. Also from an Operand[T] tensor or a TFRecord Dataset.
   So far I have only seen how to generate a tensor from an NdArray, and how to generate an NdArray from a Dataset.
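
For small matrices, the CSV route mentioned above does not strictly need Spark. The following is a minimal sketch in plain Java, assuming the values are available as a rectangular `int[][]` with at least one column per row. The `MatrixCsv` class and its method names are hypothetical and not part of any library.

```java
import java.util.StringJoiner;

/** Hypothetical helper: converts a 2-D int matrix to CSV text and back. */
final class MatrixCsv {

    private MatrixCsv() {}

    /** Emits one comma-separated line per row, each terminated by '\n'. */
    static String toCsv(int[][] matrix) {
        StringBuilder sb = new StringBuilder();
        for (int[] row : matrix) {
            StringJoiner joiner = new StringJoiner(",");
            for (int value : row) joiner.add(Integer.toString(value));
            sb.append(joiner).append('\n');
        }
        return sb.toString();
    }

    /** Parses text produced by toCsv(); assumes every line holds at least one integer. */
    static int[][] fromCsv(String csv) {
        if (csv.isEmpty()) return new int[0][];
        String[] lines = csv.split("\n");
        int[][] matrix = new int[lines.length][];
        for (int r = 0; r < lines.length; r++) {
            String[] cells = lines[r].split(",");
            matrix[r] = new int[cells.length];
            for (int c = 0; c < cells.length; c++) {
                matrix[r][c] = Integer.parseInt(cells[c].trim());
            }
        }
        return matrix;
    }
}
```

The string returned by `toCsv` can be written to disk with `java.nio.file.Files.writeString`, and the matrix returned by `fromCsv` could be handed to `StdArrays.ndCopyOf` as in the first example. A text format like this is easy to inspect but much larger and slower than a binary layout, so it is probably only reasonable for small arrays.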