mullerhai opened this issue 2 years ago
java.io.NotSerializableException: org.tensorflow.ndarray.impl.dense.DoubleDenseNdArray
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1553)
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1510)
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
Unfortunately there is no endpoint out-of-the-box for serializing/deserializing an NdArray. It would be a great addition though. You probably want to save the array type, shape and data instead of an object?

Can you show me how you initialize your NdArray? That might give me some cue on how we can tackle this.
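For what it's worth, here is a minimal sketch of that idea for a FloatNdArray: persist rank, dimensions, and raw values. Both saveFloatNdArray and loadFloatNdArray are hypothetical helpers, not part of the ndarray library.

import java.io.{DataInputStream, DataOutputStream, FileInputStream, FileOutputStream}
import org.tensorflow.ndarray.{FloatNdArray, NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

// Hypothetical helper: writes rank, then dimension sizes, then the data.
def saveFloatNdArray(array: FloatNdArray, path: String): Unit = {
  val out = new DataOutputStream(new FileOutputStream(path))
  try {
    val dims = array.shape().asArray()
    out.writeInt(dims.length)                    // rank
    dims.foreach(d => out.writeLong(d))          // dimension sizes
    array.scalars().forEach(s => out.writeFloat(s.getFloat())) // row-major data
  } finally out.close()
}

// Hypothetical helper: reads the same layout back into a FloatNdArray.
def loadFloatNdArray(path: String): FloatNdArray = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val dims = Array.fill(in.readInt())(in.readLong())
    val shape = Shape.of(dims: _*)
    val data = Array.fill(shape.size().toInt)(in.readFloat())
    NdArrays.wrap(shape, DataBuffers.of(data, false, false))
  } finally in.close()
}

Element-wise reads and writes are slow for large arrays; a bulk copy of the underlying DataBuffer would be the faster route, but the layout is the same.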
thanks, let me see. Right now we have four ways to generate an NdArray:

1. from in-memory arrays with StdArrays.ndCopyOf:
val testMatrix1 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1, 2, 3, 4, 45), Array(2, 4, 6, 8, 10), Array(3, 6, 9, 12, 15), Array(4, 8, 12, 16, 20)))
val testMatrix2 = StdArrays.ndCopyOf(Array[Array[Int]](Array(1), Array(0), Array(1), Array(1)))
2. from a data file on disk (e.g. CSV or zip), read through an InputStream into NIO DataBuffers:
import java.io.{DataInputStream, IOException}
import java.util.zip.GZIPInputStream
import org.tensorflow.ndarray.{NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

private val TYPE_UBYTE: Byte = 0x08 // IDX type code for unsigned byte data

@throws[IOException]
private def readArchive(archiveName: String) = {
  // getClassLoader.getResourceAsStream(archiveName) threw a NullPointerException,
  // so read the gzipped IDX archive straight from the file path instead.
  val dataset = new java.io.FileInputStream(archiveName)
  val archiveStream = new DataInputStream(new GZIPInputStream(dataset))
  archiveStream.readShort // first two bytes are always 0
  val magic = archiveStream.readByte
  if (magic != TYPE_UBYTE) throw new IllegalArgumentException("\"" + archiveName + "\" is not a valid archive")
  val numDims = archiveStream.readByte
  val dimSizes = new Array[Long](numDims)
  var size = 1 // for simplicity, we assume the total size does not exceed Integer.MAX_VALUE
  for (i <- dimSizes.indices) {
    dimSizes(i) = archiveStream.readInt
    size = size * dimSizes(i).toInt
  }
  println(s"size ${size}")
  val bytes = new Array[Byte](size)
  archiveStream.readFully(bytes)
  NdArrays.wrap(Shape.of(dimSizes: _*), DataBuffers.of(bytes, true, false))
}
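Usage looks like this (the file path is an assumption):

val images = readArchive("src/main/resources/train-images-idx3-ubyte.gz")
println(images.shape())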
3. from a Spark DataFrame VectorUDT column. This needs three extra package imports and some fairly complex conversion code; a sketch of that direction follows below. I also think it should be possible to go the other way and regenerate a Spark DataFrame from an NdArray, so it could be written out as a CSV or Parquet file.
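A minimal sketch of the DataFrame-to-NdArray conversion. It collects the DataFrame to the driver, so it only suits data that fits in memory, and the column name "features" is an assumption.

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.tensorflow.ndarray.{DoubleNdArray, NdArrays, Shape}
import org.tensorflow.ndarray.buffer.DataBuffers

// Hypothetical helper: copy a VectorUDT column into a rank-2 DoubleNdArray.
def dataFrameToNdArray(df: DataFrame, column: String = "features"): DoubleNdArray = {
  val vectors = df.select(column).collect().map(_.getAs[Vector](0))
  val rows = vectors.length
  val cols = vectors.head.size
  val flat = vectors.flatMap(_.toArray) // row-major flatten of all vectors
  NdArrays.wrap(Shape.of(rows.toLong, cols.toLong), DataBuffers.of(flat, false, false))
}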
4. from an Operand[T] tensor or a TFRecord Dataset. So far I only see how to generate a tensor from an NdArray, and an NdArray from a Dataset (see the sketch below).
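A small sketch of what exists today in tensorflow-core for the NdArray-to-tensor direction; note that a TFloat32 tensor is itself a FloatNdArray, so reading it back is direct:

import org.tensorflow.ndarray.StdArrays
import org.tensorflow.types.TFloat32

val nd = StdArrays.ndCopyOf(Array[Array[Float]](Array(1f, 2f), Array(3f, 4f)))
val tensor: TFloat32 = TFloat32.tensorOf(nd) // copies the NdArray into a tensor
val value = tensor.getFloat(1, 1)            // 4.0f: TFloat32 is a FloatNdArray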
...
Hi: when I generate an NdArray from a Spark DataFrame, I want to save that NdArray for the next model-training run, but I don't know how to save it.
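A possible way to tie this together, reusing the shape-plus-data layout sketched earlier for the double-typed case (saveDoubleNdArray is a hypothetical helper, not a library call):

import java.io.{DataOutputStream, FileOutputStream}
import org.tensorflow.ndarray.DoubleNdArray

// Hypothetical helper mirroring saveFloatNdArray above, for double data.
def saveDoubleNdArray(array: DoubleNdArray, path: String): Unit = {
  val out = new DataOutputStream(new FileOutputStream(path))
  try {
    val dims = array.shape().asArray()
    out.writeInt(dims.length)             // rank
    dims.foreach(d => out.writeLong(d))   // dimension sizes
    array.scalars().forEach(s => out.writeDouble(s.getDouble()))
  } finally out.close()
}

// e.g. saveDoubleNdArray(dataFrameToNdArray(df), "features.bin")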