tensorflow / java

Java bindings for TensorFlow
Apache License 2.0

TF Python producing malformed JSON for numpy style arrays. #444

Open JimClarke5 opened 2 years ago

JimClarke5 commented 2 years ago

TF Python serializes numpy-style arrays (the counterpart of a Java NdArray) in the format 'normalizer': array([[0.2, 0.2], [0.1, 0.3]], dtype=float32). This is not standard JSON, and tools such as GSON throw a MalformedJsonException when trying to parse it.

The issue is how to best handle this in TF Java.

I have looked at GSON using customized TypeAdapters and Serializers/Deserializers, but I cannot get past the low-level parsing throwing the MalformedJsonException. Right now I think my only alternative is to write a pre-parser that converts the array() format string into a well-formed JSON string. For the example above:

         {
             "normalizer": {
                 "_ArraySize_": [2, 2],
                 "_ArrayData_": [[0.2, 0.2], [0.1, 0.3]],
                 "_ArrayType_": "float32"
             }
         }

_ArraySize_ is the array's shape, _ArrayData_ holds the values, and _ArrayType_ is the datatype for the NdArray. This format is taken from OpenJData.
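
To make the pre-parser idea concrete, here is a minimal sketch in Python rather than Java. It is only an illustration under stated assumptions: the names ARRAY_RE, shape_of, replace_array and dict_repr_to_json are hypothetical, the array text is assumed to contain only numbers and brackets, and the dtype is assumed to be spelled out. It leans on Python's ast module to read the single-quoted dict syntax, so a Java version would need its own handling for that part.

```python
import ast
import json
import re

# Matches the textual form of a NumPy array, e.g.
# array([[0.2, 0.2], [0.1, 0.3]], dtype=float32)
# Assumption: no parentheses inside the data and an explicit dtype.
ARRAY_RE = re.compile(r"array\((\[[^()]*\]),\s*dtype=(\w+)\)")


def shape_of(nested):
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def replace_array(match):
    """Swap one array(...) occurrence for an annotated dict literal."""
    data = ast.literal_eval(match.group(1))
    annotated = {
        "_ArraySize_": shape_of(data),
        "_ArrayData_": data,
        "_ArrayType_": match.group(2),
    }
    # repr() of a plain dict is a valid Python literal, so the
    # surrounding text can still be parsed in one pass afterwards.
    return repr(annotated)


def dict_repr_to_json(text):
    """Convert a dict string containing array(...) into JSON text."""
    python_literal = ARRAY_RE.sub(replace_array, text)
    return json.dumps(ast.literal_eval(python_literal))


if __name__ == "__main__":
    sample = "{'normalizer': array([[0.2, 0.2], [0.1, 0.3]], dtype=float32)}"
    print(dict_repr_to_json(sample))
```

The regular expression only handles the simple case shown in this issue. Arrays printed without a dtype, or other non-JSON values in the dict, would need extra rules.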

Also, once we settle on a way to do this, should the NdArray package be modified to include serialize/deserialize methods? What about compatibility with TF Python?

Any suggestions?

Craigacp commented 2 years ago

I think DL4J also has some parsing logic for the JSON emitted by Keras; maybe we could look there to see what hacks they did? I'm surprised TF Python is emitting malformed JSON. Is there any comment in the code about it, or an open issue on their repo?

karllessard commented 2 years ago

They do seem to do some custom stuff yes, see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/saving/saved_model/json_utils.py

JimClarke5 commented 2 years ago

Yes, I have seen this, and it is pretty straightforward to implement these customizations in GSON. However, what seems to be stored in the SavedModel is not pure JSON, but a string version of a Python dict. It is close to JSON, but not 100%.

The following Python code demonstrates this:

import numpy as np

a = np.array([[0.2, 0.2], [0.1, 0.3]], dtype="float32")
m = {"array": a}
print(m)  # prints the dict's repr, which is not JSON

-->

{'array': array([[0.2, 0.2],
       [0.1, 0.3]], dtype=float32)}
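
As a quick check that this output is not JSON, the sketch below feeds the printed text to Python's standard json module. The helper name is_valid_json is hypothetical, and the string is simply the output above copied by hand, so NumPy is not needed to run it.

```python
import json

# The text printed by the snippet above, copied as a plain string.
printed = "{'array': array([[0.2, 0.2],\n       [0.1, 0.3]], dtype=float32)}"


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(is_valid_json(printed))
    print(is_valid_json('{"array": [[0.2, 0.2], [0.1, 0.3]]}'))
```

The parser already fails at the single-quoted key, before it reaches the array(...) call, so both the quoting and the array syntax would have to be handled by any converter.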

Is there a way to translate a Java Map to a Python dict in both directions, considering the ndarray issue above?