keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0

Exporting weights from TensorFlow Keras to C produces a different set of outputs and reduced model accuracy #333

Closed: BeverlyCode closed this issue 1 year ago

BeverlyCode commented 1 year ago

For some reason, when I export the weights from a TensorFlow Keras model, be it even a simple Sequential FNN, and then load them into C to perform forward passes, I get different output results: sometimes similar, and often just not the same at all. It's as if the accuracy has taken a substantial loss compared to the original predict() in Keras.

This is how I export my weights:

```python
import numpy

for layer in model.layers:
    if layer.get_weights():  # skip layers with no parameters
        numpy.savetxt(layer.name + ".csv", layer.get_weights()[0].flatten(), delimiter=",")
        numpy.savetxt(layer.name + "_bias.csv", layer.get_weights()[1].flatten(), delimiter=",")
```

and

f.write(", /* weight */ " + str(layer.get_weights()[0].flatten()[i]))
f.write(", /* bias */ " + str(layer.get_weights()[1].flatten()[i]))

The first method writes values in long scientific notation (numpy.savetxt defaults to the %.18e format), while the second writes Python's shorter repr of the float. That shouldn't cause a loss of precision compared to the first method, right?

I'm not sure why the saved weights would give different outputs, even for a 1-layer FNN with 1 output, a very simple network. All the weights are saved as float32 in C and then processed with float math functions such as tanhf(), depending on the layer's activation.

I must be missing some understanding of what TensorFlow or Keras is doing behind the scenes. Is something in its default configuration doing this to my network/weights that I need to disable or reproduce? Are the arrays of weights in reverse order or something? Do I need to iterate them from left to right rather than right to left over the inputs? What am I missing?

Here is a project that demonstrates this issue: https://github.com/jcwml/neural_unitvector

In this project, a dataset is exported from main.c to ensure that the C program and the TensorFlow Keras program both use data generated by the same random function. Keras then trains a set of weights for a simple FNN with some number of hidden units and layers, and those weights are exported and loaded into the C program, where the forward pass is reconstructed in C code. There are 3 types of networks in this example that have had their forward passes reconstructed, each more complicated than the last, and yet none of them produce the same outputs as Keras predict(). It's not even a minor accuracy loss; it is substantially different. You can tell that the weights still represent what they were trained to, but with much worse prediction accuracy.

Another important fact to note: when I train a network with multiple layers and export the weights to C for a forward pass, in many cases the data passed into the network eventually becomes zeros before it reaches the deeper layers, a bit like vanishing gradients. This would imply that TensorFlow is using a different-precision float format, but the documentation says it uses float32 by default, and the console printouts in my application also claim float32. My only other thought was that maybe numpy is not saving the weights at full float32 precision, but when I looked into that, it apparently does export full precision, and exporting weights this way should be fine.
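
(A quick round-trip check to rule the export precision in or out, a sketch assuming the CSVs were written as above; the layer index and file name are illustrative:)

```python
import numpy

# Reload a saved CSV and compare bit-for-bit against the original float32 weights.
w = model.layers[0].get_weights()[0].flatten()                    # original float32 values
loaded = numpy.loadtxt("dense.csv", delimiter=",").astype(numpy.float32)
print(numpy.array_equal(w, loaded))                               # True => no precision lost
```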

I am very confused.

tilakrayal commented 1 year ago

@BeverlyCode, I was facing the issue/error while executing the given code. Kindly find the gist of it here.

BeverlyCode commented 1 year ago

> @BeverlyCode, I was facing the issue/error while executing the given code. Kindly find the gist of it here.

You will need to generate the dataset first:

1. In main.c, uncomment from line 236 down to line 276, then add a `return 0;`.
2. Compile with `gcc main.c -lm -Ofast -mfma -march=native -o main` and execute with `./main`; this generates the required `dataset.dat` and `testset.dat` files.
3. Comment those lines back out after generating the datasets; you will need to re-run this program for predictions later, and you don't want to overwrite your existing dataset with a new one.
4. Then just execute `python3 fit.py` (no parameters).

A C header file with the weights will be output to the models/ directory. These are the weights used to execute the forward-pass in C in main.c.

That is a simple FNN neural network with 16 units and 2 layers (input and output), constructed in C like so:

```c
float nx3, ny3, nz3; // output variables of norm_neural() as globals
// ^ (normalised x,y,z components of the un-normalised input vector to the network)
// nv0 is expected as 16 blocks of [w_x, w_y, w_z, bias], one block per hidden unit;
// nv1 as 3 blocks of [16 hidden weights, bias], one block per output.
void norm_neural(float x, float y, float z)
{
    float h[16];
    for(int i = 0; i < 16; i++)
    {
        const int j = i*4;
        h[i] = tanhf((nv0[j] * x) + (nv0[j+1] * y) + (nv0[j+2] * z) + nv0[j+3]);
    }
    float o[3] = {0};
    for(int i = 0; i < 3; i++)
    {
        const int j = i*17;
        for(int k = 0; k < 16; k++)
            o[i] += (nv1[j+k] * h[k]);
        o[i] += nv1[j+16];
    }

    nx3 = o[0];
    ny3 = o[1];
    nz3 = o[2];
}
```
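
For reference, what Keras computes for these two Dense layers is, in NumPy terms, just the following (a sketch, assuming `model.layers[0]` and `model.layers[1]` are the two Dense layers; note the kernel shapes):

```python
import numpy as np

W0, b0 = model.layers[0].get_weights()  # W0 shape: (3, 16), b0 shape: (16,)
W1, b1 = model.layers[1].get_weights()  # W1 shape: (16, 3), b1 shape: (3,)

def forward(x):
    # x shape: (batch, 3); should match model.predict(x) up to float rounding
    h = np.tanh(x @ W0 + b0)  # hidden layer, tanh activation
    return h @ W1 + b1        # linear output layer
```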

Your output header file in the models/ directory will contain two const float arrays: `neural_unitvector_layer0` and `neural_unitvector_layer1`.

Rename `neural_unitvector_layer0` to `nv0` and replace line 68 with the new const float array `nv0`.

Rename `neural_unitvector_layer1` to `nv1` and replace line 69 with the new const float array `nv1`.

(Remember to comment out those lines at the beginning of main.c that generate new dataset.dat and testset.dat files.) Compile with `gcc main.c -lm -Ofast -mfma -march=native -o main`, then execute with `./main` again.

You should get the same predictions as the Python program, because it's the same dataset.dat and testset.dat with the same weights. But for some reason you won't: it produces different results from what model.predict() gave on the same weights. main.c only tests the overall accuracy, but you can easily change it to do direct comparisons across the whole of testset.dat, or just pull a single input from it for comparison. What is obvious is that the same weights produce different results in C and in TensorFlow Keras in Python, and I don't know why.

BeverlyCode commented 1 year ago

Just a small update: from my research, exporting the weights with both `numpy.savetxt(layer.name + ".csv", layer.get_weights()[0].flatten(), delimiter=",")` and `f.write("," + str(layer.get_weights()[0].flatten()[bc]))` on the latest Python versions and packages preserves the exact decimal precision of the weights.

But in case they didn't, I also exported all the weight floats as bytes and then converted them back to floats in C, to be extra sure, using this function on the export (which gave the same weights):

```python
import struct

def float_to_hex(f):
    return hex(struct.unpack('<I', struct.pack('<f', f))[0])
```
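
(And the reverse conversion, to confirm the hex string round-trips to the identical float32 bit pattern, a sketch:)

```python
import struct

def hex_to_float(h):
    # inverse of float_to_hex() above
    return struct.unpack('<f', struct.pack('<I', int(h, 16)))[0]

# hex_to_float(float_to_hex(w)) reproduces w rounded to float32, bit for bit
```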

It would seem that even in very basic FNNs in TensorFlow, something happens in the forward pass that is not obvious to me (and seemingly between each layer, if I had to guess), and I'm not sure how I would find out what it is doing. I would assume that in TensorFlow everything optional, such as layer regularisation, is disabled by default unless explicitly turned on? Is it possible that TensorFlow actually turns some of this on by default to improve trained networks, and that I just haven't realised it?

I'm clutching at straws here, but TensorFlow is a large, complicated SOTA framework and I'm not really sure where to start or how to go about debugging this.

But it would be very valuable to me to be able to take the weights of very simple FNN networks trained in TensorFlow Keras and re-use them in a C program that re-creates the forward pass.

When I export the model config from a similar or equivalent FNN network made in TensorFlow Keras, whose weights I am also exporting in the exact same manner described above, I get this output:

```json
{
 "class_name": "Sequential",
 "config": {
  "name": "sequential",
  "layers": [
   {
    "class_name": "InputLayer",
    "config": {
     "batch_input_shape": [
      null,
      784
     ],
     "dtype": "float32",
     "sparse": false,
     "ragged": false,
     "name": "dense_input"
    }
   },
   {
    "class_name": "Dense",
    "config": {
     "name": "dense",
     "trainable": true,
     "dtype": "float32",
     "batch_input_shape": [
      null,
      784
     ],
     "units": 8,
     "activation": "tanh",
     "use_bias": true,
     "kernel_initializer": {
      "class_name": "GlorotUniform",
      "config": {
       "seed": null
      }
     },
     "bias_initializer": {
      "class_name": "Zeros",
      "config": {}
     },
     "kernel_regularizer": null,
     "bias_regularizer": null,
     "activity_regularizer": null,
     "kernel_constraint": null,
     "bias_constraint": null
    }
   },
   {
    "class_name": "Dense",
    "config": {
     "name": "dense_1",
     "trainable": true,
     "dtype": "float32",
     "units": 1,
     "activation": "sigmoid",
     "use_bias": true,
     "kernel_initializer": {
      "class_name": "GlorotUniform",
      "config": {
       "seed": null
      }
     },
     "bias_initializer": {
      "class_name": "Zeros",
      "config": {}
     },
     "kernel_regularizer": null,
     "bias_regularizer": null,
     "activity_regularizer": null,
     "kernel_constraint": null,
     "bias_constraint": null
    }
   }
  ]
 },
 "keras_version": "2.11.0",
 "backend": "tensorflow"
}
```

And this is odd, because this output does seem to confirm that it is just a regular, basic FNN network, for which my C code reproducing the forward pass in my prior responses should work just fine. It is odd.

BeverlyCode commented 1 year ago

Still trying to figure this out independently, I downloaded the Keras source code to look for discrepancies. pdb was useless for this task, for example; stepping through execution looking for a discrepancy is a lot less efficient than just checking the code.

So I'm still in this process; Keras is a large system, so it may take me a while. I am also going to look at tensorflow-onnx to see how they convert Keras models to ONNX for inference.
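
(For anyone following along: tf2onnx can convert directly from an in-memory Keras model in Python; a sketch, assuming tf2onnx is installed and `model` is the Keras model, with an illustrative output path:)

```python
import tf2onnx

# Convert the Keras model to an ONNX file.
model_proto, _ = tf2onnx.convert.from_keras(model, output_path="model.onnx")
```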

A comparison with the tanh implementation in activations.py#L379 shows very little difference between it and other implementations in terms of accuracy; the small C program enclosed below demonstrates this:

```c
#include <stdio.h>
#include <math.h>

const float e = 2.718281746F;
const float e2 = 7.389056206F;

static inline float mysinh(float f)
{
    return 0.5f * (powf(e, f) - powf(e, -f));
}

static inline float mycosh(float f)
{
    return (powf(e, f)/2.f) + (powf(e, -f)/2.f);
}

static inline float mytanh1(float f)
{
    return mysinh(f) / mycosh(f);
}

static inline float mytanh2(float f)
{
    const float pe = powf(e2, f);
    return (pe-1.f) / (pe+1.f);
}

static inline float kerasth(const float x) // keras tanh
{
    return ((expf(x) - expf(-x))/(expf(x) + expf(-x)));
}

int main()
{
    printf("mycosh():  %g\ncoshf():   %g\n", mycosh(45.f), coshf(45.f));
    printf("mytanh1(): %g\nmytanh2(): %g\ntanhf():   %g\ntanh():    %g\nkerasth(): %g\n", mytanh1(0.2f), mytanh2(0.2f), tanhf(0.2f), tanh(0.2f), kerasth(0.2f));
    printf("---\n");
    printf("mycosh():  %f\ncoshf():   %f\n", mycosh(45.f), coshf(45.f));
    printf("mytanh1(): %.9f\nmytanh2(): %.9f\ntanhf():   %.9f\ntanh():    %.9f\nkerasth(): %.9f\n", mytanh1(0.2f), mytanh2(0.2f), tanhf(0.2f), tanh(0.2f), kerasth(0.2f));
    return 0;
}
```

outputs:

```
mycosh():  1.74671e+19
coshf():   1.74671e+19
mytanh1(): 0.197375
mytanh2(): 0.197375
tanhf():   0.197375
tanh():    0.197375
kerasth(): 0.197375
---
mycosh():  17467112198709968896.000000
coshf():   17467135288454152192.000000
mytanh1(): 0.197375312
mytanh2(): 0.197375342
tanhf():   0.197375327
tanh():    0.197375323
kerasth(): 0.197375312
```

It feels like either Keras is scaling the inputs or using some kind of regularisation between layers that is not reported in the model config or explicitly set by the user. Something is odd, but I am having a hard time working out what.

So at this point I am going to assume it's not a discrepancy in how the activation functions are computed, and that's what makes all of this so odd: we're talking about the forward pass of a very simple neural network of two Dense layers, an input and an output. This should not be this hard to replicate outside of Keras with the same weights.
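
(In hindsight, one way to pin down whether Keras does anything beyond the plain matrix multiply is to recompute predict() by hand in NumPy straight from get_weights(); if the two agree, the discrepancy has to be in the export or in the C code. A sketch, assuming a Sequential model of Dense layers:)

```python
import numpy as np

x = np.random.rand(5, 3).astype(np.float32)  # illustrative input batch

h = x
for layer in model.layers:
    W, b = layer.get_weights()               # kernel shape: (input_dim, units)
    h = h @ W + b
    if layer.get_config()["activation"] == "tanh":
        h = np.tanh(h)

print(np.max(np.abs(h - model.predict(x))))  # ~1e-7 or smaller if equivalent
```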

A temporary solution: Keras2c

A little googling and I discovered a project called "keras2c", described in this paper: https://control.princeton.edu/wp-content/uploads/sites/418/2021/04/conlin2021keras2c.pdf with the code on GitHub: https://github.com/f0uriest/keras2c

I've tested it out and it's really easy to use; it's a great project. It doesn't give me the EXACT same outputs as, say, ONNX or Keras, but the outputs are basically the same, with what seems to be mostly negligible precision loss. Performance-wise, Keras2c runs at about half the speed of ONNX or a bit less, but that's still really good in the grand scheme of things: it can process more inputs per second than I will probably ever need for most of the model sizes I work with.

It is also worth mentioning that ONNX has a C runtime here, and there is xboot/libonnx here, which I am yet to look into; but I will, and I will update this issue with my findings. (I've been using ONNX via Python so far.)

Keras2c is still more complicated than it needs to be for implementing a simple Dense FNN, though; such a network is incredibly simple, and you could strip Keras2c apart and make a more condensed version for your specific purpose (I will be attempting this, to try to understand why my original code snippet for a forward pass was not working as intended). But really, my code in the original issue should work; it's a simple Dense network, for Pete's sake lol.

Obviously I could also have used the TensorFlow C API, but that defeats the point of a simple implementation in C from exported weights, and the C shared library files for libtensorflow amount to some 1.1 GB, which is a pretty hefty ordeal for a simple network. There is also TensorFlow Lite, which I suppose would have a smaller footprint, but I have yet to investigate that route independently.

Tensorflow Lite: https://github.com/tensorflow/tflite-micro https://github.com/tensorflow/tflite-support

Looking at libonnx

https://github.com/xboot/libonnx

Bear in mind that I don't believe libonnx is the official ONNX library. The official ONNX Python API was the most performant of all, while libonnx was less performant than Keras2c: libonnx only managed around 150 predictions per second on a relatively small 3.2 MB CNN (the same CNN used with Keras2c). Also, while libonnx is still relatively easy to set up and use, its examples require the names of the input and output layers in order to set and get tensors. That may be fine for some people, but if, like me, you used tf2onnx to convert your Keras model to ONNX, you may need to load your ONNX model in a tool like Netron to find out what the input and output layer names are, unless you enjoy viewing binary files in hex editors.
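
(Alternatively, the input and output names can be read programmatically with the official onnxruntime Python package, which avoids the hex editor entirely; the model path is illustrative:)

```python
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx")
print([i.name for i in sess.get_inputs()])   # input tensor names
print([o.name for o in sess.get_outputs()])  # output tensor names
```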

All in all, libonnx is easy to use, but not as performant or as easy as Keras2c.

Summary

Obviously these are not answers to my original issue: why does a standard FNN implementation in Keras, using the most basic and simple Dense layers, not work with a simple forward pass in C on the weights exported from the Keras model, something that should by all means 'just work'? I have no doubt that either I will answer this question myself in this issue in the coming week, or someone more knowledgeable will come to my aid. Either way, I hope the documentation of my journey here provides a wealth of information, trial and error, for other people starting out and experiencing the same problems.

hertschuh commented 1 year ago

Hi @BeverlyCode ,

It is really difficult to assess whether the code in main.c is equivalent to the Keras/Tensorflow code. Just the matrix multiplication, for instance, is difficult to analyze.

I do see a possible culprit though. In the example output of the exported keras model you added above, I see "activation": "sigmoid" on the last Dense layer. However, your C code only implements tanh. And in fit.py, the last layer has no activation right now. So there is a discrepancy here. Can you verify if that is the issue?

BeverlyCode commented 1 year ago

> Hi @BeverlyCode ,
>
> It is really difficult to assess whether the code in main.c is equivalent to the Keras/Tensorflow code. Just the matrix multiplication, for instance, is difficult to analyze.
>
> I do see a possible culprit though. In the example output of the exported keras model you added above, I see "activation": "sigmoid" on the last Dense layer. However, your C code only implements tanh. And in fit.py, the last layer has no activation right now. So there is a discrepancy here. Can you verify if that is the issue?

Nah, that's just a model I dumped from a different network which I also output to C; some use sigmoid, some use linear outputs. I have replicated many Dense FNN networks in C from Keras, and they all 'kind of work', but not really, because the accuracy is lacking due to whatever is causing this issue.

Nice try though, if only it was that simple. ;)

For the sake of morbid curiosity, though, here is the C forward pass for that network:

```c
static inline float sigmoid(const float x)
{
    return 1.f / (1.f + expf(-x));
}

float predict(float* input)
{
    // model_fnn_layer0 is expected as 8 blocks of [784 weights, 1 bias],
    // 8 * 785 = 6280 entries in total.
    float o1[8] = {0.f,0.f,0.f,0.f,0.f,0.f,0.f,0.f};
    int j = 0, ji = 0;
    for(int i = 0; i < 6280; i++)
    {
        if(ji == 784) // note: the original used `ji > 784`, an off-by-one that
        {             // reads input[784] out of bounds and shifts the biases
            o1[j] += model_fnn_layer0[i]; // bias
            o1[j] = tanhf(o1[j]);
            j++;
            ji = 0;
        }
        else
        {
            o1[j] += input[ji] * model_fnn_layer0[i]; // weight
            ji++;
        }
    }

    float o2 = 0.f;
    for(int i = 0; i < 8; i++)
        o2 += o1[i] * model_fnn_layer1[i]; // weight
    o2 += model_fnn_layer1[8]; // bias

    return sigmoid(o2);
}
```
hertschuh commented 1 year ago

@BeverlyCode ,

I changed line 208 to this:

```python
for weight in layer.get_weights()[0].transpose().flatten():
```

and tested in a roundabout way and got the same results.

So it's either a bug in the norm_neural code or the Python export code. It could just come from a misunderstanding of how flatten works.

Anyway, transposing before exporting fixes it.
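
For completeness, the original export loop with that fix applied would look something like this (a sketch, not the exact fit.py code):

```python
import numpy

for layer in model.layers:
    weights = layer.get_weights()
    if weights:
        # Transpose the kernel to (units, input_dim) so a row-major flatten
        # yields each unit's incoming weights contiguously, the layout the
        # C forward pass expects. Biases are 1-D, so no transpose is needed.
        numpy.savetxt(layer.name + ".csv", weights[0].transpose().flatten(), delimiter=",")
        numpy.savetxt(layer.name + "_bias.csv", weights[1].flatten(), delimiter=",")
```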

Thanks.

BeverlyCode commented 1 year ago

You are completely right, you have solved it! Thank you so much!

I had to transpose the weight matrix before export. Out of interest, how did you discover this? I know that TensorFlow uses channels_last, but the row/column order of the matrix was not a factor I had considered, partly, I think, because I did not even think of the weights as a matrix with rows and columns, rather than as a flat array.

I am wondering what your thought process was in solving this, so that I can improve my own thought processes.

But thank you so much. I hope that other people as confused as me find this thread and your answer. In future I should dump the array shape, and if it is a matrix, consider transposing it. So TensorFlow stores the weight matrix in the opposite row/column order to what I expected, for a reason I don't know; but also, I'm not sure how I would have known this without working off a hunch, having noticed it was a matrix and considered that it might be in the wrong order for what I was expecting.

hertschuh commented 1 year ago

Hi @BeverlyCode ,

My thought process was mostly a hunch:

Glancing at the C code, it looked correct. But one thing wasn't clear to me: whether the loops and index calculations in the matrix multiplication were multiplying rows with columns correctly. Rather than trying to prove the code right or wrong, I thought I would transpose and give it a shot. And transposing the export was easier than modifying the C code.

As far as the order of columns vs. rows is concerned, in this particular case it's NumPy that's involved (get_weights() returns NumPy arrays, and flatten() is a NumPy API, not TensorFlow). But as it turns out, TensorFlow uses the same order:

```python
>>> import numpy as np
>>> y = np.array([[1, 2], [3, 4]])
>>> y.flatten()
array([1, 2, 3, 4])
>>> import tensorflow as tf
>>> x = tf.constant([[1, 2], [3, 4]])
>>> tf.reshape(x, (4,))
<tf.Tensor: shape=(4,), dtype=int32, numpy=array([1, 2, 3, 4], dtype=int32)>
```
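
To make the connection to the Dense kernel explicit: the kernel is stored with shape (input_dim, units), so a plain row-major flatten interleaves the units, while transposing first groups each unit's incoming weights together, which is the layout the C loop indexes:

```python
import numpy as np

# Toy kernel for a Dense layer with 3 inputs and 2 units: shape (3, 2).
W = np.array([[1, 2],   # weights leaving input 0
              [3, 4],   # weights leaving input 1
              [5, 6]])  # weights leaving input 2

print(W.flatten())              # [1 2 3 4 5 6] -> units interleaved
print(W.transpose().flatten())  # [1 3 5 2 4 6] -> per-unit blocks, as the C code expects
```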
google-ml-butler[bot] commented 1 year ago

Are you satisfied with the resolution of your issue?