tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0
Error dumping weights duplicate weight name kernel #8233

Closed: VineetKumar02 closed this issue 6 months ago

VineetKumar02 commented 6 months ago
from keras.models import Sequential
from keras.layers import LSTM, Dense
import tensorflowjs as tfjs

# Create a dictionary to store models for each bin
bin_models = {}

# Loop through each bin
for bin_name in field_names:
    # Create a new model for each bin with unique layer names
    model = Sequential([
        LSTM(50, return_sequences=True, input_shape=(time_steps, n_cols), name='lstm_1'),
        LSTM(64, return_sequences=False, name='lstm_2'),
        Dense(32, name='dense_1'),
        Dense(16, name='dense_2'),
        Dense(n_cols, name='dense_output')
    ])

    # Compile the model
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])

    # Train the model for the current bin
    mae, rmse, mse = lstm_for_bin(model, bin_name)
    print(f"Metrics for {bin_name}: MAE={mae}, RMSE={rmse}, MSE={mse}")

    # Save the model for each bin
    model_name = f"{bin_name}_lstm_model.h5"
    model.save(model_name)
    print(f"Model for {bin_name} saved as {model_name}")

    # Store the model in the dictionary
    bin_models[bin_name] = model

    # Convert the saved model to TensorFlow.js format
    tfjs.converters.save_keras_model(model, f'tfjs_models/{bin_name}')

I tried both ways of converting my Keras model to a TensorFlow.js model. The error looks like this: image image

Any idea how to resolve this issue?

gaikwadrahul8 commented 6 months ago

Hi, @VineetKumar02

I apologize for the delayed response. Could you confirm which versions of Python, Keras, and TensorFlow.js you are using? I tried to replicate the issue on my end, but the given code snippet appears to be incomplete. If possible, could you share your GitHub repo or a complete code snippet so that I can reproduce the behavior and investigate further?

Meanwhile, I see that you are converting a Keras model saved in the .h5 format, which should work. Judging by your error message, however, there may be duplicate layer names. If you are converting more than one model, make sure the layer names are unique across all models, even across different model files. Use descriptive names such as lstm_1_bin_1 and lstm_1_bin_2, and call print(model.summary()) to view the layer names while debugging.

Thank you for your cooperation and patience.
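The duplicate-name check suggested above can be sketched in plain Python. This is only an illustrative helper, not part of tensorflowjs: the function name find_duplicate_names and the sample name lists are hypothetical. Since the error message refers to weight names rather than layer names, it may be worth checking both.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]


# Hypothetical names for illustration only. With a real Keras model you could
# pass [layer.name for layer in model.layers] or [w.name for w in model.weights].
layer_names = ["lstm_1", "lstm_2", "dense_1", "dense_2", "dense_output"]
weight_names = ["kernel", "recurrent_kernel", "bias", "kernel", "bias"]

print(find_duplicate_names(layer_names))   # []
print(find_duplicate_names(weight_names))  # ['kernel', 'bias']
```

If the layer names come back unique but the weight names do not, renaming layers alone is unlikely to make the error go away.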

VineetKumar02 commented 6 months ago

The versions I used are:
Python version: 3.10.12
Keras version: 3.1.1
TensorFlow version: 2.16.1
TensorFlowJS version: 4.17.0

from keras.models import Sequential
from keras.layers import LSTM, Dense
import tensorflowjs as tfjs

# Create a dictionary to store models for each bin
bin_models = {}

# Loop through each bin
for bin_name in field_names:
    # Create a new model for each bin with unique layer names
    model = Sequential([
        LSTM(50, return_sequences=True, input_shape=(time_steps, n_cols), name=f'lstm_1_{bin_name}'),
        LSTM(64, return_sequences=False, name=f'lstm_2_{bin_name}'),
        Dense(32, name=f'dense_1_{bin_name}'),
        Dense(16, name=f'dense_2_{bin_name}'),
        Dense(n_cols, name=f'dense_output_{bin_name}')
    ])

    print(model.summary())

    # Compile the model
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])

    # Train the model for the current bin
    mae, rmse, mse = lstm_for_bin(model, bin_name)
    print(f"Metrics for {bin_name}: MAE={mae}, RMSE={rmse}, MSE={mse}")

    # Save the model for each bin
    model_name = f"{bin_name}_lstm_model.h5"
    model.save(model_name)
    print(f"Model for {bin_name} saved as {model_name}")

    # Store the model in the dictionary
    bin_models[bin_name] = model

    # Convert the saved model to TensorFlow.js format
    tfjs.converters.save_keras_model(model, f'tfjs_models/{bin_name}')

I also tried the method you suggested, making the layer names completely unique by adding the bin name to each one, but I still face the same issue. See the image for details: image

Here is the link to the github repo which has the code along with the error: https://github.com/VineetKumar02/sample_lstm_code/blob/953b7c054cd59c4b321b5b9dc2c903b8b93114a9/fyp_lstm_all_download.ipynb

Kindly help me out with this issue. I'm running this code in a Google Colab notebook. I have also attached a view of what my .h5 model files look like, in case it is useful for finding the issue: image

gaikwadrahul8 commented 6 months ago

Hi, @VineetKumar02

I apologize for the delay in my response. I'm attempting to replicate the behavior you achieved in your Google Colab notebook. However, after running the preprocessing section code, I'm encountering NaN values in the pandas DataFrame. This is resulting in a ValueError: Input contains NaN error message.

For your reference, you can find the relevant Google Colab code snippet here: gist-file.

I've carefully reviewed the code and ensured I'm using the same version as yours. However, I'm still unable to replicate the expected behavior. Could there be any additional steps or considerations I might be missing?

Thank you for your cooperation and patience.

VineetKumar02 commented 6 months ago

I'm not sure why you are facing that issue, but I was able to find a temporary solution to this problem. When I install the tensorflowjs package in Colab, it also installs TensorFlow 2.16.1, which is causing the issue. The workaround I found was to install tensorflowjs first and then force-install TensorFlow 2.15.0; after that I was able to convert without any issue. You can see the gist file below: https://github.com/VineetKumar02/sample_lstm_code/blob/fa43dffee2408215d5ec2312c45aa98820a57b4a/fyp_lstm_all_download.ipynb

The model saved using tensorflow 2.15.0 looks like this: image

So I think the issue is with the latest version of TensorFlow. For comparison, my previous comment shows how the model looks when it is saved using the latest version.
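As a rough guard for the workaround described in this comment, a version check could run before conversion. This is a minimal sketch: the helper names are hypothetical, and the (2, 16) threshold is an assumption based only on the versions reported in this thread (2.15.0 converts, 2.16.1 fails).

```python
def parse_major_minor(version):
    """Turn a version string such as '2.16.1' into the tuple (2, 16)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def needs_downgrade(tf_version, first_failing=(2, 16)):
    """True if tf_version is at or above the first release reported to fail."""
    return parse_major_minor(tf_version) >= first_failing


# Install order described in the comment (exact pip flags are an assumption):
#   !pip install tensorflowjs
#   !pip install tensorflow==2.15.0

for version in ("2.15.0", "2.16.1"):
    print(version, "needs downgrade:", needs_downgrade(version))
```

In a notebook, the installed version can be passed in as needs_downgrade(tensorflow.__version__) after importing tensorflow.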

gaikwadrahul8 commented 6 months ago

Hi, @VineetKumar02

My bad, there was an issue with the Smart bins Argyle Square dataset that I had downloaded. I have now downloaded the correct dataset and can run your Google Colab notebook without any error using TensorFlow.js 4.17.0 and TensorFlow 2.15.0. You are absolutely correct: installing the latest TensorFlow.js version also installs TensorFlow 2.16.1, and it seems that tfjs-converter is not compatible with TensorFlow 2.16.1 at the moment. Thank you for bringing this to our attention; our team will work on making tfjs-converter compatible with TensorFlow 2.16.1 soon.

I am also able to replicate the original issue with TensorFlow 2.16.1 and TensorFlow.js 4.17.0; for reference, here is the Google Colab notebook.

If you are no longer experiencing problems after installing TensorFlow.js and then downgrading TensorFlow to 2.15.0, please feel free to close this issue.

Thank you for your cooperation and patience.

turbobuilt commented 6 months ago

I can't downgrade because I get another error:

WARNING:root:TensorFlow Decision Forests 1.9.0 is compatible with the following TensorFlow Versions: ['2.16.1']. However, TensorFlow 2.15.0 was detected. This can cause issues with the TF API and symbols in the custom C++ ops. See the TF and TF-DF compatibility table at https://github.com/tensorflow/decision-forests/blob/main/documentation/known_issues.md#compatibility-table.

WARNING:root:Failure to load the inference.so custom c++ tensorflow ops. This error is likely caused the version of TensorFlow and TensorFlow Decision Forests are not compatible. Full error:/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/tensorflow/ops/inference/inference.so: undefined symbol: _ZN10tensorflow20OpKernelConstruction21CtxFailureWithWarningEPKciRKN4absl12lts_202308026StatusE

  File "/usr/local/lib/python3.10/dist-packages/tensorflowjs/__init__.py", line 21, in <module>
    from tensorflowjs import converters
  File "/usr/local/lib/python3.10/dist-packages/tensorflowjs/converters/__init__.py", line 21, in <module>
    from tensorflowjs.converters.converter import convert
  File "/usr/local/lib/python3.10/dist-packages/tensorflowjs/converters/converter.py", line 37, in <module>
    from tensorflowjs.converters import tf_saved_model_conversion_v2
  File "/usr/local/lib/python3.10/dist-packages/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 28, in <module>
    import tensorflow_decision_forests
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/__init__.py", line 64, in <module>
    from tensorflow_decision_forests import keras
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/keras/__init__.py", line 53, in <module>
    from tensorflow_decision_forests.keras import core
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/keras/core.py", line 64, in <module>
    from tensorflow_decision_forests.keras import core_inference
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/keras/core_inference.py", line 38, in <module>
    from tensorflow_decision_forests.tensorflow.ops.inference import api as tf_op
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/tensorflow/ops/inference/api.py", line 179, in <module>
    from tensorflow_decision_forests.tensorflow.ops.inference import op
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/tensorflow/ops/inference/op.py", line 15, in <module>
    from tensorflow_decision_forests.tensorflow.ops.inference.op_dynamic import *
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/tensorflow/ops/inference/op_dynamic.py", line 24, in <module>
    raise e
  File "/usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/tensorflow/ops/inference/op_dynamic.py", line 21, in <module>
    ops = tf.load_op_library(resource_loader.get_path_to_datafile("inference.so"))
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/load_library.py", line 54, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.10/dist-packages/tensorflow_decision_forests/tensorflow/ops/inference/inference.so: undefined symbol: _ZN10tensorflow20OpKernelConstruction21CtxFailureWithWarningEPKciRKN4absl12lts_202308026StatusE

VineetKumar02 commented 6 months ago

@turbobuilt Are you using Google Colab or running on your local system?

myekini commented 6 months ago

I'm having this error too.

  File "/Users/muhammadyekini/Downloads/Muhammad_Yekini/CNN Image detection/venv/lib/python3.11/site-packages/tensorflowjs/write_weights.py", line 118, in write_weights
    _assert_no_duplicate_weight_names(weight_groups)
  File "/Users/muhammadyekini/Downloads/Muhammad_Yekini/CNN Image detection/venv/lib/python3.11/site-packages/tensorflowjs/write_weights.py", line 346, in _assert_no_duplicate_weight_names
    raise Exception(
Exception: Error dumping weights, duplicate weight name gamma