information from training stored in layers not in model

emlys commented 4 years ago

System information

Have I written custom code (as opposed to using example directory): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.4
TensorFlow backend (yes / no): Yes
TensorFlow version: 2.1.0
Keras version: 2.2.4-tf
Python version: 3.7.4

Describe the current behavior
I am doing k-fold cross-validation of my model. To do this, I need to start each fold with a new, untrained model object that has not carried over any information from the previous fold. Because I want to reuse the code across several architectures, I pass a list of Keras layers into a create_model function, which instantiates a keras.Sequential model and compiles it.

Although create_model apparently returns a new model at a new address on each call, information carries over from one fold to the next: the accuracy at the start of each fold is roughly where it was at the end of the previous fold. If I instantiate the layers inside create_model instead, the accuracy resets at the beginning of each fold.
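A minimal sketch of how to confirm what is going on, using only the public tf.keras API. Two Sequential models built from one pre-built list of layers are different objects, but they hold the same layer objects. Weights trained through one model are therefore the weights the next model starts from. The layer sizes and the dummy input below are arbitrary illustration values, not taken from the report's dataset.

```python
import numpy as np
from tensorflow import keras

# A layer list built once, outside any model-building function (as in the report).
shared_layers = [
    keras.layers.Dense(4, activation='sigmoid'),
    keras.layers.Dense(2, activation='softmax'),
]

model_a = keras.Sequential(shared_layers)
model_b = keras.Sequential(shared_layers)

# Calling each model on a dummy batch builds its weights.
dummy = np.zeros((1, 3), dtype='float32')
model_a(dummy)
model_b(dummy)

# The two models are distinct objects...
print(model_a is model_b)  # False
# ...but they hold the very same layer objects, hence the same weight variables.
print(model_a.layers[0] is model_b.layers[0])  # True

# Overwriting the weights through model_a changes what model_b sees.
first = model_a.layers[0]
first.set_weights([np.ones_like(w) for w in first.get_weights()])
print(np.all(model_b.layers[0].get_weights()[0] == 1.0))  # True
```

In the k-fold loop this means fold N+1 starts from the weights left behind by fold N, which matches the upward accuracy trend in the "Bad" log.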

Describe the expected behavior
Different instances of a model object should not share information. The current behavior is confusing, and it would be nice to be able to pass a list of layers into a create_model function.
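One way to keep the "pass the architecture in" interface is to pass a function that builds the layers, rather than the layers themselves. This is a sketch, not an official Keras pattern. The names make_architecture and create_model are hypothetical. input_dim is omitted, so the input size is inferred from the data on the first call (or the first fit).

```python
import numpy as np
from tensorflow import keras

def make_architecture():
    """Hypothetical layer factory: returns a brand-new list of layers on every call."""
    return [
        keras.layers.Dense(128, activation='sigmoid'),
        keras.layers.Dense(64, activation='sigmoid'),
        keras.layers.Dense(7, activation='softmax'),
    ]

def create_model(layer_factory):
    # The factory is called here, so every model gets its own layers and weights.
    model = keras.Sequential(layer_factory())
    model.compile(
        optimizer=keras.optimizers.SGD(),
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )
    return model

model_1 = create_model(make_architecture)
model_2 = create_model(make_architecture)

# Build both models on a dummy batch with 55 features, as in the report.
dummy = np.zeros((1, 55), dtype='float32')
model_1(dummy)
model_2(dummy)

# No layer object is shared between the two models.
print(any(l1 is l2 for l1, l2 in zip(model_1.layers, model_2.layers)))  # False
```

Inside the k-fold loop, `model = create_model(make_architecture)` would then give every fold freshly initialized weights. A lambda that returns the list works equally well as the factory.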

Code to reproduce the issue

from tensorflow import keras
from sklearn.model_selection import KFold
import numpy as np

architecture = [
    keras.layers.Dense(128, input_dim=55, activation='sigmoid'),
    keras.layers.Dense(64, activation='sigmoid'),
    keras.layers.Dense(7, activation='softmax')
]

def create_model_doesnt_work(architecture):
    model = keras.Sequential(architecture)

    model.compile(
        optimizer=keras.optimizers.SGD(),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    return model

def create_model_works():
    model = keras.Sequential([
        keras.layers.Dense(128, input_dim=55, activation='sigmoid'),
        keras.layers.Dense(64, activation='sigmoid'),
        keras.layers.Dense(7, activation='softmax')
    ])

    model.compile(
        optimizer=keras.optimizers.SGD(),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    return model

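# x: feature array of shape (n_samples, 55); y: one-hot labels of shape (n_samples, 7).
# Both come from my own dataset, which is not included here.

print("Bad: accuracy trends upward from fold to fold")

# Fit the model k times using k-fold cross validation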
for train_index, test_index in KFold(5, shuffle=True).split(x):
    # Get the train and test sets for this fold
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model needs to be re-compiled for each fold so that we are starting over fresh
    model = create_model_doesnt_work(architecture)

    # Confirm the model is at a new address each time
    print(model)

    # Train the model on the training data for this fold
    model.fit(x_train, y_train, epochs=10, batch_size=128)

print('\n\n Good: accuracy resets at the start of each fold')
# Fit the model k times using k-fold cross validation
for train_index, test_index in KFold(5, shuffle=True).split(x):
    # Get the train and test sets for this fold
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Model needs to be re-compiled for each fold so that we are starting over fresh
    model = create_model_works()

    # Confirm the model is at a new address each time
    print(model)

    # Train the model on the training data for this fold
    model.fit(x_train, y_train, epochs=10, batch_size=128)

Other info / logs
Example of output when run on my dataset:

Bad: accuracy trends upward from fold to fold
2020-03-07 15:35:56.079018: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-07 15:35:56.100415: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7ffa9d1c10c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-07 15:35:56.100477: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
<tensorflow.python.keras.engine.sequential.Sequential object at 0x136bf3150>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 39us/sample - loss: 1.9799 - accuracy: 0.1536
Epoch 2/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9412 - accuracy: 0.1732
Epoch 3/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9397 - accuracy: 0.1932
Epoch 4/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9387 - accuracy: 0.2163
Epoch 5/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9374 - accuracy: 0.2216
Epoch 6/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9362 - accuracy: 0.2464
Epoch 7/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9349 - accuracy: 0.2319
Epoch 8/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9335 - accuracy: 0.2392
Epoch 9/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9321 - accuracy: 0.2759
Epoch 10/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9308 - accuracy: 0.2449
<tensorflow.python.keras.engine.sequential.Sequential object at 0x12a35d590>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 36us/sample - loss: 1.9294 - accuracy: 0.2710
Epoch 2/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9279 - accuracy: 0.3118
Epoch 3/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9262 - accuracy: 0.2827
Epoch 4/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9248 - accuracy: 0.2997
Epoch 5/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9232 - accuracy: 0.2897
Epoch 6/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9213 - accuracy: 0.2981
Epoch 7/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9196 - accuracy: 0.3253
Epoch 8/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9177 - accuracy: 0.3144
Epoch 9/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9159 - accuracy: 0.3027
Epoch 10/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9138 - accuracy: 0.3558
<tensorflow.python.keras.engine.sequential.Sequential object at 0x138ee19d0>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 1s 42us/sample - loss: 1.9116 - accuracy: 0.3416
Epoch 2/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9095 - accuracy: 0.3471
Epoch 3/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9072 - accuracy: 0.3569
Epoch 4/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9044 - accuracy: 0.3348
Epoch 5/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9021 - accuracy: 0.3718
Epoch 6/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8993 - accuracy: 0.3667
Epoch 7/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8965 - accuracy: 0.3576
Epoch 8/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8935 - accuracy: 0.3866
Epoch 9/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8903 - accuracy: 0.3864
Epoch 10/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8869 - accuracy: 0.3803
<tensorflow.python.keras.engine.sequential.Sequential object at 0x139137e50>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 36us/sample - loss: 1.8833 - accuracy: 0.3860
Epoch 2/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8793 - accuracy: 0.3811
Epoch 3/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8752 - accuracy: 0.3779
Epoch 4/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8710 - accuracy: 0.3597
Epoch 5/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8665 - accuracy: 0.3774
Epoch 6/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8617 - accuracy: 0.3658
Epoch 7/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8565 - accuracy: 0.3595
Epoch 8/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8512 - accuracy: 0.3759
Epoch 9/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8454 - accuracy: 0.3759
Epoch 10/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8392 - accuracy: 0.4032
<tensorflow.python.keras.engine.sequential.Sequential object at 0x12a3bef50>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 36us/sample - loss: 1.8334 - accuracy: 0.4120
Epoch 2/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8265 - accuracy: 0.4043
Epoch 3/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8194 - accuracy: 0.4138
Epoch 4/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.8116 - accuracy: 0.4161
Epoch 5/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.8036 - accuracy: 0.3838
Epoch 6/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.7952 - accuracy: 0.4004
Epoch 7/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.7865 - accuracy: 0.3979
Epoch 8/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.7772 - accuracy: 0.3853
Epoch 9/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.7674 - accuracy: 0.4128
Epoch 10/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.7574 - accuracy: 0.4184

 Good: accuracy resets at the start of each fold
<tensorflow.python.keras.engine.sequential.Sequential object at 0x13987e690>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 36us/sample - loss: 2.0619 - accuracy: 0.1512
Epoch 2/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9440 - accuracy: 0.1794
Epoch 3/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9427 - accuracy: 0.1803
Epoch 4/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9416 - accuracy: 0.1951
Epoch 5/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9404 - accuracy: 0.1954
Epoch 6/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9393 - accuracy: 0.2612
Epoch 7/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9380 - accuracy: 0.2625
Epoch 8/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9367 - accuracy: 0.2180
Epoch 9/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9356 - accuracy: 0.2625
Epoch 10/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9344 - accuracy: 0.2537
<tensorflow.python.keras.engine.sequential.Sequential object at 0x13a512e50>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 36us/sample - loss: 1.9892 - accuracy: 0.1556
Epoch 2/10
12096/12096 [==============================] - 0s 16us/sample - loss: 1.9438 - accuracy: 0.2013
Epoch 3/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9422 - accuracy: 0.1827
Epoch 4/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9413 - accuracy: 0.1744
Epoch 5/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9402 - accuracy: 0.1950
Epoch 6/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9389 - accuracy: 0.2173
Epoch 7/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9379 - accuracy: 0.2027
Epoch 8/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9365 - accuracy: 0.2434
Epoch 9/10
12096/12096 [==============================] - 0s 14us/sample - loss: 1.9354 - accuracy: 0.2428
Epoch 10/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9340 - accuracy: 0.2522
<tensorflow.python.keras.engine.sequential.Sequential object at 0x13ac94910>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 36us/sample - loss: 1.9717 - accuracy: 0.1385
Epoch 2/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9456 - accuracy: 0.1362
Epoch 3/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9441 - accuracy: 0.1475
Epoch 4/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9430 - accuracy: 0.1448
Epoch 5/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9416 - accuracy: 0.1685
Epoch 6/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9402 - accuracy: 0.1749
Epoch 7/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9390 - accuracy: 0.1855
Epoch 8/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9376 - accuracy: 0.2207
Epoch 9/10
12096/12096 [==============================] - 0s 17us/sample - loss: 1.9361 - accuracy: 0.2469
Epoch 10/10
12096/12096 [==============================] - 0s 17us/sample - loss: 1.9347 - accuracy: 0.2190
<tensorflow.python.keras.engine.sequential.Sequential object at 0x13b416410>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 1s 52us/sample - loss: 1.9773 - accuracy: 0.1500
Epoch 2/10
12096/12096 [==============================] - 0s 16us/sample - loss: 1.9466 - accuracy: 0.1562
Epoch 3/10
12096/12096 [==============================] - 0s 29us/sample - loss: 1.9454 - accuracy: 0.1520
Epoch 4/10
12096/12096 [==============================] - 0s 19us/sample - loss: 1.9442 - accuracy: 0.1464
Epoch 5/10
12096/12096 [==============================] - 0s 18us/sample - loss: 1.9432 - accuracy: 0.1689
Epoch 6/10
12096/12096 [==============================] - 0s 17us/sample - loss: 1.9422 - accuracy: 0.1894
Epoch 7/10
12096/12096 [==============================] - 0s 18us/sample - loss: 1.9411 - accuracy: 0.1769
Epoch 8/10
12096/12096 [==============================] - 0s 16us/sample - loss: 1.9399 - accuracy: 0.1979
Epoch 9/10
12096/12096 [==============================] - 0s 16us/sample - loss: 1.9388 - accuracy: 0.2088
Epoch 10/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9377 - accuracy: 0.2134
<tensorflow.python.keras.engine.sequential.Sequential object at 0x13bb90e10>
Train on 12096 samples
Epoch 1/10
12096/12096 [==============================] - 0s 39us/sample - loss: 2.0052 - accuracy: 0.1426
Epoch 2/10
12096/12096 [==============================] - 0s 15us/sample - loss: 1.9465 - accuracy: 0.1474
Epoch 3/10
12096/12096 [==============================] - 0s 19us/sample - loss: 1.9450 - accuracy: 0.1523
Epoch 4/10
12096/12096 [==============================] - 0s 17us/sample - loss: 1.9439 - accuracy: 0.1675
Epoch 5/10
12096/12096 [==============================] - 0s 18us/sample - loss: 1.9425 - accuracy: 0.1751
Epoch 6/10
12096/12096 [==============================] - 0s 16us/sample - loss: 1.9413 - accuracy: 0.1966
Epoch 7/10
12096/12096 [==============================] - 0s 31us/sample - loss: 1.9400 - accuracy: 0.2121
Epoch 8/10
12096/12096 [==============================] - 0s 22us/sample - loss: 1.9385 - accuracy: 0.2051
Epoch 9/10
12096/12096 [==============================] - 0s 17us/sample - loss: 1.9374 - accuracy: 0.2239
Epoch 10/10
12096/12096 [==============================] - 0s 17us/sample - loss: 1.9360 - accuracy: 0.2332

fernandonieuwveldt commented 4 years ago

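@emlys This behaviour seems fine. The architecture list you are passing contains layers that were instantiated outside the function. Every call to create_model_doesnt_work therefore wraps the same layer objects, and so the same trained weights, in a new Sequential model.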

Here is an example:

In [42]: architecture = [keras.layers.Dense(128, input_dim=55, activation='sigmoid')]

In [43]: def f(arc=None):
    ...:     print(arc)

In [44]: f(architecture)
[<tensorflow.python.keras.layers.core.Dense object at 0x7fe9f2144b50>]

In [45]: f(architecture)
[<tensorflow.python.keras.layers.core.Dense object at 0x7fe9f2144b50>] # same object as above

In [46]: f([keras.layers.Dense(128, input_dim=55, activation='sigmoid')])
[<tensorflow.python.keras.layers.core.Dense object at 0x7fe9f2127850>]

In [47]: f([keras.layers.Dense(128, input_dim=55, activation='sigmoid')])
[<tensorflow.python.keras.layers.core.Dense object at 0x7fe9f2446810>] # a different object, because a new Dense layer is constructed each time the list expression is evaluated
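This workaround was not suggested in this thread and is an assumption on my part. If the architecture really has to be passed as a list of already-constructed layers, keras.models.clone_model is one option. It creates new layers from each layer's config, with freshly initialized weights, instead of reusing the original layer objects. The helper name create_model_from_list is hypothetical. clone_model relies on every layer implementing get_config, so custom layers may need extra work.

```python
import numpy as np
from tensorflow import keras

# Pre-built layers, as in the report (input_dim omitted; the size is inferred on first call).
architecture = [
    keras.layers.Dense(128, activation='sigmoid'),
    keras.layers.Dense(64, activation='sigmoid'),
    keras.layers.Dense(7, activation='softmax'),
]

def create_model_from_list(architecture):
    # Wrap the shared layers in a throwaway template, then clone it:
    # clone_model rebuilds each layer from its config and does not copy weights.
    template = keras.Sequential(architecture)
    model = keras.models.clone_model(template)
    model.compile(
        optimizer=keras.optimizers.SGD(),
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )
    return model

model_1 = create_model_from_list(architecture)
model_2 = create_model_from_list(architecture)

dummy = np.zeros((1, 55), dtype='float32')
model_1(dummy)
model_2(dummy)

print(model_1.layers[0] is architecture[0])  # False
print(model_1.layers[0] is model_2.layers[0])  # False
```

Passing the same architecture list to this helper in every fold should then start each fold from newly initialized weights rather than from the previous fold's weights.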

Saduf2019 commented 3 years ago

@emlys Moving this issue to closed status as there has been no activity. If you still face the error, please create a new issue.

keras-team / keras

information from training stored in layers not in model #13872