While retraining a model I get a worse MSE on the first epoch of a training run than on the last epoch of the previous one

Dtar380 commented 5 months ago

I'm using a Sequential model for a regression task.

This is the code I'm using for training:

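def massTrain(self):

    # Train on every CSV file found in the Data directory, one file at a time
    csvs = listdir('Data')

    for i, csv in enumerate(csvs):

        print(f'\nTraining with {csv}\n')

        # Use all of the data for every file except the last one,
        # which holds back 10% for testing
        if i < len(csvs) - 1:
            self.setData(csv, 1)
        else:
            self.setData(csv, 0.9)

        # test_data is expected to be set elsewhere (not shown in this snippet)
        if test_data:
            self.predictions = self.test(test_data)

            y_test = self.dataset[self.training_data_len:, :]

            self.mse = mean_squared_error(y_test, self.predictions)
            self.rmse = sqrt(self.mse)

            print(f"\nRMSE was: {self.rmse}\n")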

def train(self, train_data):

    # Split the data into x_train (sliding windows) and y_train (next value)
    x_train = []
    y_train = []

    for i in range(self.input_shape, len(train_data)):
        x_train.append(train_data[i - self.input_shape:i, 0])
        y_train.append(train_data[i, 0])

    # Convert x_train and y_train to NumPy arrays
    x_train, y_train = np.array(x_train), np.array(y_train)

    # Reshape the data to (samples, timesteps, 1)
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

    # Stop early on the training loss and restore the best weights seen
    earlystopping = callbacks.EarlyStopping(monitor="loss", mode="min", patience=6, restore_best_weights=True)

    # Train the model
    self.model.fit(x_train, y_train, batch_size=16, epochs=50, callbacks=[earlystopping])
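For anyone reading along, here is a minimal sketch of what the windowing loop in train() builds, using plain Python lists instead of NumPy arrays. The helper name make_windows and the toy series are made up for illustration; they are not part of the original code.

```python
def make_windows(series, window):
    """Build (inputs, targets) pairs: each input holds `window` consecutive
    values and its target is the value that immediately follows them."""
    inputs, targets = [], []
    for i in range(window, len(series)):
        inputs.append(series[i - window:i])
        targets.append(series[i])
    return inputs, targets


# Hypothetical toy series, standing in for one column of a CSV file
series = [10, 11, 12, 13, 14, 15]
x, y = make_windows(series, window=3)

print(x)  # [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
print(y)  # [13, 14, 15]
```

A series of length n with window w yields n - w samples, which is why the first self.input_shape rows of each file never appear as targets.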

The rest of the code is not relevant here: the model trains, saves, and plots graphs after training without problems.

I have 10 CSV files of data, so I use listdir() to get all the files and iterate over them, training the model on each one in turn. For the last dataset I train on only 90% of the data, then test the model on the remaining 10% and plot a graph.

What I'm seeing is this: for example, when training the model on the first dataset, the last epoch reaches an MSE of 7e-4, and then on the first epoch with the next dataset I get 0.0012. That is 5e-4 higher, so the error increases by about 71% (the earlier loss is only about 58% of the new one).
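The size of that jump can be worked out directly from the two loss values quoted above. This is only arithmetic on those two numbers; the variable names are made up for illustration.

```python
prev_mse = 7e-4    # last epoch on the first dataset (value quoted in the post)
next_mse = 0.0012  # first epoch on the next dataset (value quoted in the post)

absolute_increase = next_mse - prev_mse
relative_increase = absolute_increase / prev_mse  # growth relative to the old loss
ratio = prev_mse / next_mse                       # old loss as a fraction of the new

print(f"absolute increase: {absolute_increase:.1e}")  # 5.0e-04
print(f"relative increase: {relative_increase:.0%}")  # 71%
print(f"previous / next:   {ratio:.0%}")              # 58%
```

Note that MSE is an error measure, so a change in MSE does not translate directly into an "accuracy" percentage.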

Is there something I'm doing wrong when retraining the model? The only explanation I can think of is that the weights are not kept after fitting, so the model starts from scratch every time and all the previous fitting is useless.
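One way to tell "the weights were lost" apart from "the new data is simply different" is to compare the first-epoch loss on the new dataset against a fresh start. The sketch below is a toy illustration in plain Python, not Keras: a one-weight linear model trained with per-sample SGD on two made-up datasets. All names and numbers are hypothetical. With a real Keras model, one option is to call model.evaluate on the new data before the next fit.

```python
def train_one_epoch(w, xs, ys, lr=0.1):
    """One pass of per-sample SGD on y_hat = w * x with squared error.

    Returns the updated weight and the mean of the per-sample losses seen
    during the pass (each loss is measured before its own update)."""
    total = 0.0
    for x, y in zip(xs, ys):
        error = w * x - y
        total += error ** 2
        w -= lr * 2 * error * x
    return w, total / len(xs)


# Two hypothetical datasets with slightly different underlying slopes
xs = [i / 10 for i in range(1, 11)]
dataset_a = [2.0 * x for x in xs]
dataset_b = [2.5 * x for x in xs]

# Train on dataset A until the loss is tiny
w = 0.0
for _ in range(20):
    w, last_loss_a = train_one_epoch(w, xs, dataset_a)

# First epoch on dataset B: once continuing from w, once from scratch
_, first_loss_b_continued = train_one_epoch(w, xs, dataset_b)
_, first_loss_b_fresh = train_one_epoch(0.0, xs, dataset_b)

print(f"last epoch on A:              {last_loss_a:.3e}")
print(f"first epoch on B (continued): {first_loss_b_continued:.3e}")
print(f"first epoch on B (fresh):     {first_loss_b_fresh:.3e}")
```

In this toy setup the continued run starts dataset B with a higher loss than it ended dataset A with, even though the weight was kept. It is still far lower than the fresh-start loss. A jump between datasets is therefore not by itself evidence that training restarted from scratch.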

Dtar380 commented 5 months ago

I updated the code because it wasn't right: I accidentally took it from my test file rather than from the code I'm actually running, and it wouldn't have worked.

Now, to add some useful information, here is how the model is compiled:

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error', metrics=[RootMeanSquaredError()])

And here is the REAL output from the training I'm running right now. Since I'm using an EarlyStopping callback that restores the best weights, I'm including three rows for the first dataset's training.

First dataset training:

First epoch result: 448/448 [==============================] - 36s 69ms/step - loss: 0.0012 - root_mean_squared_error: 0.0339
Last epoch result: 448/448 [==============================] - 31s 69ms/step - loss: 2.3723e-04 - root_mean_squared_error: 0.0154
Best epoch result: 448/448 [==============================] - 30s 67ms/step - loss: 2.3116e-04 - root_mean_squared_error: 0.0152

Next dataset training:
First epoch result: 448/448 [==============================] - 31s 69ms/step - loss: 5.3316e-04 - root_mean_squared_error: 0.0231

divyashreepathihalli commented 4 months ago

@Dtar380 I have a few questions; the answers would help me understand your issue better.

How and where is the train function being called? This would help confirm that the model is not being reinitialized in a loop somewhere.
Where is model.compile() being called? You might be unintentionally resetting the weights if you are compiling somewhere in a loop.
Is your data normalized? I am wondering whether the data looks different in the first CSV file and the last.
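For the normalization question, a rough way to check whether two files look different is to compare their summary statistics, and to see what happens when the later file is scaled with the first file's range. This is a minimal sketch in plain Python; the two lists are hypothetical stand-ins for a column from the first and last CSV, and the helper names are made up for illustration.

```python
from statistics import mean


def summarize(values):
    """Basic statistics for spotting a shift in level or range between files."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_min_max(values):
    """Return the (low, high) bounds used for min-max scaling."""
    return min(values), max(values)


def scale(values, low, high):
    """Min-max scale values using bounds that may come from another dataset."""
    return [(v - low) / (high - low) for v in values]


# Hypothetical column values from the first and last CSV files
first_file = [100.0, 110.0, 120.0, 130.0]
last_file = [200.0, 220.0, 240.0, 260.0]

for name, values in (("first", first_file), ("last", last_file)):
    print(name, summarize(values))

# Scaling the last file with the first file's bounds exposes the shift
low, high = fit_min_max(first_file)
scaled_last_shared = scale(last_file, low, high)

# Scaling the last file with its own bounds hides it
own_low, own_high = fit_min_max(last_file)
scaled_last_own = scale(last_file, own_low, own_high)

print(scaled_last_shared)
print(scaled_last_own)
```

If each file is scaled with its own bounds, the same raw value maps to a different scaled value in each file, which can make the first epoch on a new file look worse. Whether shared or per-file scaling is appropriate depends on the data.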

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.


keras-team / keras

While retraining a model I get a worse MSE on the first epoch of a training run than on the last epoch of the previous one #19813