lkulowski / LSTM_encoder_decoder

Build a LSTM encoder-decoder using PyTorch to make sequence-to-sequence prediction for time series data
MIT License

R² is too low when using this model to predict #8

Closed forestbat closed 1 year ago

forestbat commented 1 year ago

I have a dataset which describes the water level of a river; it looks like this: [image] I then process the data like this:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    import generate_dataset  # from this repo

    res_df = pd.read_csv(my_table)
    t = res_df['TM']
    y = res_df['Z']
    t_train, t_test, y_train, y_test = train_test_split(t, y, test_size=0.25)
    # set size of input/output windows and stride
    iw = 72
    ow = 72
    s = 12
    # generate windowed training/test datasets
    Xtrain, Ytrain = generate_dataset.windowed_dataset(y_train.to_numpy().reshape(-1, 1),
                                                       input_window=iw, output_window=ow, stride=s)
    Xtest, Ytest = generate_dataset.windowed_dataset(y_test.to_numpy().reshape(-1, 1),
                                                     input_window=iw, output_window=ow, stride=s)
    X_train, Y_train, X_test, Y_test = generate_dataset.numpy_to_torch(Xtrain, Ytrain, Xtest, Ytest)
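As a rough self-check on the shapes (my own sketch, not code from the repo, assuming windowed_dataset simply slides a window of length iw + ow over the series with the given stride):

    # expected window count under that assumption
    def expected_num_windows(series_len, iw, ow, stride):
        return (series_len - iw - ow) // stride + 1

    print(expected_num_windows(len(y_train), iw, ow, s))  # matches my 1259 training windows
    print(expected_num_windows(len(y_test), iw, ow, s))   # matches my 412 test windows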

You can see that I want to predict the next 72 hourly water levels from the last 72 hours. The dimensions of X_train/Y_train are (72, 1259, 1), and of X_test/Y_test are (72, 412, 1). Then I use this code to train and test the model:

import torch
import numpy as np
import sklearn.metrics
from math import sqrt

model = LstmSeq2seq(input_size=X_train.shape[2], hidden_size=20)
device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")
model.to(device)
loss = model.train_model(X_train, Y_train, n_epochs=50, target_len=ow, batch_size=4,
                         training_prediction='mixed_teacher_forcing',
                         teacher_forcing_ratio=0.6, learning_rate=0.006, dynamic_tf=False)
np.save(loss_path, loss, allow_pickle=True)   # loss_path / model_path are defined elsewhere
torch.save(model, model_path)
# evaluate every test window: RMSE and R² per slice
for sample in range(0, Y_test.shape[1]):
    X_slice = X_test[:, sample, :]
    Y_slice = Y_test[:, sample, :].cpu().numpy()
    Y_pred = model.predict(X_slice, target_len=Y_test.shape[0])
    rmse = sqrt(sklearn.metrics.mean_squared_error(Y_slice, Y_pred))
    r2 = sklearn.metrics.r2_score(Y_slice, Y_pred)
    print(rmse, ',', r2)

However, the outputs for the 412 slices look like this:

0.44626987 , -0.5325643609308734
0.4078824 , -0.27453414173048074
0.43178117 , -0.16933621375610097
0.47949684 , -0.3551825689227257
0.474012 , -0.4368760433523293
0.50186276 , -0.4779415136088816
0.47483996 , -0.49918125915299805
0.43614775 , -0.4350776345576517
0.41064182 , -0.5490551658763032
0.40776572 , -0.3227591162474419
0.41470525 , -0.4134024058370649
0.37772563 , -0.10867208658414595
0.5142976 , -0.25404413112346536
……

You can see that almost all of the R² values over the 412 slices are smaller than 0, so it's a bad model. I have adjusted the LSTM parameters several times, but I still don't get an ideal result. Did I make a mistake somewhere? If you are here, please answer me.
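For context on those numbers (again my own sketch, not code from the repo): an R² below 0 means the forecast is worse than just predicting the mean of each target slice, so I can also score a naive persistence baseline that repeats the last observed value for all 72 output steps:

    import numpy as np
    import sklearn.metrics

    # naive persistence baseline, scored the same way as the model above
    baseline_r2 = []
    for sample in range(0, Y_test.shape[1]):
        last_obs = X_test[-1, sample, :].cpu().numpy()                 # last observed level
        naive = np.repeat(last_obs.reshape(1, -1), Y_test.shape[0], axis=0)
        baseline_r2.append(sklearn.metrics.r2_score(Y_test[:, sample, :].cpu().numpy(), naive))

    print('median R² of the persistence baseline:', np.median(baseline_r2))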