Closed Kamal-Moha closed 10 months ago
Hi @Kamal-Moha, for this (e.g. storing the model on some device and then loading it on another) you should use RNNModel.load_weights(). You can find the docs here.
For this to work on the new device, you first need to recreate the model the same way as on the device where you saved it:
model = RNNModel(
    input_chunk_length=n_params['input_chunk_length'],
    model=n_params['Model'],
    hidden_dim=20,
    dropout=n_params['dropout'],
    batch_size=n_params['batch_size'],
    n_epochs=300,
    optimizer_kwargs={"lr": n_params['lr']},
    model_name="Karachi_RNN",
    pl_trainer_kwargs=pl_trainer_kwargs,
    # log_tensorboard=True,
    random_state=42,
    training_length=n_params['training_length'],
    force_reset=True,
    save_checkpoints=True
)
Then you can load the model like this:
model.load_weights("evapotranspiration_model.pt")
Also, make sure to download both files (the one ending in ".pt" and the one ending in ".pt.ckpt") and keep them in the same directory.
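A quick way to confirm that both files made it to the new machine before loading is sketched below. It relies only on the two-file layout described above (".pt" plus ".pt.ckpt" side by side); the helper name `missing_checkpoint_files` is hypothetical and not part of darts.

```python
from pathlib import Path


def missing_checkpoint_files(model_path):
    """Return the expected files that are absent for a saved model path.

    Assumes the layout described above: "<name>.pt" and "<name>.pt.ckpt"
    sitting next to each other in one directory.
    """
    model_file = Path(model_path)
    # The companion file is the same path with ".ckpt" appended.
    ckpt_file = model_file.with_name(model_file.name + ".ckpt")
    return [str(p) for p in (model_file, ckpt_file) if not p.is_file()]


missing = missing_checkpoint_files("evapotranspiration_model.pt")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Both files found next to each other.")
```

If the list is non-empty, copy the missing file into the same directory before calling the loading method.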
It still doesn't work @dennisbader even after the suggestions given.
But I just don't get it. Why do I have to re-create the model structure again? I have already created and trained my model and am happy with the evaluation metrics, so I now want to save the model and use it for deployment. I'm not keen on having a lot of training code appear again in the deployment notebook.
I have been using pickle to do that with sklearn models and it has been working fine. darts seems to make the process of model deployment really stressful.
Having both the '.pt' and '.pt.ckpt' files in the same directory has at least removed the error my code was producing earlier. But when I try to make predictions, the model predicts nan for every value, which is wrong. Check the code below.
path = '/content/drive/MyDrive/Omdena Projects/Weather Prediction for Pakistan/'
json_data = {
    "data_columns" : "weathercode,temperature_2m_max,temperature_2m_min,temperature_2m_mean,apparent_temperature_max,apparent_temperature_min,apparent_temperature_mean,sunrise,sunset,shortwave_radiation_sum,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,et0_fao_evapotranspiration",
    "evo_model" : f"{path}evapotranspiration_model.pt",
}
# Load evo_model
model = RNNModel.load(json_data['evo_model'])
data_columns = json_data['data_columns']
now = datetime.now() - relativedelta(days=7)
start = now - relativedelta(months=11)
date_string_end = now.strftime('%Y-%m-%d')
date_string_start = start.strftime('%Y-%m-%d')
date_pred = []
for date in pd.date_range(start=datetime.now() - relativedelta(days=6), periods=10):
    date_pred.append(date.strftime('%Y-%m-%d'))
url = "https://archive-api.open-meteo.com/v1/archive"
cities = [
    { "name": "Karachi", "country": "Pakistan", "latitude": 24.8608, "longitude": 67.0104 }
]
cities_df =[]
for city in cities:
    params = {"latitude":city["latitude"],
              "longitude":city['longitude'],
              "start_date": date_string_start,
              "end_date": date_string_end,
              "daily": data_columns,
              "timezone": "GMT",
              "min": date_string_start,
              "max": date_string_end,
              }
    res = requests.get(url, params=params)
    data = res.json()
    df = pd.DataFrame(data["daily"])
    df["latitude"] = data["latitude"]
    df["longitude"] = data["longitude"]
    df["elevation"] = data["elevation"]
    df["country"] = city["country"]
    df["city"] = city["name"]
    cities_df.append(df)
concat_df = pd.concat(cities_df, ignore_index=True)
concat_df.set_index('time', inplace=True)
total_hours = concat_df['precipitation_hours'].sum()
concat_df['precipitation_rate'] = concat_df['precipitation_sum']/total_hours
##generate prediction for evo_transpiration
et0_fao_evapotranspiration = TimeSeries.from_series(concat_df['et0_fao_evapotranspiration'].values)
scaler = StandardScaler()
transformer = Scaler(scaler)
series_transformed = transformer.fit_transform(et0_fao_evapotranspiration)
model.fit(series=series_transformed, verbose=0)
print(model.predict(10))
First, kindly tell me whether this is the correct way to use a saved model during deployment. If it is, please explain why my model is predicting nan for every value.
Your help is highly appreciated @dennisbader
Hi @Kamal-Moha,
I tried reproducing your problem with the following code snippet:
from darts.models import RNNModel
from datetime import datetime
from dateutil import relativedelta
import pandas as pd
from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler
from sklearn.preprocessing import StandardScaler
from darts.datasets import AirPassengersDataset
import requests
ts = AirPassengersDataset().load().astype("float32")
model_old = RNNModel(6, n_epochs=3, training_length=4)
model_old.fit(ts)
model_old.save("ckpt_name.pt")
I then created a folder named ckpt_folder at the same level as the folder containing the notebook and moved the .pt and .ckpt files into it. I can then load the weights and perform inference in another cell (possibly another notebook) with the following:
# Load evo_model
model = RNNModel.load("../ckpt_folder/ckpt_name.pt")
json_data = {
    "data_columns" : "weathercode,temperature_2m_max,temperature_2m_min,temperature_2m_mean,apparent_temperature_max,apparent_temperature_min,apparent_temperature_mean,sunrise,sunset,shortwave_radiation_sum,precipitation_sum,rain_sum,snowfall_sum,precipitation_hours,windspeed_10m_max,windgusts_10m_max,winddirection_10m_dominant,et0_fao_evapotranspiration",
}
data_columns = json_data['data_columns']
now = datetime.now() - relativedelta.relativedelta(days=7)
start = now - relativedelta.relativedelta(months=11)
date_string_end = now.strftime('%Y-%m-%d')
date_string_start = start.strftime('%Y-%m-%d')
date_pred = []
for date in pd.date_range(start=datetime.now() - relativedelta.relativedelta(days=6), periods=10):
    date_pred.append(date.strftime('%Y-%m-%d'))
url = "https://archive-api.open-meteo.com/v1/archive"
cities = [
    { "name": "Karachi", "country": "Pakistan", "latitude": 24.8608, "longitude": 67.0104 }
]
cities_df =[]
for city in cities:
    params = {"latitude":city["latitude"],
              "longitude":city['longitude'],
              "start_date": date_string_start,
              "end_date": date_string_end,
              "daily": data_columns,
              "timezone": "GMT",
              "min": date_string_start,
              "max": date_string_end,
              }
    res = requests.get(url, params=params)
    data = res.json()
    df = pd.DataFrame(data["daily"])
    df["latitude"] = data["latitude"]
    df["longitude"] = data["longitude"]
    df["elevation"] = data["elevation"]
    df["country"] = city["country"]
    df["city"] = city["name"]
    cities_df.append(df)
concat_df = pd.concat(cities_df, ignore_index=True)
concat_df.set_index('time', inplace=True)
total_hours = concat_df['precipitation_hours'].sum()
concat_df['precipitation_rate'] = concat_df['precipitation_sum']/total_hours
##generate prediction for evo_transpiration
et0_fao_evapotranspiration = TimeSeries.from_series(concat_df['et0_fao_evapotranspiration'].values).astype("float32")
scaler = StandardScaler()
transformer = Scaler(scaler)
series_transformed = transformer.fit_transform(et0_fao_evapotranspiration)
model.fit(series=series_transformed, verbose=0)
print(model.predict(10))
Which part of the process do you find counter-intuitive or unclear?
NaN in the forecast are often due to NaN in the training (or inference) dataset. Make sure that after converting your data into series, there are no NaN in them (which can happen if dates are missing, for example).
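As a rough illustration of that check, the sketch below scans a payload shaped like data["daily"] in the snippet above for missing values and for gaps in the daily dates, before anything is converted into a TimeSeries. The sample payload and both helper names are hypothetical; JSON null arrives in Python as None and typically ends up as NaN once the column goes through pandas.

```python
import math
from datetime import date, timedelta


def find_missing_values(times, values):
    """Return the timestamps whose value is None or NaN."""
    missing = []
    for t, v in zip(times, values):
        if v is None or (isinstance(v, float) and math.isnan(v)):
            missing.append(t)
    return missing


def find_date_gaps(times):
    """Return ISO dates absent between the first and last day of `times`."""
    if not times:
        return []
    days = sorted(date.fromisoformat(t) for t in times)
    present = set(days)
    gaps = []
    current = days[0]
    while current < days[-1]:
        if current not in present:
            gaps.append(current.isoformat())
        current += timedelta(days=1)
    return gaps


# Hypothetical payload, shaped like data["daily"] from the API response.
daily = {
    "time": ["2023-05-01", "2023-05-02", "2023-05-05"],
    "et0_fao_evapotranspiration": [4.1, None, float("nan")],
}

print(find_missing_values(daily["time"], daily["et0_fao_evapotranspiration"]))
# → ['2023-05-02', '2023-05-05']
print(find_date_gaps(daily["time"]))
# → ['2023-05-03', '2023-05-04']
```

If either list is non-empty, the affected days need to be dropped or filled before fitting or predicting.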
Describe the bug I can't fit and use a saved model after loading it. I get the error "FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/working/darts_logs/Karachi_RNN/_model.pth.tar'" when I try to fit it on new data.
To Reproduce This is how I created the darts RNNModel
I have then downloaded this model so that I can use it in a new notebook.
loading the model
evo_model = RNNModel.load('evapotranspiration_model.pt')
Trying to fit new data using the saved model and then make predictions.
I get the error "FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/working/darts_logs/Karachi_RNN/_model.pth.tar'" when it tries to execute the line
evo_model.fit(series=series_transformed, verbose=0)
I don't understand why it's saying FileNotFound, because I have already downloaded the 'evapotranspiration_model.pt' model to my computer.
Expected behavior I expected it to execute without any error and make predictions because I have properly saved & loaded the RNNModel. Please help