YikChingTsui opened this issue 4 months ago
Thank you for your issue; we will fix the error soon.
Sorry, fix 1 should actually come after saving the train dataset, because the `save` method requires the `train_dataset` folder to not already exist. `train_dataset.reference` should also be saved directly so the shape matches:

```python
# ...
# remove ./train_dataset if it already exists
train_dataset.save("./train_dataset")
train_dataset.reference.to_csv('./train_dataset/reference.csv', index=False)
```
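The "remove if it already exists" step mentioned in the comment above can be made explicit with `shutil` (a hedged sketch; the helper name and default path are illustrative, not part of gnnwr):

```python
import shutil
from pathlib import Path

def clear_dir(path="./train_dataset"):
    """Remove the dataset folder if it exists, so save() can recreate it."""
    out_dir = Path(path)
    if out_dir.exists():
        shutil.rmtree(out_dir)

# call clear_dir() right before train_dataset.save("./train_dataset")
```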
Have you solved this problem? I also found that reloading the saved model does not work; it reports that the training set is not available.
Hello, under what circumstances does reloading the model report an error?
Here's a demo (in replication1_load.py and replication2_load.py) for reloading the model without an error; maybe it will help.
> Have you solved this problem? I also found that reloading the saved model does not work; it reports that the training set is not available.
>
> Hello, under what circumstances does reloading the model report an error? Here's a demo (in replication1_load.py and replication2_load.py) for reloading the model without an error; maybe it will help.
```python
gnnwr.reg_result('./ceshi/textresult/GNNWR_PM25_Result.csv')
train_dataset.save('./ceshi/textresult/gnnwr_datasets/train_dataset')
val_dataset.save('./ceshi/textresult/gnnwr_datasets/val_dataset')
test_dataset.save('./ceshi/textresult/gnnwr_datasets/test_dataset')

train_dataset_load = datasets.load_dataset('./demo_result/gnnwr_datasets/train_dataset/')
val_dataset_load = datasets.load_dataset('./demo_result/gnnwr_datasets/val_dataset/')
test_dataset_load = datasets.load_dataset('./demo_result/gnnwr_datasets/test_dataset/')

pred_data = pd.read_csv(u'C:/Users/lenovo/Desktop/gnnwr-0.1.5/data/pm25_predict_data.csv')

gnnwr_load = models.GNNWR(
    train_dataset=train_dataset_load,
    valid_dataset=val_dataset_load,
    test_dataset=test_dataset_load,
    dense_layers=[512, 256, 64, 128],
    start_lr=0.2,
    optimizer="Adadelta",
    activate_func=nn.PReLU(init=0.1),
    model_name=" ceshi_GNNWR_PM25",
    model_save_path="./ceshi/textresult",
    log_path="./ceshi/textresult/gnnwr_logs",
    write_path="./ceshi/textresult/gnnwr_runs",
)
gnnwr_load.load_model('./ceshi/textresult/ ceshi_GNNWR_PM25.pkl')
```
Then `init_predict_dataset` doesn't work if the training dataset is loaded from a file:

```python
pred_dataset = datasets.init_predict_dataset(
    data=pred_data,
    train_dataset=train_dataset_load,
    x_column=['dem', 'w10', 'd10', 't2m', 'aod_sat', 'tp'],
    spatial_column=['经度', '纬度'],
)
```
> Have you solved this problem? I also found that reloading the saved model does not work; it reports that the training set is not available.
>
> Hello, under what circumstances does reloading the model report an error? Here's a demo (in replication1_load.py and replication2_load.py) for reloading the model without an error; maybe it will help.
Thank you, I will try your demo in my spare time.
Background
After training the model, I want to save the training dataset to the filesystem. Then I want to run another script that loads the training dataset and model to predict values. After all, if everything can be saved after training, it should be possible to load it all in another script for prediction.
(The example in this repo, and the Estimating PM2.5 Concentrations example, put everything in one file. They use the original training dataset for `datasets.init_predict_dataset`, i.e., the one in memory, not the one loaded from the filesystem.)

Problem 1
When I tried to load the training dataset and use it for `init_predict_dataset`, it failed with an error saying the `reference` attribute is missing on `train_dataset_load`. The code is similar to the load-and-predict snippet quoted earlier in the thread.

Fix
Adding code after saving the training dataset to write the `reference` to a file fixes the missing attribute (the `to_csv` snippet in the earlier comment does exactly this). Then, in `predict.py`, after loading the training dataset, add `reference` back to the training dataset.

Problem 2 and fix
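For context, the Problem 1 fix amounts to two short snippets, one per script. A hedged sketch follows: it assumes `reference` is a pandas DataFrame (as the `to_csv` call in the earlier comment suggests), and the stand-in class and temp path are illustrative, not gnnwr's API.

```python
import os
import tempfile

import pandas as pd

class _Dataset:
    """Stand-in for the gnnwr dataset object (hypothetical, for illustration)."""
    pass

# train.py side: after train_dataset.save(...), also persist the reference
train_dataset = _Dataset()
train_dataset.reference = pd.DataFrame({"id": [1, 2], "val": [0.1, 0.2]})
out_dir = tempfile.mkdtemp()                      # stands in for ./train_dataset
ref_path = os.path.join(out_dir, "reference.csv")
train_dataset.reference.to_csv(ref_path, index=False)

# predict.py side: after datasets.load_dataset(...), reattach the reference
train_dataset_load = _Dataset()
train_dataset_load.reference = pd.read_csv(ref_path)
```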
After that fix, `init_predict_dataset` still fails at `x = (x - min) / (max - min)` (link): two lists cannot be subtracted with `-`. The cause is that `train_dataset.distances_scale_param['min']` and `train_dataset.distances_scale_param['max']` were originally `np.array`s, but they were converted into Python `list`s when saved. When the training dataset is loaded, they remain `list`s. The solution is to convert the lists to
`np.array` after loading the training dataset in `predict.py`:
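The post-load conversion is only a couple of lines; here is a hedged sketch (the `distances_scale_param` layout is taken from the issue text, and the stand-in class is illustrative, not gnnwr's actual loader):

```python
import numpy as np

class _Dataset:
    """Stand-in for the loaded gnnwr dataset (hypothetical)."""
    pass

train_dataset_load = _Dataset()
# after loading, min/max come back as plain lists (save round-trip)
train_dataset_load.distances_scale_param = {"min": [0.0, 1.0], "max": [10.0, 21.0]}

# plain lists do not support '-', so (x - min) / (max - min) raises TypeError;
# converting them back to np.array restores elementwise arithmetic
for key in ("min", "max"):
    train_dataset_load.distances_scale_param[key] = np.array(
        train_dataset_load.distances_scale_param[key]
    )
```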
Library fix
To move this fix into the library, the `save()` method should be modified to save the `reference` as well, and the `read()` method should read the file where the `reference` was saved and set the attribute. But I'm not sure whether this would break something else; the fixes above are for users of the library, so they can apply them to the training dataset only. This part should also be changed to convert the lists to `np.array`; it shouldn't affect anything else: https://github.com/zjuwss/gnnwr/blob/2a6ad0f034ae799367b3594e0adb601fae98ddbd/src/gnnwr/datasets.py#L268
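What the library-side change might look like, as a hedged sketch: the `save`/`read` method names come from the issue, but the class shape and file layout are assumptions, not gnnwr's actual code.

```python
import os

import numpy as np
import pandas as pd

class DatasetSketch:
    """Illustrative stand-in for the library's dataset class."""

    def __init__(self, reference=None, distances_scale_param=None):
        self.reference = reference
        self.distances_scale_param = distances_scale_param or {}

    def save(self, path):
        os.makedirs(path)  # mirrors the library: the folder must not pre-exist
        # ... the library's existing serialization would go here ...
        self.reference.to_csv(os.path.join(path, "reference.csv"), index=False)

    def read(self, path):
        # ... the library's existing deserialization would go here ...
        self.reference = pd.read_csv(os.path.join(path, "reference.csv"))
        # lists produced by the save round-trip become arrays again
        for key in ("min", "max"):
            if key in self.distances_scale_param:
                self.distances_scale_param[key] = np.array(
                    self.distances_scale_param[key]
                )
```

With a change along these lines, `load_dataset` followed by `init_predict_dataset` would see both `reference` and array-typed scale parameters without any user-side patching.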