gzerveas / mvts_transformer

Multivariate Time Series Transformer, public version
MIT License
718 stars 169 forks

Extracting imputed values? #45

Closed watersoup closed 1 year ago

watersoup commented 1 year ago

Hi,

My heartiest congratulations on implementing this wonderful work on transformers. I have also added "WasteWaterClass" to my branch, with a new data.py.

It would be great if anybody could tell me how to extract the imputed values after doing unsupervised training with masking.

Thanks, Jag

gzerveas commented 1 year ago

Hello Jag, thank you very much for the kind words and your contribution! Please submit a pull request if you want your branch to be included.

So, how to extract the imputed values? If you have a look at the output experiment directory, you will notice that the predicted values for your task (including imputation) on the dataset designated for validation are actually already exported under <exp_name>/predictions/best_predictions.npz. Here is some example code on how you can get them:

import os

import numpy as np

root_dir = "/Users/gzerveas/experiments/"  # replace with your own experiments directory
predictions_filepath = os.path.join(root_dir, "MyExp/predictions/best_predictions.npz")
assert os.path.isfile(predictions_filepath), "Specify existing file"

results = np.load(predictions_filepath, allow_pickle=True)

# N: number of samples in dataset, T: fixed length of time series, F: number of features (variables)
predictions = np.concatenate(results["predictions"], axis=0)  # (N, T, F) array of predictions
targets = np.concatenate(results["targets"], axis=0)  # (N, T, F) array of actual time series values (ground truth, "y")
target_masks = np.concatenate(results["target_masks"], axis=0)  # (N, T, F) array of target masks (0s indicate which values were masked and had to be predicted)
IDs = np.concatenate(results["IDs"])

print(predictions.shape)
N, T, F = predictions.shape
assert targets.shape == predictions.shape
assert target_masks.shape == targets.shape
assert IDs.shape[0] == targets.shape[0]
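
To go from these arrays to the imputed values themselves, one option is to select the positions flagged by the mask. The following is a minimal sketch, not part of the repository: `extract_imputed` and its `masked_value` parameter are hypothetical names, and the mask convention (0 marks a value that was hidden and predicted) simply follows the comment in the code above. It is worth checking the polarity on your own file, for example by comparing the fraction of flagged entries with your masking ratio, and passing `masked_value=1` if it turns out to be inverted.

```python
import numpy as np


def extract_imputed(predictions, targets, target_masks, masked_value=0):
    """Fill masked positions of `targets` with the model's predictions.

    All three inputs are (N, T, F) arrays. `masked_value` is the mask entry
    that marks a hidden (predicted) position; 0 is an assumption taken from
    the comment in the thread, so verify it against your own data.
    """
    # Boolean (N, T, F) array: True where the value was hidden from the model
    masked = np.asarray(target_masks) == masked_value
    # Keep observed values, substitute predictions only at masked positions
    imputed = np.where(masked, predictions, targets)
    return imputed, masked


def masked_mse(predictions, targets, masked):
    """Mean squared error computed only over the masked positions."""
    diff = predictions[masked] - targets[masked]
    return float(np.mean(diff ** 2))
```

With the arrays loaded above, `imputed, masked = extract_imputed(predictions, targets, target_masks)` gives the filled-in series, and `predictions[masked]` gives the imputed values as a flat vector. Since the targets are known here, `masked_mse(predictions, targets, masked)` gives a quick check of imputation quality.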

Note that the predicted and target values here are stored after normalization. To restore them to their original value range, you will need to use the normalization values stored in MyExp/normalization.pickle to invert the normalization (how exactly depends on the normalization method you specified in the options).
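
As an illustration of what inverting the normalization can look like, here is a small sketch for the two common feature-wise schemes (standardization and min-max scaling). The function name `denormalize`, the dict keys (`norm_type`, `mean`, `std`, `min_val`, `max_val`) and the `eps` term are assumptions made for this example; inspect the contents of your own normalization.pickle before relying on them.

```python
import numpy as np


def denormalize(values, stats, eps=0.0):
    """Invert a feature-wise normalization on an (N, T, F) array.

    `stats` is a hypothetical dict of normalization values; its key names
    are assumptions and may differ from what your pickle file contains.
    `eps` should match any small constant added to the denominator when
    the data was normalized (0.0 if none was used).
    """
    norm_type = stats["norm_type"]
    if norm_type == "standardization":
        # Forward transform assumed to be: (x - mean) / (std + eps)
        mean = np.asarray(stats["mean"], dtype=float)
        std = np.asarray(stats["std"], dtype=float)
        return values * (std + eps) + mean
    if norm_type == "minmax":
        # Forward transform assumed to be: (x - min) / (max - min + eps)
        lo = np.asarray(stats["min_val"], dtype=float)
        hi = np.asarray(stats["max_val"], dtype=float)
        return values * (hi - lo + eps) + lo
    raise ValueError(f"Unsupported normalization type: {norm_type!r}")
```

The per-feature statistics have shape (F,), so they broadcast over the last axis of the (N, T, F) arrays. If the pickle file holds a plain dict, it could be read with `pickle.load` and passed as `stats` once its keys have been checked. Per-sample normalization would need the statistics of each individual sample and is not covered by this sketch.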

I may add a Jupyter notebook that plots the imputed values, like the figures in the paper.

watersoup commented 1 year ago

Hi George,

That's really great. I am aware of the reverse normalization and have added a function for it in the new data.py. It will be just a call away, provided the column names of the data are left as in the original.

And yes, many thanks in advance for putting together an example of plotting the imputed values. Thanks, Jag