hazdzz / STGCN

The PyTorch implementation of STGCN.
GNU Lesser General Public License v2.1

Weird results on raw PEMS-BAY/METR-LA data #12

Closed JingweiZuo closed 3 years ago

JingweiZuo commented 3 years ago

Hello,

Thanks for sharing the PyTorch version of STGCN. It has been a great help for us.

I see that you released your own version of the PEMS-BAY/METR-LA data, which differs from the original version provided by DCRNN's author (e.g., the adjacency matrix). Did you do any pre-processing on the raw data?

When we applied the raw dataset (sensor values + adjacency matrix) provided by DCRNN's author, the training loss of your implementation was quite weird. However, when we adopted your version, training went back to normal.

Besides, I see that your code does not account for missing values in the metric calculation, which may cause problems for the PEMS-BAY/METR-LA data.

Thanks in advance for your reply.


Validation loss decreased (inf --> 0.000019). Saving model ...
Epoch: 001 | Lr: 0.00092770867339000149 | Train loss: 0.000395 | Val loss: 0.000019 | GPU occupy: 629.645824 MiB
Validation loss decreased (0.000019 --> 0.000018). Saving model ...
Epoch: 002 | Lr: 0.00086064338268303632 | Train loss: 0.000039 | Val loss: 0.000018 | GPU occupy: 629.645824 MiB
Validation loss decreased (0.000018 --> 0.000018). Saving model ...
Epoch: 003 | Lr: 0.00079842633081076287 | Train loss: 0.000037 | Val loss: 0.000018 | GPU occupy: 629.645824 MiB
Validation loss decreased (0.000018 --> 0.000018). Saving model ...
Epoch: 004 | Lr: 0.00074070703215609989 | Train loss: 0.000037 | Val loss: 0.000018 | GPU occupy: 629.645824 MiB
Validation loss decreased (0.000018 --> 0.000018). Saving model ...
Epoch: 005 | Lr: 0.00068716033817218007 | Train loss: 0.000038 | Val loss: 0.000018 | GPU occupy: 629.645824 MiB
Early stopping counter: 1 out of 30
Epoch: 006 | Lr: 0.00063748460573193771 | Train loss: 0.000038 | Val loss: 0.000018 | GPU occupy: 629.645824 MiB
Early stopping counter: 2 out of 30
Epoch: 007 | Lr: 0.00059139999789012399 | Train loss: 0.000037 | Val loss: 0.000018 | GPU occupy: 629.645824 MiB

hazdzz commented 3 years ago


First, I used the dataset that the DCRNN author provided; I have put it here: https://github.com/hazdzz/Road_Traffic_Flow_datasets_from_DCRNN

Second, I used the data preprocessing method that the ChebyNet author provided.

Third, the dataset preprocessing method from DCRNN is not suitable for STGCN: since STGCN uses ChebyNet or GCN as its GNN backbone, it cannot accept directed graphs, though the STGCN paper does not mention this. In graph signal processing theory, the combinatorial Laplacian matrix is not defined for a directed graph unless special approaches are used.

Fourth, because I found that using MAPE gives weird results, I used WMAPE instead.
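The ChebyNet-style preprocessing mentioned above can be sketched roughly like this: symmetrize the adjacency matrix first (ChebyNet/GCN assume an undirected graph), then build the symmetric normalized Laplacian and rescale its eigenvalues for the Chebyshev recursion. This is a minimal numpy sketch of the standard procedure, not the repo's actual code; the function name is ours.

```python
import numpy as np

def scaled_laplacian(adj):
    """Rescaled Laplacian 2L/lambda_max - I, as used by ChebyNet-style convolutions."""
    adj = np.asarray(adj, dtype=float)
    # Symmetrize: the combinatorial/normalized Laplacian needs an undirected graph.
    adj = np.maximum(adj, adj.T)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    n = adj.shape[0]
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # Rescale the spectrum from [0, lambda_max] to [-1, 1].
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(n)
```

The rescaling step is what makes the Chebyshev polynomial recursion numerically stable, since Chebyshev polynomials are defined on [-1, 1].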

JingweiZuo commented 3 years ago


Hello,

Many thanks for the prompt reply and for sharing the slides. It's much clearer now. My apologies for missing that important detail.

It would be great if you could also release the script for generating the adjacency matrix, as people usually start from the raw files "sensor_distance"/"sensor_ids" for the different datasets.

Besides, the weird results with MAPE are most probably caused by the missing values (set to 0 by default). Previously, after getting the weird results with plain MAPE, I tested masked_mape instead, and that fixed the problem.
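The masking idea can be sketched like this: a hypothetical helper in the spirit of the masked metrics used by DCRNN/Graph WaveNet, not the exact code from either repo. Positions where the ground truth equals the null placeholder are dropped before averaging, so they never appear in a denominator.

```python
import numpy as np

def masked_mape(y_pred, y_true, null_val=0.0):
    """MAPE computed only over positions where the ground truth is valid.

    Entries equal to null_val (here: 0, the placeholder for missing
    readings) are excluded, so they never end up in the denominator.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    mask = y_true != null_val
    return float(np.mean(np.abs((y_pred[mask] - y_true[mask]) / y_true[mask])))
```

For example, `masked_mape([1.1, 1.8, 5.0, 4.4], [1.0, 2.0, 0.0, 4.0])` ignores the third entry entirely, whereas plain MAPE would divide by zero there.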

hazdzz commented 3 years ago


Actually, a 0 value is not missing data, because the data represents vehicle speed: 0 means no vehicles passed in that time step. That's why I used WMAPE.
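For reference, WMAPE sidesteps the division problem by weighting with the total ground-truth magnitude: it is the sum of absolute errors over the sum of absolute ground-truth values, so a single 0 km/h reading never sits alone in a denominator. A minimal sketch (the function name is ours):

```python
import numpy as np

def wmape(y_pred, y_true):
    """Weighted MAPE: sum |y_pred - y_true| / sum |y_true|.

    Zero ground-truth values contribute to the numerator but never
    divide, so a 0 reading (no vehicles) cannot blow the metric up.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.abs(y_pred - y_true).sum() / np.abs(y_true).sum())
```
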

JingweiZuo commented 3 years ago


We interpret the missing data differently.

In practice, the 5-min interval is so small that it is quite normal for no vehicles to pass. However, since the model aims to forecast the global traffic tendency, it is reasonable to exclude the random missing values (i.e., the zeros) from the training process.

That's what I thought, thanks again for sharing the great work. :)

Best,

hazdzz commented 3 years ago


Thanks for the reminder. Even if you consider the 0 values to be missing data, keeping them can benefit the model: it makes the model more robust, which means over-fitting is less likely to happen.

Suasy commented 6 months ago

Is it because the label y is also passed through the scaler, so that some y values become very small or negative, causing the MAPE to come out weird?

Suasy commented 6 months ago

This is certainly the case. The scaler should not be applied to the label y; otherwise y can get really weird.

Maochoulou commented 4 months ago


Hello, did you manage to solve this problem in the end? Thank you very much.

hazdzz commented 4 months ago


Hello. If you are interested, you are welcome to try to solve this issue.

Suasy commented 4 months ago


I solved this problem. In the original code, the training ground truth y_true is transformed, and then inverse_transformed when computing the metrics. This is clearly not right, because it changes the actual values of y_true (for example, after this round trip a 0 can become 0.000001 or a very small negative number, which causes problems when computing the metrics). So I changed it to the common practice: keep the ground truth y_true as-is, processing it neither when loading the data nor when computing the metrics, and instead apply inverse_transform to y_pred during validation before computing the metrics against y_true. For the metrics themselves, I also adopted the usual masked strategy, masking (or otherwise handling) the special zeros so that the divisor is never 0; see the corresponding metric code in GraphWaveNet.

In summary, I changed three places: the data-loading part in dataloader.py, the metric computation in utility.py, and the prediction/validation part in main.py. I have uploaded the modified code below. If anything is unclear, feel free to discuss 🤗 code.zip
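The data flow described above can be sketched as follows. The ZScore class and all variable names are hypothetical stand-ins for the repo's actual scaler; the point is only the pipeline: fit the scaler on training inputs, leave y_true untouched, and inverse-transform y_pred before computing metrics.

```python
import numpy as np

class ZScore:
    """Hypothetical z-score scaler standing in for the repo's transform."""
    def fit(self, x):
        self.mean, self.std = float(np.mean(x)), float(np.std(x))
        return self
    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.mean) / self.std
    def inverse_transform(self, x):
        return np.asarray(x, dtype=float) * self.std + self.mean

# Fit only on training inputs; the ground truth y_true stays in its original scale.
train_x = np.array([0.0, 30.0, 60.0, 65.0, 70.0])
scaler = ZScore().fit(train_x)

y_true = np.array([0.0, 62.0, 68.0])             # untouched: zeros stay zeros
model_output = scaler.transform(y_true)          # stand-in for network predictions
y_pred = scaler.inverse_transform(model_output)  # map back before any metric

mae = np.abs(y_pred - y_true).mean()
```

Because y_true is never transformed, a true 0 can no longer turn into a tiny nonzero value and corrupt ratio-based metrics such as MAPE.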

hazdzz commented 4 months ago


Thank you very much!