usail-hkust / SSH-GNN


Question regarding the inconsistency between the code and the paper. #1

Open Wang-robots opened 2 months ago

Wang-robots commented 2 months ago

Dear Jindong Han,

I noticed in your code that self.idx_label and self.idx_unlabel correspond to 70% (part A) and 30% (part B) of the 479 monitoring stations. The paper does not mention using air quality data from part B (specifically PM2.5 and CO, as mentioned in the paper). However, the code appears to use it: forward propagation is not restricted to the part A split but runs on data from all stations.

For instance, in model.py, in the forward method of the SSL_GNN class:

# Modeling region dependency
x_rt = torch.concat([x_out, X_batch[:, :, i, 1:8]], dim=-1)
x_rt = self.region_gcn(x_rt, supports[2])

jhanao-hkust commented 1 month ago

Thanks for your question. The tensor X_batch[:, :, i, 1:8] stores weather and time information; only X_batch[:, :, i, 0] stores the air quality index, which is calculated from the raw observations (i.e., PM2.5, PM10, O3, NO2, SO2, and CO). So we do not use air quality data from part B for message passing. For the feature dimensions of X_batch, see util.py line 142.
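
The channel layout described in this reply can be illustrated with a minimal sketch. It uses plain Python lists rather than PyTorch tensors, and the names AQI_CHANNEL, CONTEXT_CHANNELS, context_features, and toy_batch are hypothetical, not taken from the repository. It assumes only what the reply states: channel 0 holds the air quality index and channels 1 to 7 hold weather and time features.

```python
# Assumed layout (from the reply above): channel 0 is the air quality
# index, channels 1..7 are weather and time features.
AQI_CHANNEL = 0
CONTEXT_CHANNELS = slice(1, 8)

# Sentinel value placed in the AQI channel so any leakage is easy to spot.
AQI_SENTINEL = 999.0


def context_features(station_features):
    """Return the weather/time channels of one station's 8-dim feature vector."""
    return station_features[CONTEXT_CHANNELS]


# Hypothetical toy data: 3 stations, 8 features each.
toy_batch = [
    [AQI_SENTINEL, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    [AQI_SENTINEL, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7],
    [AQI_SENTINEL, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7],
]

# Mimics the effect of the 1:8 slice on the last dimension of X_batch.
context = [context_features(row) for row in toy_batch]

# True only if an AQI value survived the slice.
leaked = any(AQI_SENTINEL in row for row in context)

print(len(context[0]), leaked)
```

If the layout is as described, each sliced row has 7 features and none of them is the air quality index, which is the property the question is about.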