Weidong725 opened this issue 4 months ago
Is your data formatted as follows?
yy-MM-dd, point1, point2, ..., point96
This appears to be univariate data. You can convert it into the regular format:
yy-MM-dd HH:mm:ss, point1
yy-MM-dd HH:mm:ss, point2
...
You can use the following code to convert your dataset. This way, your data can fit into this framework's models without significant adjustments. You can try period_len=4 and set enc_in=1; the latter indicates the number of variables in the dataset.
import pandas as pd

# Load the raw data: the first column is the date, the remaining 96 columns are the points
data = pd.read_csv('your_data.csv', header=None)

# Generate the 96 intra-day timestamps, assuming 15-minute intervals
timestamps = pd.date_range(start='00:00', periods=96, freq='15min').strftime('%H:%M:%S')

# Collect one (datetime, value) record per point; building a list of dicts
# avoids DataFrame.append, which has been removed in recent pandas versions
records = []
for _, row in data.iterrows():
    date = row[0]
    for i in range(1, 97):
        records.append({'datetime': f"{date} {timestamps[i - 1]}", 'value': row[i]})
converted_data = pd.DataFrame(records)

# Save the converted data to a new CSV file
converted_data.to_csv('converted_data.csv', index=False)
You can try the above method and see if it works. If it doesn't, feel free to reach out for further assistance.
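As a quick sanity check on the conversion, a read-back like the following (assuming 96 points per day, as above) should show 96 rows per original day:

import pandas as pd

converted = pd.read_csv('converted_data.csv')
print(len(converted))     # should equal 96 * number_of_days
print(converted.head(3))  # the first three 15-minute points of day one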
For 96 points of data a day, why not set period_len to 96 instead of 4? Would this have any effect on the experimental results?
As discussed in Appendix C.2 and shown in Table 9, in scenarios with very long periods, an appropriately sparse strategy can be more effective. For instance, on the ETTm1 dataset, which also has a period of 96, resampling with too large a period results in very short subsequences with sparse connections, leading to underutilization of information. In such cases, setting the period length to [2-6], i.e., adopting a denser sparse strategy, can be beneficial. Therefore, we recommend setting period_len=4 here.
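To make the trade-off concrete, here is a quick back-of-the-envelope check (the look-back window seq_len=720 is an assumed value; the actual one depends on your setup):

seq_len = 720  # assumed look-back window

# period_len=96: each downsampled subsequence keeps only 720 // 96 = 7 points,
# so cross-period connections become very sparse and underuse the history
print(seq_len // 96)  # 7

# period_len=4: each subsequence keeps 720 // 4 = 180 points,
# the denser strategy recommended above
print(seq_len // 4)   # 180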
Thank you very much!
Thank you for your patient answers. While reading your paper, I also noticed that you mention the sparse technique can be used in conjunction with GRU or Transformer. Where do you place the GRU or Transformer in the network?
Yes, the sparse technique can be combined with GRU or Transformer. Please refer to our response to Issue #8, where we provide implementation code showing how to integrate the sparse technique with Transformer. The implementation for GRU is similar. If you need the code for that, we are happy to share it.
Yeah, I think I need the GRU-related code. Can you provide it? Thanks a lot!
Of course. You can try the following implementation code. Use self.no_sparse to control whether to apply the sparse technique.
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, configs):
        super(Model, self).__init__()
        # get parameters
        self.seq_len = configs.seq_len
        self.pred_len = configs.pred_len
        self.enc_in = configs.enc_in
        self.period_len = configs.period_len
        self.model_type = configs.model_type
        # True: plain GRU baseline; False: GRU combined with the sparse technique
        self.no_sparse = configs.no_sparse

        if self.no_sparse:
            self.gru = nn.GRU(input_size=1, hidden_size=64, num_layers=1, bias=True,
                              batch_first=True, bidirectional=False)
            self.output = nn.Linear(64, self.pred_len)
        else:
            self.seg_num_x = self.seq_len // self.period_len
            self.seg_num_y = self.pred_len // self.period_len
            self.conv1d = nn.Conv1d(in_channels=1, out_channels=1,
                                    kernel_size=1 + 2 * (self.period_len // 2),
                                    stride=1, padding=self.period_len // 2,
                                    padding_mode="zeros", bias=False)
            self.gru = nn.GRU(input_size=1, hidden_size=64, num_layers=1, bias=True,
                              batch_first=True, bidirectional=False)
            self.output = nn.Linear(64, self.seg_num_y)

    def forward(self, x):
        batch_size = x.shape[0]
        if self.no_sparse:
            # normalization: subtract the per-series mean
            seq_mean = torch.mean(x, dim=1).unsqueeze(1)
            x = x - seq_mean
            # b,s,c -> bc,s,1: each channel is fed to the GRU as a univariate sequence
            x = x.permute(0, 2, 1).reshape(-1, self.seq_len, 1)
            _, hn = self.gru(x)
            y = self.output(hn).view(-1, self.enc_in, self.pred_len).permute(0, 2, 1)
            y = y + seq_mean
        else:
            # normalization and permute b,s,c -> b,c,s
            seq_mean = torch.mean(x, dim=1).unsqueeze(1)
            x = (x - seq_mean).permute(0, 2, 1)
            # 1D convolutional aggregation with a residual connection
            x = self.conv1d(x.reshape(-1, 1, self.seq_len)).reshape(-1, self.enc_in, self.seq_len) + x
            # downsample: b,c,s -> bc,n,w -> bc,w,n -> bcw,n,1
            x = x.reshape(-1, self.seg_num_x, self.period_len).permute(0, 2, 1).reshape(-1, self.seg_num_x, 1)
            _, hn = self.gru(x)
            # project the final hidden state onto the output segments: bc,w,m
            y = self.output(hn).view(-1, self.period_len, self.seg_num_y)
            # upsample: bc,w,m -> bc,m,w -> b,c,s
            y = y.permute(0, 2, 1).reshape(batch_size, self.enc_in, self.pred_len)
            y = y.permute(0, 2, 1) + seq_mean
        return y
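As a quick sanity check, here is a hypothetical way to instantiate the model with dummy settings and verify the output shape (the configuration values below are illustrative, not taken from the repository):

from types import SimpleNamespace

import torch

# illustrative settings: 720-step look-back, 96-step horizon, one variable
configs = SimpleNamespace(seq_len=720, pred_len=96, enc_in=1,
                          period_len=4, model_type='gru', no_sparse=False)
model = Model(configs)
x = torch.randn(32, configs.seq_len, configs.enc_in)  # (batch, seq_len, channels)
print(model(x).shape)  # expected: torch.Size([32, 96, 1])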
I read your source code and found that the input data there is indexed by "yy-MM-dd HH:mm:ss" followed by n columns of data. In my input data, the index is just "yy-MM-dd", i.e., one row represents one day, and each day has 96 time points. How can I find the right period_len, and does the network need any adjustments to the way I enter the data?