Closed slavavs82 closed 2 years ago
Hi, K is the time dimension. In the case of the data challenge (see #2), we work on full months with hourly data, so K = 24 * 28 = 672. In the last examples on this repo, I've switched to weekly predictions, so K = 24 * 7 = 168.
Thank you so much for the answer! I'll clarify, as my English prevents me from understanding correctly. Is K the window size? Is d_input the size of the data inside the window?
No, K is the time dimension, for example the number of hours in X. d_input is the dimension of the input X for each time step. For instance, if you had 2 inputs (say outdoor temperature and AC schedule) for an entire week (168 hours), X shape would be (1, 168, 2).
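As a rough illustration of the shape convention described above, here is a small sketch. It uses NumPy and random placeholder values rather than the repo's own data loading, and the variable names are only for this example.

```python
import numpy as np

batch_size = 1   # one sample
K = 24 * 7       # 168 hourly time steps in one week
d_input = 2      # e.g. outdoor temperature and AC schedule

# Placeholder values only; real data would come from your own dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(batch_size, K, d_input))

print(X.shape)        # (1, 168, 2)
print(X[0, 0].shape)  # (2,): the d_input features at the first hour
```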
We refer to the window size as "attention_size", see here.
Fantastic, but I still don't get it :) My dataset is very simple (for the test): series = np.sin(np.arange(0, 1000)). I'm breaking this series down into windows of size 30. I want to pass a window of size 30 to the network and predict the next 5 values. Let the batch size be 1. What shape will x have? What would K equal?
In your case:
batch_size=1: you only have one sample
K=1000: 1000 time steps
d_input=1: your sin is one dimensional
You don't have to break the list into size windows, the transformer will do it for you using the score, see this line of the MHA block.
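To make the values above concrete, this sketch reshapes the whole sine series into a single sample of shape (batch_size, K, d_input). It only shows the array shape with NumPy; converting to a torch.Tensor and feeding the model is left out.

```python
import numpy as np

series = np.sin(np.arange(0, 1000))  # shape (1000,)

# Add a batch axis in front and a feature axis at the end.
x = series[np.newaxis, :, np.newaxis]

print(x.shape)  # (1, 1000, 1) -> (batch_size, K, d_input)
```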
Dude, I really appreciate it. Give me one last thing. How do I get a 30 size window online and predict the next 5? I'm thinking logically. I have a data set. I want to take some data and teach the network to predict the next data that the network can't see.
I think I misled you myself. series = np.sin(np.arange(0, 1000)) is not a sample, it's a dataset of size 1,000. I want to break this dataset down into samples of size 30; that's what I called a window. Then I can combine these samples into batches of 4 (for example). So I have batch = 4, K = 30, d_input = 1. Is that right?
Ok I think I see where you're going. In that case, your values for batch_size, K and d_input should be good. Unfortunately, the Transformer isn't well suited to predicting future states, as it predicts one single output for each input. We discuss this in #5.
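For reference, the windowing described in this exchange could be sketched as follows. This is plain NumPy slicing, not code from the repo, and the names window, horizon, inputs and targets are made up for the example.

```python
import numpy as np

series = np.sin(np.arange(0, 1000))

window = 30      # K: length of each input sample
horizon = 5      # number of future values to predict
batch_size = 4

# One (input, target) pair per valid start position.
n_samples = len(series) - window - horizon + 1
inputs = np.stack([series[i:i + window] for i in range(n_samples)])
targets = np.stack(
    [series[i + window:i + window + horizon] for i in range(n_samples)]
)

# Add the feature axis so that d_input = 1.
inputs = inputs[..., np.newaxis]    # (n_samples, 30, 1)
targets = targets[..., np.newaxis]  # (n_samples, 5, 1)

# First batch of 4 samples.
x_batch = inputs[:batch_size]
y_batch = targets[:batch_size]

print(x_batch.shape)  # (4, 30, 1)
print(y_batch.shape)  # (4, 5, 1)
```

Since the model produces one output per input time step, targets with a different length than K (5 versus 30 here) would need extra handling on top of this.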
In that case, I have a question. I feed a sample of 30 values to the encoder input. If I want to predict the next 10 values, I feed these 10 values to the decoder input. I have seen this scheme in this paper: https://arxiv.org/pdf/2001.08317.pdf. Look at figure 1: encoder input = T1, T2, T3, T4; decoder input = T4, T5. In my case I want to feed encoder input = T1, T2, ..., T30 and decoder input = T30, T31, ..., T40. Can I do that?
Of course, you can try it!
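If you want to try it, the split proposed above could be prepared like this. It is only a sketch of the slicing; whether the decoder in this repo accepts such an input is not confirmed here, and the names enc_len, dec_len and start are made up for the example.

```python
import numpy as np

series = np.sin(np.arange(0, 1000))

enc_len = 30  # encoder sees T1..T30
dec_len = 11  # decoder sees T30..T40 (overlaps the encoder by one step)

start = 0  # index of T1 within the series
encoder_input = series[start:start + enc_len]
dec_start = start + enc_len - 1  # index of T30
decoder_input = series[dec_start:dec_start + dec_len]

# Shape both as (batch_size, K, d_input) with batch_size = 1 and d_input = 1.
encoder_input = encoder_input[np.newaxis, :, np.newaxis]
decoder_input = decoder_input[np.newaxis, :, np.newaxis]

print(encoder_input.shape)  # (1, 30, 1)
print(decoder_input.shape)  # (1, 11, 1)
```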
Excuse me, could you please tell me whether there is any relevant paper about this code? I want to study it in depth.
Hi, I am trying to use a univariate time series dataset. I got this error:
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 521170

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17036/184118730.py in

~\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

~\anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
    559     def _next_data(self):
    560         index = self._next_index()  # may raise StopIteration
--> 561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:
    563             data = _utils.pin_memory.pin_memory(data)

~\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

~\anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py in

~\anaconda3\lib\site-packages\torch\utils\data\dataset.py in __getitem__(self, idx)
    309
    310     def __getitem__(self, idx):
--> 311         return self.dataset[self.indices[idx]]
    312
    313     def __len__(self):

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3022             if self.columns.nlevels > 1:
   3023                 return self._getitem_multilevel(key)
-> 3024             indexer = self.columns.get_loc(key)
   3025             if is_integer(indexer):
   3026                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083
   3084         if tolerance is not None:

KeyError: 521170
I'd appreciate it if you could let me know whether your code is suitable for univariate time series, and how to solve this error. I used the code in this link: https://github.com/maxjcohen/transformer
Thanks
Hi, this seems to be an issue with pandas, as you can see in the last frame of the traceback. Did you try to feed a pandas DataFrame directly to the trainer? You most likely need to modify the dataloader class to match your dataset.
In the future, please open a new issue when discussing new/different problems. Thanks!
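For context on the traceback above: the frame `self.dataset[self.indices[idx]]` ends up in `DataFrame.__getitem__`, which looks up a column label, so the integer sample index 521170 is treated as a column name. One possible way around this, sketched under assumptions, is to wrap the values in a small map-style dataset that returns arrays by integer position. The class name SeriesWindows, the window size and the one-step-shifted target are all made up for this example and are not the repo's dataloader.

```python
import numpy as np


class SeriesWindows:
    """Hypothetical map-style dataset: integer index -> (input, target) arrays."""

    def __init__(self, values, window=30):
        # Store a plain array so indexing is positional, not label based.
        self.values = np.asarray(values, dtype=np.float32)
        self.window = window

    def __len__(self):
        return len(self.values) - self.window

    def __getitem__(self, idx):
        # Input window and the same window shifted by one step,
        # i.e. one target value per input time step.
        x = self.values[idx:idx + self.window, np.newaxis]
        y = self.values[idx + 1:idx + self.window + 1, np.newaxis]
        return x, y


# With a DataFrame, the values could come from a single column converted
# with .to_numpy(); a sine series stands in for real data here.
values = np.sin(np.arange(0, 1000))
dataset = SeriesWindows(values)

x, y = dataset[0]
print(len(dataset), x.shape, y.shape)  # 970 (30, 1) (30, 1)
```

An object like this, exposing `__len__` and `__getitem__`, should be usable wherever a map-style PyTorch dataset is expected, but check this against your own training script.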
Sorry, I am new to GitHub. Do you have any idea how to modify the data loader class?
I'll answer on the new issue :p Closing this one as there is no longer any activity.
Parameters: | x (Tensor) – torch.Tensor of shape (batch_size, K, d_input).
What is K?