jlgleason / hts-constrained-embeddings

Replication material for "Forecasting Hierarchical Time Series with a Regularized Embedding Space," KDD MileTS 2020
MIT License

Where can I find the data set used in your paper #1

Closed StatMixedML closed 4 years ago

StatMixedML commented 4 years ago

Description

I am trying to replicate the results in your paper. However, the link to the data seems to point to a different data set from the one you are using.

# # prepare data, create mappings of hierarchy that will be used for fitting/evaluation
data, hierarchy_agg_dict, hierarchy_level_dict = preprocess_tourism_data('../data/hier1_with_names.csv')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\programdata\anaconda3\envs\deephar\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Year'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-9-1d326ce65834> in <module>
      9 
     10 # # prepare data, create mappings of hierarchy that will be used for fitting/evaluation
---> 11 data, hierarchy_agg_dict, hierarchy_level_dict = preprocess_tourism_data('../data/hier1_with_names.csv')
     12 
     13 # # create train/val/test datasets

c:\programdata\anaconda3\envs\deephar\lib\site-packages\src\data.py in preprocess_tourism_data(datapath)
    285 
    286     tourism_df = pd.read_csv(datapath)
--> 287     tourism_df['Year'] = tourism_df['Year'].ffill().astype(int)
    288     tourism_df['Date'] = tourism_df['Month'] + ' ' + tourism_df['Year'].astype(str)
    289     tourism_df = tourism_df.drop(columns = ['Year', 'Month'])

c:\programdata\anaconda3\envs\deephar\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

c:\programdata\anaconda3\envs\deephar\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Year'

Reading the data from the link gives this:

[image: the loaded data]

With these columns:

[image: the column names]

It seems that the time variable is missing: the file has no 'Year' column, which is what raises the KeyError above. Also, the series appear to be rather short. I would appreciate your help, or a copy of the data you used.
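For anyone hitting the same error, here is a minimal sketch of a header check that can be run before calling `preprocess_tourism_data`. It uses only the standard library's `csv` module. The required column names `Year` and `Month` are taken from the lines of `src/data.py` shown in the traceback. The helper `missing_columns` and the sample column names are hypothetical and not part of the repository.

```python
import csv
import io

# Columns that the traceback shows preprocess_tourism_data reading from the CSV.
REQUIRED_COLUMNS = ("Year", "Month")


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header.

    `csv_file` is any open text file object; this helper is illustrative
    and is not part of the repository.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list if the file has no rows at all
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# In-memory stand-in for a file without the time columns
# (the column names below are made up for illustration).
sample = io.StringIO("RegionA,RegionB\n1,2\n")
print(missing_columns(sample))
```

With a real file, the same check would be `with open(path, newline="") as f: missing_columns(f)`. A non-empty result means the CSV is not in the layout the preprocessing code expects, which matches the `KeyError: 'Year'` above.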

Thanks!