gramener / gramex

A visual analytics platform to build data-based web apps with less code.
https://gramener.com/gramex/guide/

GRAMEX-182 ⁃ MLHandler TimeSeries throws a KeyError #527

Open sanand0 opened 2 years ago

sanand0 commented 2 years ago

I created this gramex.yaml

url:
  mlhandler/forecast:
    pattern: /$YAMLURL/forecast
    handler: MLHandler
    kwargs:
      data:
        url: $YAMLPATH/inflation.csv  # Inflation dataset
      model:
        index_col: index    # Use index column as timestamps
        target_col: R
        class: SARIMAX
        params:
          order: [7, 1, 0]  # Creates ARIMA estimator with (p,d,q)=(7,1,0)
                            # Add other parameters similarly

and copied this inflation.csv.

When I ran Gramex and clicked the template's "Predict" button...


I got an error with this log:

INFO    19-Apr 07:57:08 __init__ PORT Gramex 1.78.0 | D:\temp\ts | Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
DEBUG   19-Apr 07:57:08 config PORT Loading config: d:\site\gramener.com\viz\async-gramex\gramex\gramex.yaml
DEBUG   19-Apr 07:57:08 config PORT Loading config: D:\temp\ts\gramex.yaml
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: version
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: mime   
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: threadpool
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: cache     
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: handlers  
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: log       
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: eventlog  
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: app
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: otp
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: schedule
INFO    19-Apr 07:57:08 __init__ PORT Initialising schedule:gramex_update
DEBUG   19-Apr 07:57:08 scheduler PORT schedule:gramex_update: Next run in 57771.3s
DEBUG   19-Apr 07:57:08 __init__ PORT Loading service: url
DEBUG   19-Apr 07:57:08 __init__ PORT url:mlhandler/forecast (MLHandler)
DEBUG   19-Apr 07:57:08 __init__ PORT Gramex update ran recently. Deferring check. 
DEBUG   19-Apr 07:57:09 config PORT Loading config: d:\site\gramener.com\viz\async-gramex\gramex\apps.yaml 
DEBUG   19-Apr 07:57:09 config PORT Loading config: C:\Users\anand\AppData\Local\Gramex Data\apps\apps.yaml
DEBUG   19-Apr 07:57:09 config PORT Loading config: d:\site\gramener.com\viz\async-gramex\gramex\handlers\openapiconfig.yaml
DEBUG   19-Apr 07:57:09 cache PORT Flushing C:\Users\anand\AppData\Local\Gramex Data\apps\mlhandler\mlhandler-forecast\config.json
DEBUG   19-Apr 07:57:09 cache PORT Flushing C:\Users\anand\AppData\Local\Gramex Data\apps\mlhandler\mlhandler-forecast\config.json
d:\site\gramener.com\viz\async-gramex\gramex\ml_api.py:230: UserWarning: Model changed, removing old parameters.
  warnings.warn("Model changed, removing old parameters.")
DEBUG   19-Apr 07:57:09 cache PORT Flushing C:\Users\anand\AppData\Local\Gramex Data\apps\mlhandler\mlhandler-forecast\config.json
DEBUG   19-Apr 07:57:09 cache PORT Flushing C:\Users\anand\AppData\Local\Gramex Data\apps\mlhandler\mlhandler-forecast\config.json
DEBUG   19-Apr 07:57:10 __init__ PORT url:favicon (FileHandler) -90
DEBUG   19-Apr 07:57:10 __init__ PORT url:default (FileHandler) -100
DEBUG   19-Apr 07:57:10 __init__ PORT Running callback: app
INFO    19-Apr 07:57:10 __init__ PORT Listening on port 9988
INFO    19-Apr 07:57:10 __init__ 9988 <Ctrl-B> opens the browser. <Ctrl-D> starts the debugger.
INFO    19-Apr 07:57:15 __init__ 9988 200 GET / (127.0.0.1) 0.00ms default
INFO    19-Apr 07:57:15 __init__ 9988 200 GET /favicon.ico (127.0.0.1) 0.00ms favicon
INFO    19-Apr 07:57:25 __init__ 9988 200 GET / (127.0.0.1) 0.00ms default
INFO    19-Apr 07:57:25 __init__ 9988 200 GET /favicon.ico (127.0.0.1) 0.00ms favicon
INFO    19-Apr 07:57:30 __init__ 9988 200 GET /forecast (127.0.0.1) 10.07ms mlhandler/forecast
INFO    19-Apr 07:57:30 __init__ 9988 200 GET /forecast?_cache&_limit=5&_format=json&_meta=y (127.0.0.1) 0.00ms mlhandler/forecast
INFO    19-Apr 07:57:30 __init__ 9988 200 GET /forecast?_cache&_opts (127.0.0.1) 0.00ms mlhandler/forecast
ERROR   19-Apr 07:57:34 mlhandler 9988 'The `start` argument could not be matched to a location related to the index of the data.'
Traceback (most recent call last):
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: ''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\site\gramener.com\viz\async-gramex\gramex\handlers\mlhandler.py", line 191, in _predict
    target = data.pop(score_col)
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\frame.py", line 5226, in pop
    return super().pop(item=item)
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\generic.py", line 870, in pop
    result = self[item]
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: ''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandas\_libs\index.pyx", line 460, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 429, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas\_libs\index.pyx", line 462, in pandas._libs.index.DatetimeEngine.get_loc
KeyError: Timestamp('1970-01-01 00:00:00')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\indexes\datetimes.py", line 703, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: Timestamp('1970-01-01 00:00:00')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 357, in get_prediction_index
    start, base_index, data.row_labels
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 279, in get_index_label_loc
    raise e
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 243, in get_index_label_loc
    loc, index, index_was_expanded = get_index_loc(key, index)
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 176, in get_index_loc
    loc = index.get_loc(key)
  File "D:\anaconda\3.7\lib\site-packages\pandas\core\indexes\datetimes.py", line 705, in get_loc
    raise KeyError(orig_key) from err
KeyError: Timestamp('1970-01-01 00:00:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\site\gramener.com\viz\async-gramex\gramex\handlers\mlhandler.py", line 199, in _predict
    data = self.model.predict(data, target_col=tcol)
  File "d:\site\gramener.com\viz\async-gramex\gramex\sm_api.py", line 78, in predict
    return self.res.predict(start, end, exog=exog, **kwargs)
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\base\wrapper.py", line 113, in wrapper
    obj = data.wrap_output(func(results, *args, **kwargs), how)
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 3403, in predict
    prediction_results = self.get_prediction(start, end, dynamic, **kwargs)
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\statespace\mlemodel.py", line 3287, in get_prediction
    self.model._get_prediction_index(start, end, index))
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 843, in _get_prediction_index
    data=self.data,
  File "D:\anaconda\3.7\lib\site-packages\statsmodels\tsa\base\tsa_model.py", line 361, in get_prediction_index
    "The `start` argument could not be matched to a"
KeyError: 'The `start` argument could not be matched to a location related to the index of the data.'
INFO    19-Apr 07:57:34 __init__ 9988 200 GET /forecast?Dp=-0.00313258&index=1972-04-01 (127.0.0.1) 36.61ms mlhandler/forecast      


jaidevd commented 2 years ago

@sanand0 The template does not yet support SARIMAX. Forecasting requires a different interface:

  1. Different kwargs from sklearn
  2. Different evaluation metrics from sklearn
  3. Prediction / forecasting needs a "start" and "end" timestamp + any exogenous data. This needs some work. There was a PR that did this but it's too stale to reuse directly.

Moreover, to do this properly, we shouldn't conform to the current template, which heavily favours sklearn. We should create a new one instead. Either of us can come up with a mock, and I'll build it.
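
To make point 3 concrete, here is a minimal sketch of how a forecasting interface might turn "start" and "end" request values into positions on the training index, and fail with a readable message instead of a chained KeyError. It uses only the standard library. The names `resolve_window`, `month_offset` and `step_months` are hypothetical and are not part of Gramex or statsmodels; the quarterly dates are made up for illustration and are not taken from inflation.csv; and the rule that "start" may be at most one step past the last observation is an assumption of this sketch, not a documented library constraint.

```python
from datetime import date
from typing import List, Tuple


def month_offset(base: date, other: date) -> int:
    """Whole months from `base` to `other` (negative if `other` is earlier)."""
    return (other.year - base.year) * 12 + (other.month - base.month)


def resolve_window(index: List[date], start: str, end: str,
                   step_months: int = 1) -> Tuple[int, int]:
    """Map ISO `start`/`end` strings to integer positions on a regular index.

    Positions >= len(index) are out-of-sample steps, i.e. forecasts.
    Every problem is reported as a ValueError with a readable message.
    """
    if not index:
        raise ValueError("training index is empty")
    try:
        start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    except ValueError as err:
        raise ValueError(f"start and end must be YYYY-MM-DD dates: {err}") from err

    first = index[0]
    positions = []
    for name, value in (("start", start_d), ("end", end_d)):
        months = month_offset(first, value)
        # The date must land exactly on the index grid (same day, whole steps).
        if value.day != first.day or months % step_months:
            raise ValueError(f"{name}={value} does not fall on the index grid")
        positions.append(months // step_months)

    start_pos, end_pos = positions
    if start_pos < 0:
        raise ValueError(f"start={start_d} is before the first observation {first}")
    # Assumption of this sketch: forecasts must begin no later than one step
    # after the last observation.
    if start_pos > len(index):
        raise ValueError("start may be at most one step past the last observation")
    if end_pos < start_pos:
        raise ValueError("end must not be earlier than start")
    return start_pos, end_pos


# Hypothetical quarterly index with four observations.
quarters = [date(1971, 1, 1), date(1971, 4, 1), date(1971, 7, 1), date(1971, 10, 1)]
window = resolve_window(quarters, "1971-07-01", "1972-04-01", step_months=3)
print(window)
```

With this input the window is `(2, 5)`: position 2 is the third observation, and positions 4 and 5 are forecast steps beyond the data. An empty or malformed "start" raises a ValueError that names the problem, which a new template could show to the user directly.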