nabeel-oz / qlik-py-tools

Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
https://nabeel-oz.github.io/qlik-py-tools/
MIT License
186 stars, 87 forks

Getting different forecast in Qlik Sense with Prophet #78

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

I'm trying to reproduce in Qlik Sense the forecasting example found in the Prophet docs where multivariate regressors are used, but I'm getting a different forecast, with worse performance, than when applying the model directly in Python without the SSE functionality.

To Reproduce

  1. Using bike rentals, holidays and weather conditions (temp, rain, sun, wind) datasets from this example: https://nbviewer.jupyter.org/github/nicolasfauchereau/Auckland_Cycling/blob/master/notebooks/Auckland_cycling_and_weather.ipynb
  2. Using the same parameters for the model
  3. Using the same time periods for the training and test sets

Expected behavior

I expected the same forecasting behavior, since we are using the same parameters and random seed for the model.

Screenshots

  1. Datasets
  2. Parameters
  3. Same periods, different results

Environment:

I've tried, but I can't find any possible source of the difference between the two.

nabeel-oz commented 4 years ago

Hi @ijlorant,

One difference I can spot is the mcmc_samples parameter, which is set to 300 in your notebook but is not specified when using PyTools, so it would default to 0.

I'd also set the debug=true argument in the PyTools function and check the log in the terminal to confirm that the additional regressors are being interpreted correctly.

Also, when using the line chart with a continuous axis in Qlik Sense, some data points can be smoothed out until the user zooms in. So I would use a table and compare the exact values to be sure they match.
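
A rough sketch of such a value-by-value comparison, assuming both forecasts have been exported as date-to-value mappings. The function name `compare_forecasts`, the dict layout, the tolerance and the sample numbers are all assumptions made for illustration; none of them come from PyTools or Prophet.

```python
import math

def compare_forecasts(notebook, qlik, rel_tol=1e-6):
    """Return the dates where the two forecasts disagree.

    Both arguments are hypothetical dicts mapping an ISO date string to a
    forecast value, e.g. exported from the notebook and from a Qlik table.
    """
    mismatches = {}
    for date in sorted(set(notebook) | set(qlik)):
        a, b = notebook.get(date), qlik.get(date)
        # A date missing on one side is always reported as a mismatch.
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
            mismatches[date] = (a, b)
    return mismatches

# Made-up values, purely for illustration
notebook = {"2018-01-01": 1200.0, "2018-01-02": 1350.5, "2018-01-03": 980.25}
qlik = {"2018-01-01": 1200.0, "2018-01-02": 1410.0}

for date, (a, b) in compare_forecasts(notebook, qlik).items():
    print(date, a, b)
```

An empty result would mean the two forecasts agree within the tolerance on every date; a date present on only one side shows up with `None` for the missing value.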

ghost commented 4 years ago

Hi @nabeel-oz

I've noticed that I didn't include the mcmc_samples parameter in my screenshot, but I did try with that parameter too and got almost the same result.

I don't think it's the line chart, because I've checked the forecast values in a table and everything looks OK there. Looking at the log file, however, it seems that the extra regressors are not being included when fitting the model.

Let me see if I can find out why. In any case, I'll let you know what happened with the issue.

nabeel-oz commented 4 years ago

Yes, that confirms that the additional regressors are not getting passed correctly. Try plotting your vFeatures variable against the FORECAST_DATE dimension to see if that is looking ok.

The SSE will fall back to a basic model ignoring the additional regressors if it only finds one unique value (most likely NULL) in the column.

This is done at line 663 in _prophet.py:

        # Check if the regressors column is empty
        if len(self.regressors_df.regressors.unique()) == 1:
            # Return without further processing
            self.has_regressors = False
            return None
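
The same rule can be tried outside the SSE on a column of values before sending it to PyTools. This is only a sketch in plain Python rather than pandas: the function name `would_fall_back` and the sample columns are hypothetical, and `None` stands in for a Qlik NULL.

```python
def would_fall_back(values):
    """Mirror the check above: exactly one unique value in the column
    (for example all NULL) means the regressors would be ignored."""
    return len(set(values)) == 1

# Hypothetical regressor columns; None stands for NULL
all_null = [None, None, None, None]
populated = [12.1, 13.4, 11.8, None]

print(would_fall_back(all_null))   # True
print(would_fall_back(populated))  # False
```

A `True` result for the column behind vFeatures would be consistent with the log showing a model fitted without the extra regressors.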

ghost commented 4 years ago

Hi @nabeel-oz

I've found the problem: it was an incorrect data association in Qlik Sense between the extra regressors and the forecasting calendar. PyTools now reproduces the same forecast as the notebook.
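
For anyone hitting the same thing, a quick way to spot this kind of broken link is to list the calendar dates that have no regressor value. This is a minimal sketch with made-up data; `dates_missing_regressors` and the dict layout are assumptions for illustration, not anything from PyTools or Qlik.

```python
def dates_missing_regressors(calendar, regressors):
    """List calendar dates that have no (non-NULL) regressor value."""
    return [d for d in calendar if regressors.get(d) is None]

# Hypothetical forecasting calendar and regressor lookup; None stands for NULL
calendar = ["2018-01-01", "2018-01-02", "2018-01-03"]
regressors = {"2018-01-01": 12.1, "2018-01-03": None}

print(dates_missing_regressors(calendar, regressors))
# ['2018-01-02', '2018-01-03']
```

If every calendar date comes back as missing, the two tables are most likely not associated on the date field at all.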

I may ask for your help soon with another case related to Keras, but for now, thanks for your help.

nabeel-oz commented 4 years ago

Great, glad you've got it working.