nabeel-oz / qlik-py-tools

Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
https://nabeel-oz.github.io/qlik-py-tools/
MIT License
186 stars 87 forks

Not getting prediction #4

Closed kolbrant closed 6 years ago

kolbrant commented 6 years ago

Hello!

Hoping you can help me sort this issue out. I tried this implementation and got it working with the sample apps provided, with predictions appearing as they should, as shown here. The console (Qlik-Py-Start.bat) is also printing output, as shown here.

However, when I try to do it with my own data, nothing at all is written to the console, and the line chart in Qlik Sense Desktop (QSD) is just a flat line instead of following the actual values, as shown here.

I have used the same expressions and variables as in the sample apps provided and changed the parameters to match the field names in my data source.

Is there some kind of issue with the SSE connection to my app, since nothing is happening with my prediction line?

Best Regards

nabeel-oz commented 6 years ago

Hi @kolbrant, your line chart shows the actual values are being calculated as ~440M for the entire time series. This indicates that the problem lies with your expression or data model. If you're following the forecast calendar script from this project's documentation you need a bit of set analysis for both the actual and the prediction expressions.

Note the set analysis used in the expressions in the sample app: Count({$<FORECAST_LINK_TYPE = {'Actual'}>} Distinct ACCIDENT_NO)

You could have a simpler data model where this set analysis is not required. I've set up the model this way so that the actual dates and forecast dates are two different fields, allowing two different sets of selections. This lets us run the forecast on a subset of the actual time series, e.g. 2017-2018, yet still visualize the forecast across a longer period, e.g. 2016-2019.

nabeel-oz commented 6 years ago

Closing this issue as it is a problem with the Qlik expression/data model rather than this SSE. You're welcome to comment further if you have a follow up question.

kolbrant commented 6 years ago

Hello again Nabeel! I have this current data model: link.

The only field from the Forecast Calendar I really need is FORECAST_YEAR, since I only have a year value in the data source, which is an .xlsx file.

It shows the sum of ~440M when I'm using the following expression for the dimension:

=if(FORECAST_YEAR <= AddYears(Max(Total([År])), $(vForecastPeriods)), FORECAST_YEAR)

If I change this to just [År] (showing all years from 1968 to 2017), I get a normal line chart showing the increase in population, like this. The end of that line is where I'd like the prediction line to continue, based on the previous data.

So my guess is that the expression is the culprit here. How do you think I should write the expression to get the line I'd like? Even just getting the line in general (starting at 1968) would be much appreciated!

Best Regards, Oskar

nabeel-oz commented 6 years ago

Hi Oskar, I suspect you are missing the set analysis in the measure, so can you post your expression for the Actual and Forecast here as well?

However, if your data source is a simple Excel file, and you just want a quick result, you can simply add fillers to your data for the years that you want to forecast. Just fill them with blank or null values as shown for 2017/18 and 2018/19 in the example below, and the forecasting algorithm will understand that it needs to add in predictions for those years. The forecast calendar script, the resulting set analysis for measures, and the IF condition in the dimension won't be required with this simple approach.
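For anyone preparing the file with a script rather than by hand, the filler approach described above could be sketched roughly as follows. This is only an illustration: the helper name `add_forecast_fillers`, the `(year, value)` row layout, and the sample numbers are all assumptions made for the example, not part of this project.

```python
def add_forecast_fillers(rows, periods):
    """Append `periods` future years with empty values after the last actual year.

    `rows` is a list of (year, value) tuples. This layout is an assumption
    for the sketch; adapt it to however your data is actually stored.
    """
    last_year = max(year for year, _ in rows)
    # None stands in for the blank/null cells that mark years to be forecast.
    fillers = [(last_year + i, None) for i in range(1, periods + 1)]
    return list(rows) + fillers


# Made-up illustrative values, not real data.
actuals = [(2015, 100), (2016, 110), (2017, 125)]
padded = add_forecast_fillers(actuals, periods=2)

for year, value in padded:
    print(year, "" if value is None else value)
```

The padded rows can then be written back to the Excel or CSV source, leaving the value column empty for the appended years.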

sample data

I'll update the documentation so it's clear that the forecasting calendar, variables, etc. are optional and only required for a deeper user experience.

kolbrant commented 6 years ago

The expressions for the Actual and Forecast are as follows:

Actual: Sum([Invånare])

Forecast: PyTools.Prophet(if(FORECAST_YEAR <= AddYears(Max(Total [År]), $(vForecastPeriods)), FORECAST_YEAR), Count({$<FORECAST_LINK_TYPE = {'Actual'}>} Sum([Invånare])), 'freq=MS, return=yhat')

I will try the example you provided and see what happens, assuming the Forecast measure expression is roughly correct for the desired outcome.

Most of the data comes from the .xlsx file. There is another data source with location polygon data, but it does not affect the line chart.

nabeel-oz commented 6 years ago

When using the Forecast Year as the dimension, you will need to add the set analysis to Actual as well. So it should be Actual: Sum({$<FORECAST_LINK_TYPE = {'Actual'}>} [Invånare]).

The second problem is the measure in the Forecast expression. This should be the same as your Actual expression, as that's what you want to use to make predictions. In my sample app the measure is based on a count but in your case it is a sum.

Finally, note that your data is at a yearly frequency, so you need to set that correctly in the arguments with freq=Y as explained here.

So Forecast:

PyTools.Prophet(if(FORECAST_YEAR <= AddYears(Max(Total [År]), $(vForecastPeriods)), FORECAST_YEAR), Sum({$<FORECAST_LINK_TYPE = {'Actual'}>} [Invånare]), 'freq=Y, return=yhat')
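The last argument in the expression above is a single string of comma-separated `key=value` options. As a rough illustration of that convention only, a hypothetical helper could split such a string as shown below. The function name `parse_sse_args` is invented for this sketch and is not the parser this project actually uses.

```python
def parse_sse_args(arg_string):
    """Split a 'key=value, key=value' string into a dict.

    Hypothetical helper for illustration; not taken from qlik-py-tools.
    """
    options = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate stray commas or empty segments
        key, _, value = pair.partition("=")
        options[key.strip()] = value.strip()
    return options


options = parse_sse_args("freq=Y, return=yhat")
print(options)
```

Seen this way, changing `freq=MS` to `freq=Y` only changes one option while `return=yhat` stays as it was.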

narayanankm commented 5 years ago

Getting the below error while running the installation Qlik-Py-Init.bat. @nabeel-oz Can you help me with how to fix it?

ERROR: Complete output from command 'd:\development\adv_analyics\qlik-py-tools-3.9\qlik-py-env\scripts\python.exe' 'd:\development\adv_analyics\qlik-py-tools-3.9\qlik-py-env\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\qlikuser\AppData\Local\Temp\pip-build-env-v7525snk\overlay' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel cython numpy:
ERROR: Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Program Files (x86)\Python36-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "d:\development\adv_analyics\qlik-py-tools-3.9\qlik-py-env\lib\site-packages\pip\__main__.py", line 16, in <module>
    from pip._internal import main as _main  # isort:skip # noqa
  File "d:\development\adv_analyics\qlik-py-tools-3.9\qlik-py-env\lib\site-packages\pip\_internal\__init__.py", line 4, in <module>
    import locale
  File "C:\Program Files (x86)\Python36-32\lib\locale.py", line 16, in <module>
    import re
  File "C:\Program Files (x86)\Python36-32\lib\re.py", line 142, in <module>
    class RegexFlag(enum.IntFlag):
AttributeError: module 'enum' has no attribute 'IntFlag'

ERROR: Command "'d:\development\adv_analyics\qlik-py-tools-3.9\qlik-py-env\scripts\python.exe' 'd:\development\adv_analyics\qlik-py-tools-3.9\qlik-py-env\lib\site-packages\pip' install --ignore-installed --no-user --prefix 'C:\Users\qlikuser\AppData\Local\Temp\pip-build-env-v7525snk\overlay' --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel cython numpy" failed with error code 1 in None

nabeel-oz commented 5 years ago

Hi @narayanankm, that error seems to be with pip which is a very basic module in Python. You'll need to check that your environment is set up correctly according to the installation notes.

One possible problem is that you're running the 32-bit installation of Python. On a modern system you really should be using the 64-bit version.
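A quick way to check both points is a small diagnostic script run with the same interpreter as the virtual environment. The sketch below is a suggestion, not something taken from the installation notes. It reports whether Python is 32-bit or 64-bit and which `enum` module is being imported. `enum.IntFlag` was added to the standard library in Python 3.6, so if `enum_has_intflag` comes back `False`, a different `enum` module (for example a separately installed backport package) is probably being picked up ahead of the standard one.

```python
import enum
import struct
import sys


def describe_environment():
    """Collect a few facts about the running interpreter."""
    bits = struct.calcsize("P") * 8  # pointer size in bits: 32 or 64
    return {
        "python_version": sys.version.split()[0],
        "bits": bits,
        # Path of the enum module that was actually imported.
        "enum_path": getattr(enum, "__file__", "(built-in)"),
        # False would match the AttributeError in the traceback above.
        "enum_has_intflag": hasattr(enum, "IntFlag"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `enum_path` points into `site-packages` rather than the Python `lib` folder, removing that package from the environment is a reasonable first thing to try.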