nabeel-oz / qlik-py-tools

Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
https://nabeel-oz.github.io/qlik-py-tools/
MIT License
186 stars 87 forks source link

ValueError Found Infinity in Column Y #55

Closed Kev-kutkin closed 4 years ago

Kev-kutkin commented 4 years ago

Dear @nabeel-oz , I have adopted the workflow of your sample_app_forecasting_simple,qvf. However, for this precalculated method, I do not know which part of your scripting controls the forecast period required, for example 6 months or 12 months, or 24 months, as compared to your another calendar method. For this precalculated method, I have three questions. (1) In the LOAD script, is the field PERIOD or the [MONTH START] responsible for the last month to be forecasted? (2) Are we using the chart expression to indicate the starting month to be forecasted instead of the LOAD script? (3) Can we forecast Attendance values for months in year 2020? If so, how can we do that in the scripting? Part one: Besides these two questions, I have also encountered several problems that I am so eager to share with you and other users. Please advise me to solve it. What I have done so far: I have prepared my LOAD script and obtained the data model as shown below. It is not as what we expect actually. There is no link key available. data_model_1

Part two: Part two is based on my own data and not the hospital attendance dataset. What happened is that I have encountered error as shown below. field_error_qs

The qlik sense cannot proceed with the error above and the Command-line also generated the same error as shown below. forecast_error_output

@nabeel-oz , do we need to change the code block for the Prophet to bypass this error or can we reply on the scripting in Qlik Sense to solve this infinity error? I have no experience in dealing with infinity issue on the response variable.

Please advise me for these questions in the two parts above. Thank you for your time.

nabeel-oz commented 4 years ago

@kutkin54 I'll address your questions on pre-calculated forecasts in this post. In the sample app the future periods have been created in the source file (available here):

image

You can also create them in the Qlik load script using standard master calendar techniques. For e.g. for the sample app we can create additional periods for each hospital using the script below:

// Get the last month for which we have actual data
TempMax:
LOAD
     Max([Month Start]) as MaxActual
RESIDENT Sheet1;

// Set up variables for the start and end forecast month
LET vForecastStart = Peek('MaxActual', 0, 'TempMax');
LET vForecastEnd = AddMonths($(vForecastStart), 6);

Drop table TempMax;

// Add future months for each hospital
FOR EACH vHospital in FieldValueList('Hospital')

    // Generate future periods within the range and concatenate to the Sheet1 table
    Concatenate(Sheet1)
    LOAD
        '$(vHospital)' as Hospital,
        Num(AddMonths($(vForecastStart),IterNo())) as Period,
        Date(AddMonths($(vForecastStart),IterNo()), 'DD/MM/YY') as [Month Start],
        Null() as Attendances
    AUTOGENERATE(1) 
    WHILE AddMonths($(vForecastStart),IterNo()) <= Date#('$(vForecastEnd)');

Next vHospital

Hope that explains it, but to address your three questions:

(1) In the LOAD script, is the field PERIOD or the [MONTH START] responsible for the last month to be forecasted?

We use [Month Start] as the date field sent to Prophet. You can see this in the function call to the SSE.

(2) Are we using the chart expression to indicate the starting month to be forecasted instead of the LOAD script?

The sample app shows two techniques. The chart expressions show how the forecast can be done entirely in the frontend. The load script shows how the forecast can be done entirely during the load process. In either case, when using return=yhat Prophet returns the forecast for all periods, not just the future periods. This can be used to assess how well the model fits to known data. If you just want to return the forecast for future periods you can pass the parameter return=y_then_yhat as explained here.

(3) Can we forecast Attendance values for months in year 2020? If so, how can we do that in the scripting?

You can add additional forecast periods to the source, or generate the periods using the load script using the snippet above.

nabeel-oz commented 4 years ago

This post addresses the questions on your app.

What I have done so far: I have prepared my LOAD script and obtained the data model as shown below. It is not as what we expect actually. There is no link key available.

The screenshot you posted shows that the temp table still exists in your data model. In the sample app the temp table is used as input to the Prophet function and then dropped. The response from Prophet can be linked to your data model as demonstrated in the sample app. The link would be a concatenated key of the MonthYear, Customer Name, Products, etc. depending on the granularity of your forecast.

Part two is based on my own data and not the hospital attendance dataset. What happened is that I have encountered error as shown below

The error occurs because you have zero values in y and the take_log parameter set to true. A logarithm of zero results in negative infinity, which is not accepted by Prophet. You should use the default take_log=false option in this case.

Kev-kutkin commented 4 years ago

@kutkin54 I'll address your questions on pre-calculated forecasts in this post. In the sample app the future periods have been created in the source file (available here):

image

You can also create them in the Qlik load script using standard master calendar techniques. For e.g. for the sample app we can create additional periods for each hospital using the script below:

// Get the last month for which we have actual data
TempMax:
LOAD
     Max([Month Start]) as MaxActual
RESIDENT Sheet1;

// Set up variables for the start and end forecast month
LET vForecastStart = Peek('MaxActual', 0, 'TempMax');
LET vForecastEnd = AddMonths($(vForecastStart), 6);

Drop table TempMax;

// Add future months for each hospital
FOR EACH vHospital in FieldValueList('Hospital')

    // Generate future periods within the range and concatenate to the Sheet1 table
    Concatenate(Sheet1)
    LOAD
      '$(vHospital)' as Hospital,
        Num(AddMonths($(vForecastStart),IterNo())) as Period,
      Date(AddMonths($(vForecastStart),IterNo()), 'DD/MM/YY') as [Month Start],
        Null() as Attendances
    AUTOGENERATE(1) 
    WHILE AddMonths($(vForecastStart),IterNo()) <= Date#('$(vForecastEnd)');

Next vHospital

Hope that explains it, but to address your three questions:

(1) In the LOAD script, is the field PERIOD or the [MONTH START] responsible for the last month to be forecasted?

We use [Month Start] as the date field sent to Prophet. You can see this in the function call to the SSE.

(2) Are we using the chart expression to indicate the starting month to be forecasted instead of the LOAD script?

The sample app shows two techniques. The chart expressions show how the forecast can be done entirely in the frontend. The load script shows how the forecast can be done entirely during the load process. In either case, when using return=yhat Prophet returns the forecast for all periods, not just the future periods. This can be used to assess how well the model fits to known data. If you just want to return the forecast for future periods you can pass the parameter return=y_then_yhat as explained here.

(3) Can we forecast Attendance values for months in year 2020? If so, how can we do that in the scripting?

You can add additional forecast periods to the source, or generate the periods using the load script using the snippet above.

Thank you for your reply, @nabeel-oz . For the first method, Addition of forecast periods to the source, do you mean manually inserting the header for the Date variable (in any format, like 'YYYY-MMM' or '2020-Jan') as dimension and their corresponding measure will be subjected to zero and not N/A as values? What exactly should we use? Sorry, please explain more on what should we put as the corresponding measure within the source file (like .txt, or .csv, or .excel) because we do not have any data on it, so we would want to forecast it, @nabeel-oz .

nabeel-oz commented 4 years ago

Hi @kutkin54 , The value of the measure for future periods should be NULL.

In the sample app's source I have set the measure to 'N/A' for future periods as you can see above. This equates to NULL, as in Qlik the numerical value of a string which cannot be converted to a number is NULL. Blanks also translate to NULLs.

The dimension needs to be a date. The display format doesn't matter as long as it is recognized by Qlik as a date.

I recommend you read the documentation in detail as the usage of the Prophet functions is explained clearly there. The sample apps are just examples to supplement the documentation. https://github.com/nabeel-oz/qlik-py-tools/blob/master/docs/Prophet.md image

Kev-kutkin commented 4 years ago

@nabeel-oz , thank you for your reply.

I have followed your advice and now encounter new error. I have reached the part where the Prophet model should pass the date variable back to the Qlik data model. However, this step is not achievable by the SSE and the output can be found as below.

ds_field_error

but the Terminal returns output that seems normal by displaying the screen below.

ds_field_error_2

After exploring other Issues, I have make sure that I have installed the latest SSE by downloading the qlik-py-tools-master (in version 5.1) and then executed the corresponding Qlik-Py-Init bat file to reinstall all the relevant Python libraries. Unfortunately, the same outputs as shown above are obtained again.

I provide you a view of my LOAD script below for your reference. However, I have not manually inserted new Period as new fieldvalue for the my forecast task, and am not using the master calendar method. Please advise me for any solution for my case.

normal_script

nabeel-oz commented 4 years ago

You can trace what is happening by passing the debug=true parameter in your arguments. You probably have a product for which the forecast cannot be generated for some reason.

nabeel-oz commented 4 years ago

@kutkin54 , were you able to get past the issue? I suspect you had a NULL row in your Products field or some other invalid value for which a forecast could not be generated.

nabeel-oz commented 4 years ago

Closing this issue due to inactivity.