nabeel-oz / qlik-py-tools

Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
https://nabeel-oz.github.io/qlik-py-tools/
MIT License
184 stars 87 forks source link

Error in the Dataload Script #83

Closed Odassier closed 4 years ago

Odassier commented 4 years ago

Function Data Load

Describe the bug Data could not be loaded because of an error in the dataload script

Error message in Qlik Provide a screenshot of the error message in Qlik. image

Additional context Simply modified the script for my data file following same format as in the Sample App (Sample forecasting with Keras). File attached here. Not sure what I've missed. Tried to reuse as much of the original code as possible. Axioma-ROOF-Forecasting-with-Keras.zip

Odassier commented 4 years ago

I've made a few more changes to the load script to try again but same error. I's also attaching my data model if it will help identify the error. Basically I'm trying to use your template to evaluate of the ten metrics (features) highlighted in yellow can help me predict the index (target) value highlighted in pink. But more generally, I'd like to build a generic template for up to ten features to forecast one target, so that I can simple copy the app and load a different set of data via excel (i.e. the columns headers won't change). US-LMS.xlsx Axioma-ROOF-Forecasting-with-Keras.zip

nabeel-oz commented 4 years ago

Hi @Odassier , can you please post the full error log from the SSE terminal? I won't have the time to set up your app to test it for a week or two.

Also, did you go through the usage doc on forecasting with Keras? There are several parameters that you will need to configure for the process to work correctly.

Odassier commented 4 years ago

Hi Nabeel-Oz/Qlik-Py-Tools,

Happy New Year to you by the way.

Yes, I did read the usage.doc. I guess I struggle (newbie to ML) with some of the parameter decisions in the load scrip. Specifically in the ML Config section, I get the "variable type" (Feature vs Target), but I' m unsure about the data type for example. My data is all numbers so I know it;'s not "str" but unclear about the difference between "float" and "Int" or even what "bool" might be used for.

Also in that section, the "Feature Strategy" which I am labeling as "scaling" bu my 'features' are actually already standardized and percentile-rank scores so they should not need scaling ?

I also have one target ("R1000") but ten features (column D-M in the attached). Basically these are ten metrics we use to predict the direction of markets, in this case the Russell 1000 index. So I'm trying to use your template in an app to see if Keras comes up with a similar forecast for the next few days as we do.

What could be helpful is if you point me to the right places in your load script where I should make modifications. I've tried to identify those on my own but this has led to this data load error (i.e. I can't even get my data into your app). Understand that you are busy, this is not urgent. If you send me a few hints I'll keep trying, but two weeks is fine for you to try and fix the app itself. I'm attaching the latest iteration and the data file I'm using.

Thank you so much,

Olivier PS: the next app of your I want to try is the Basket app. Since we have the factor returns for our models, they can act as a sort of grocery itemized bills, and I would like to use the app to determine which factors investors buy and sell together as a way to challenge our covariance matrix.

On Tue, Jan 7, 2020 at 8:03 AM Nabeel notifications@github.com wrote:

Hi @Odassier https://github.com/Odassier , can you please post the full error log from the SSE terminal? I won't have the time to set up your app to test it for a week or two.

Also, did you go through the usage doc https://github.com/nabeel-oz/qlik-py-tools/blob/master/docs/Keras.md on forecasting with Keras? There are several parameters that you will need to configure for the process to work correctly.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nabeel-oz/qlik-py-tools/issues/83?email_source=notifications&email_token=AOBGIB4M53IR5IGYAMUVOCLQ4PBE7A5CNFSM4KB2IWRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIHGRHI#issuecomment-571369629, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOBGIB7AQJOPO6DTR3U4MH3Q4PBE7ANCNFSM4KB2IWRA .

Odassier commented 4 years ago

Hi Nabeel, I made some progress using the Prophet Multivariate forecasting app and changing the data in the day.csv file to use my data and for 2018-2019. data was loaded correctly, I made all the right changes in the load script and the first few sheets work fine, I'm getting good results. But the final two sheet show an Error. I'm not sure what to change in the load scipt for those two to work. It should use 2018 and 2019 data (and like in your example I'm ok reusing the 2019 values for the two selected regressors as 2020 forecasts) and show me a 2020 forecast but it does not. I'm attaching the app here and hope you know a quick fix. Thank you for your help with this. Olivier Axioma-ROOF-Prophet-Multivariate - Copy.zip

nabeel-oz commented 4 years ago

I ran your Keras app and it generates this error where the first line explains the problem. The error occurs at the point where the predictions are being generated for the US-LMS-CausalConv-7d-v1 model that generates 7 days of predictions at a time.

image

There is an If condition in the Qlik load script under the Metrics and Predictions section which you need to modify to your model name so that the model with multi-step predictions is handled correctly.

If '$(vModel)' <> 'Bikes-CausalConv-7d-v1' Then

Should be :

If '$(vModel)' <> 'US-LMS-CausalConv-7d-v1' Then

Another issue which doesn't generate an error is that you are passing the data types for your features as int which stands for integer, when your data is floating point numbers. You should use float in this case.

nabeel-oz commented 4 years ago

For the Prophet app I think you are having issues due to NULL values in your additional regressors:

image

Odassier commented 4 years ago

Hi Nabeel,

I get it. This is because in 2019 there was no trading on Dec 29th. is there a way to alter the code so that it uses the nearest date instead? For now. I will add the missing days carrying forward the previous trading days's data.

Olivier

On Tue, Jan 14, 2020 at 8:19 AM Nabeel notifications@github.com wrote:

For the Prophet app I think you are having issues due to NULL values in your additional regressors:

[image: image] https://user-images.githubusercontent.com/21298843/72302940-b0bc3800-36bf-11ea-9d50-4893e7d02e8f.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nabeel-oz/qlik-py-tools/issues/83?email_source=notifications&email_token=AOBGIB2NL3UNVKVMXAG4QE3Q5UAJXA5CNFSM4KB2IWRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEI2ZSUQ#issuecomment-573938002, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOBGIB3RAZPGZZQSJROIXFDQ5UAJXANCNFSM4KB2IWRA .

Odassier commented 4 years ago

Axioma-ROOF-Prophet-Multivariate - Copy.zip Hi Nabeel,

Thanks for getting back to me on this. I added all the days in the new data files, simply using Monday's numbers for Saturday and Sunday, but is there a way to make it ignore the weekends? The last sheet works now (I think) although it shows a seasonality for Saturdays and Sundays. Burt the 'Final Model" sheet still gives me an error which I do not understand.

"TypeError: ("object of type 'NoneType' has no len()", 'occurred at index regressor_1')"

I'm attaching the revised app here. Any ideas on what is tripping it now?

Cheers,

Olivier

On Tue, Jan 14, 2020 at 8:19 AM Nabeel notifications@github.com wrote:

For the Prophet app I think you are having issues due to NULL values in your additional regressors:

[image: image] https://user-images.githubusercontent.com/21298843/72302940-b0bc3800-36bf-11ea-9d50-4893e7d02e8f.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nabeel-oz/qlik-py-tools/issues/83?email_source=notifications&email_token=AOBGIB2NL3UNVKVMXAG4QE3Q5UAJXA5CNFSM4KB2IWRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEI2ZSUQ#issuecomment-573938002, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOBGIB3RAZPGZZQSJROIXFDQ5UAJXANCNFSM4KB2IWRA .

Odassier commented 4 years ago

Hi Nabeel,

I corrected the load script under the Metrics and Predictions section as you pointed out, thanks for that.

I'm not sure what the first error means as there is ample data in the table (see attached) so I'm not sure how to fix that, any thoughts?

here is the error message I get now when I try to load the data (still haven't gotten past that stage yet).

image

Any thoughts?

thanks for your help.

Olivier

On Tue, Jan 14, 2020 at 7:43 AM Nabeel notifications@github.com wrote:

I ran your Keras app and it generates this error where the first line explains the problem. The error occurs at the point where the predictions are being generated for the US-LMS-CausalConv-7d-v1 model that generates 7 days of predictions at a time.

[image: image] https://user-images.githubusercontent.com/21298843/72300148-0a6c3480-36b7-11ea-8118-a6dd9aedd2ef.png

There is an If condition in the Qlik load script under the Metrics and Predictions section which you need to modify to your model name so that the model with multi-step predictions is handled correctly.

If '$(vModel)' <> 'Bikes-CausalConv-7d-v1' Then

Should be :

If '$(vModel)' <> 'US-LMS-CausalConv-7d-v1' Then

Another issue which doesn't generate an error is that you are passing the data types for your features as int which stands for integer, when your data is floating point numbers. You should use float in this case.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nabeel-oz/qlik-py-tools/issues/83?email_source=notifications&email_token=AOBGIBYFXQUF26PCXPP63RTQ5T4CPA5CNFSM4KB2IWRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEI2XC2Q#issuecomment-573927786, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOBGIB6VIZ6QG6EIHDRT2YLQ5T4CPANCNFSM4KB2IWRA .

Odassier commented 4 years ago

Hi Nabeel, Any chance you can have a look at these two issues today? I kind of need to get this project done by early next week. Sorry. Olivier

nabeel-oz commented 4 years ago

Hi Olivier,

For the Prophet app, it looks like you still have a NULL value in your ADDITIONAL_REGRESSORS field.

image

To exclude weekends you can simply skip sending the dates to Prophet. For example the dimension and measure below will exclude weekends. This is just a quick example, you should avoid generating unwanted dates in the forecast calendar in the load script.

// Dimension:
if(MixMatch(Weekday(FORECAST_DATE), 'Sat', 'Sun') = 0, FORECAST_DATE)

// Measure:
PyTools.Prophet(if(MixMatch(Weekday(FORECAST_DATE), 'Sat', 'Sun') = 0, FORECAST_DATE), Sum({$<FORECAST_LINK_TYPE = {'Actual'}>} [Count]), 
'freq=D, return=yhat, changepoint_prior_scale=$(vFit), changepoint_range=$(vChangePointRange), n_changepoints=$(vChangePoints), debug=true')

For the Keras app, I could complete the load with just the changes I've mentioned above, and using the Excel data source you provided rather than the CSV it was set up to use. Note that I've set the epochs to 1 for testing.

_Issue 83 1 Keras.zip

Odassier commented 4 years ago

Hi Nabeel, thank you so much for getting back to me. I added the missing date (Jan 01 2019) into the data file. But then I encoutered a spike in the Test data on 2019-02-01 which became 800+ instead of 200+. I solved it by changing the measures to avg instead of sum.. But the final model is still not working. giving me the error below:

File "C:\Users\odassier\Documents\Qlik\Sense\Extensions\qlik-py-tools-master\qlik-py-env\core_utils.py", line 401, in atof if len(a) == 0: TypeError: ("object of type 'NoneType' has no len()", 'occurred at index regressor_1')

Also, any idea why I am getting this jump on the first day of the forecast in this chart? Chart

thanks.

Olivier

nabeel-oz commented 4 years ago

The issues you're having are symptoms of incorrect inputs. So just try validating your inputs using a table in Qlik and identify the root cause.

The jump in numbers may be duplicate records.

The error is indicating that your second additional regressor still has Null values for some period(s).

Both problems can be investigated in Qlik by throwing relevant dimensions and measures into a table.

Odassier commented 4 years ago

Thanks. That's what I did. I find values for all the measures for all the dates in the Qlik table, but I'll keep digging. I will also try to reload the data without the weekends which is probably where the duplicate values come from.

Is there a way to "avoid generating unwanted dates in the forecast calendar in the load script"?

Odassier commented 4 years ago

I updated the Keras app. The data loads correctly as you did, thanks. But I'm getting strange results (bad) even though the model has skill. I switched the epoch to 50 and the batch size to 20 (takes about 5 minutes to load). Take a look at the results in sheets 2 & 3. Why are my r2 scores so high! The target has a range of 170 to 240 so why are my predictions in the thousands?! I could not get sheets 4 & 5 to work, error is "ValueError: 12 columns passed, passed data had 10 columns". I'm using the same US-LMS.xls data file that I had sent you, just with updated data. Also, my data is in the form of percentile rank, so should I still use "scaling" as a feature strategy or should I say "none"?

Can you have a look at my app and let me know how to fix that? Also let me know what you think of these results, I'm not sure they are usable now. thank you so much for your help.

Olivier US-LMS.xlsx

Axioma-ROOF-Forecasting-with-Keras.zip

nabeel-oz commented 4 years ago

I can't take a look at your app at the moment, but there was a bug fix in the latest release for the R2 score when using keras and scaling the target.

The error on the incorrect number of columns means you are not passing the same number of features you used during training. Remember you do need to pass the target for predictions as well if using lag periods. The target can be null for the periods for which you want to generate predictions.

Odassier commented 4 years ago

Hi Nabeel,

no worries, thanks for taking a look when you have time.

"you are not passing the same number of features you used during training", this is what I do not understand as I am reusing your template (just changed the data) and my understanding was that it would reuse the prior year's inputs as next year forecasts (from the load script) so I'm not sure why it says I do not have the same number of features as in the training (if these are the "additional regressors"?

I also thought the forecasting calendar (1 year ahead) was created in the load script already right?

Cheers,

Olivier

Odassier commented 4 years ago

Hi Nabeel, On the Prophet Multivariate app, I think I figured out what the problem is.

the error "TypeError: ("object of type 'NoneType' has no len()", 'occurred at index regressor_1')" seems to come from the fact that the load script is not loading any values for the first day of the forecast calendar (i.e. 2020/01/01). But the data for 2019/01/01 for all four additional regressors is in the data file so I'm not sure why it is not creating an additional regressor values for that date. Any ideas?

The screenshot below shows the additional regressor data using the forecast date which is blank on Jan 1 2020, but the table below shows the data in the file with values for the four regressors on Jan 1 2019 (since we are asking it to use the 2019 values as 2020 forecasts for the regressors). 20200101_missing_additional_regressors

Not sure how to fix that. It seems to be the thing that trips it.

Odassier commented 4 years ago

ok, I think I found the error for the data. In the load script, in the "Forecast" section, the line " Date(ds + 365, 'YYYY-MM-DD') as ds," said "366". I changed it to 365 and now the Additional Regressors has values for Jan 1 2020 (see below).

20200101_missing_additional_regressors2

But I'm still unable to get the final model to work. Still gives me the error: "TypeError: ("object of type 'NoneType' has no len()", 'occurred at index regressor_1')"

nabeel-oz commented 4 years ago

Ah, the sample app is using 2012 data, which was a leap year, so it's adding 366 instead of 365 in that expression.

The error still indicates a NULL value in your second regressor. Are you sure that there are no other rows which have the issue? 2020 is a leap year so perhaps the 29th of February is the problem.

Odassier commented 4 years ago

HI Nabeel,

Yes, good catch, obviously there is no data for Feb 29 2019 to use as forecast for Feb 29 2020. How do I remedy that? I can't add a row in the data file for 29th of Feb 2019 since that date didn't exist right? How do I tell prophet to use the same estimate for two days in a row?

Olivier

nabeel-oz commented 4 years ago

Hi Olivier,

I would use an conditional expression when populating the additional regressors for the future periods. The expression would be to use last year's value if available, otherwise use the previous row's value. This should be possible with Qlik's peek function.

Odassier commented 4 years ago

not familiar with that one.

Odassier commented 4 years ago

Sorry, unable to figure this one out by myself. Any help on the attached app would be great;y appreciated. Does the new "peek" function go into the load script in the "Forecast Calendar" section, like somewhere in here: "LOAD Date($(vCalendarStartDate) + IterNo() - 1) as FORECAST_DATE AutoGenerate 1 While $(vCalendarStartDate) + IterNo() - 1 <= $(vForecastEndDate);

or in here:

"LOAD Date(ds) as FORECAST_DATE, added_regressors as ADDITIONAL_REGRESSORS RESIDENT temp;"

or as part of the chart expression in the "Final Model" measure like here:

PyTools.Prophet_Multivariate(FORECAST_DATE, Sum({$<FORECAST_LINK_TYPE = {'Actual'}>} [Count]), '', ADDITIONAL_REGRESSORS, '', 'freq=D, return=$(vShowFit), changepoint_prior_scale=$(vFit), changepoint_range=$(vChangePointRange), n_changepoints=$(vChangePoints), debug=true')

thanks for the direction. I'm attaching the latest app for testing in case you need it.

Cheers,

Olivier

Axioma-ROOF-Prophet-Multivariate.zip

nabeel-oz commented 4 years ago

Peek is a script function: https://help.qlik.com/en-US/sense/November2019/Subsystems/Hub/Content/Sense_Hub/Scripting/InterRecordFunctions/Peek.htm

The forecast calendar should already have the 29th of Feb, as it is simply iterating by 1 day at a time to create the calendar. This is where you could add a where clause to exclude weekends like you mentioned earlier.

The part where you create the added_regressors would need to have the logic to handle the 29th of Feb. This is where the peek function would be useful.

Odassier commented 4 years ago

Hi Nabeel, thanks for this, but being new to the load script functions, I'm really not sure how to write this. Could I trouble you to edit the load script in the attached app from the previous message so that I may learn from your example? Apologies for the request, just not familiar with it enough to fix it myself. thank you so much. Olivier

nabeel-oz commented 4 years ago

Closing this issue as the original problem has been addressed, and we exchanged messages outside github.