nabeel-oz / qlik-py-tools

Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
https://nabeel-oz.github.io/qlik-py-tools/
MIT License
185 stars, 87 forks

Python Performance #120

Open pajo79 opened 3 years ago

pajo79 commented 3 years ago

Hi Nabeel,

Thanks for a great project. The combination of Qlik and Python really is powerful.

I'm specifically using PyTools.Association_Rules, but I'm seeing very poor performance on the Python side. I'm on Windows Server 2019 with a 2.3 GHz 8-core Xeon and 56 GB RAM.

Even with a dataset of fewer than 1,000,000 transactions, computation takes more than 4 hours. Are there any tips on how to improve performance?

Thanks in advance, br Paul

nabeel-oz commented 3 years ago

Hi @pajo79, apologies for the really late reply. The apriori algorithm used for association rule mining is quite processing-intensive. The key parameter you can adjust is min_support, which sets the minimum frequency at which the antecedent and consequent items must appear together in the dataset to be considered for an association rule. Try setting this above the default value of 0.5 so that no time is spent on items that rarely appear together.

The syntax to define additional parameters is described here: https://github.com/nabeel-oz/qlik-py-tools/blob/master/docs/Association-Rules.md#association-rules-analysis-with-efficient-apriori
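
To illustrate why min_support matters so much for run time, here is a minimal, standard-library-only sketch of level-wise apriori itemset counting. It is not the code used by qlik-py-tools or efficient-apriori; the function name frequent_itemsets, the max_length argument as written here, and the sample transactions are all assumptions made for the example. The point it shows is that a higher threshold prunes infrequent items early, so far fewer candidate combinations are ever counted.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_length=3):
    """Hypothetical apriori sketch: return {itemset: support} for every
    itemset whose support (share of transactions) is >= min_support."""
    n = len(transactions)
    tx_sets = [frozenset(t) for t in transactions]

    # Level 1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in tx_sets for item in t}
    result = {}
    k = 1
    while candidates and k <= max_length:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in tx_sets if c <= t) for c in candidates}
        frequent = {c: cnt / n for c, cnt in counts.items()
                    if cnt / n >= min_support}
        result.update(frequent)

        # Build (k+1)-item candidates only from frequent k-itemsets, and
        # keep one only if all of its k-item subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent for sub in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return result


# Made-up sample data, purely for illustration.
transactions = [
    ("bread", "milk"),
    ("bread", "butter", "milk"),
    ("bread", "butter"),
    ("milk", "eggs"),
    ("bread", "milk", "eggs"),
]

low = frequent_itemsets(transactions, min_support=0.1)
high = frequent_itemsets(transactions, min_support=0.5)
print(len(low), "itemsets at min_support=0.1")
print(len(high), "itemsets at min_support=0.5")
```

On real data with many distinct items, the number of candidates at a low threshold can grow combinatorially, which is the likely reason a run over a large transaction set takes hours. Raising min_support, or capping the itemset length if the implementation offers such an option, is usually the first thing to try.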