mhahsler / arules

Mining Association Rules and Frequent Itemsets with R
http://mhahsler.github.io/arules
GNU General Public License v3.0

Parameter to control number of rules generated in apriori #9

Open sjain777 opened 8 years ago

sjain777 commented 8 years ago

Hi, for the apriori algorithm, I use the following parameters: support, minlen, maxlen, confidence, target ( = "rules"). I currently use this set both to tune my model and to limit its size (that is, the number of rules generated).

It would be immensely helpful to have a separate parameter to control the size of the model, for example something like "maxrules". One could then fine-tune the model (for better performance) with the existing parameters above and still get a model with a controlled number of rules via "maxrules". Right now, if I fine-tune my model using the existing set of parameters, the number of rules becomes too large (sometimes a few million), which leads to long model-building and prediction times. Limiting the size of the apriori object while also tuning the model becomes quite an issue when automating thousands of models.

Is it possible to add such a parameter in the near future?

Thanks! Supriya

mhahsler commented 8 years ago

Unfortunately, the code used right now does not support this kind of limit. Under Windows you can explore memory.limit. You could also start with a very aggressive (small) setting for maxlen to see how many short rules you produce at a given minimum support before you allow longer rules.
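
As a rough sketch of the maxlen suggestion above, one could count the rules produced at a fixed minimum support while maxlen grows. The `Groceries` data set (shipped with arules) and the threshold values are only stand-ins chosen for illustration, not values from this thread:

```r
library(arules)
data("Groceries")  # example transactions; replace with your own data

# Count the rules found at a fixed minimum support for each maxlen from 1 to 4.
# The support and confidence values are illustrative assumptions.
counts <- sapply(1:4, function(len) {
  rules <- apriori(Groceries,
    parameter = list(support = 0.005, confidence = 0.3,
                     minlen = 1, maxlen = len, target = "rules"),
    control = list(verbose = FALSE))
  length(rules)
})
names(counts) <- paste0("maxlen=", 1:4)
print(counts)
```

If the count already explodes at a small maxlen, raising the minimum support before allowing longer rules is probably cheaper than mining everything and discarding most of it.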

sjain777 commented 8 years ago

Thanks for your mail. My current minlen and maxlen vary between 1 and 4. I tried a few of the measures you suggested, but with several thousand models and varying data features for each model, optimizing such checks for both performance and size is challenging, and it also adds to the overall processing time.

mhahsler commented 8 years ago

You can now limit the time (at least somewhat). This should help with limiting the number of rules...
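
A minimal sketch of how this might look, assuming the time limit meant here is the `maxtime` entry of the `parameter` list (the comment above does not name it). The cap `max_rules`, the `Groceries` data, and the thresholds are hypothetical and only for illustration. Since there is no "maxrules" parameter, the sketch trims the result after mining by keeping the top rules by lift:

```r
library(arules)
data("Groceries")  # example transactions; replace with your own data

rules <- apriori(Groceries,
  parameter = list(support = 0.001, confidence = 0.5,
                   minlen = 1, maxlen = 4, target = "rules",
                   maxtime = 5),  # assumed time limit, in seconds
  control = list(verbose = FALSE))

# Hypothetical cap on model size, applied after mining.
max_rules <- 1000
top_rules <- head(sort(rules, by = "lift"), n = max_rules)
length(top_rules)
```

Trimming after the fact bounds the size of the stored model and the prediction cost, but not the mining time itself; only the time limit and tighter thresholds affect that.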