mhahsler / arules

Mining Association Rules and Frequent Itemsets with R
http://mhahsler.github.io/arules
GNU General Public License v3.0

Apriori produces 0 rules for large number of observations #48

Closed fuhrmanator closed 5 years ago

fuhrmanator commented 5 years ago

In some of my Apriori runs, I get 0 rules. The trouble seems to be with the number of observations, or at least with how I'm using it.

Here's my R script to reproduce the problem. smallRules is calculated properly. However, bigRules stays at 0 rules whenever largeObsCount is above 250. I'm not sure exactly where the cutoff is (200 is OK). I was narrowing it down, but unfortunately random.org won't let me run any more tests today. I had reported this on StackOverflow in a comment, but at the time I hadn't realized that there were 0 rules.

# Install the required packages if they are missing, then load them
if (!"arules" %in% rownames(installed.packages())) install.packages("arules", dependencies = TRUE)
library(arules)
if (!"random" %in% rownames(installed.packages())) install.packages("random", dependencies = TRUE)
library(random)  # randomStrings() fetches its strings from random.org

# Small case: 24 distinct items, 500 observations
smallItemCount <- 24
smallSampleNames <- as.vector(randomStrings(n = smallItemCount, len = 10, unique = TRUE))
shortSamplePaths <- rep("src/", smallItemCount)
smallTmpData <- data.frame(paths = shortSamplePaths, names = smallSampleNames)
# Concatenate path and name into a single factor of item labels
smallSampleItems <- interaction(smallTmpData[head(names(smallTmpData))], sep = "")

smallObsCount <- 500
smallSampleData <- data.frame(
  X = sample(smallSampleItems, smallObsCount, replace = TRUE),
  Y = sample(smallSampleItems, smallObsCount, replace = TRUE)
)

smallRules <- apriori(smallSampleData, parameter = list(supp = 0.005, conf = 0.1, minlen = 2))

# Large case: 578 distinct items, 250 observations
largeItemCount <- 578
largeSampleNames <- as.vector(randomStrings(n = largeItemCount, len = 10, unique = TRUE))
#longSamplePaths <- rep("modules/junit4/src/test/java/org/powermock/modules/junit4/", largeItemCount)
longSamplePaths <- rep("junit4/", largeItemCount)
bigTmpData <- data.frame(paths = longSamplePaths, names = largeSampleNames)
bigSampleItems <- interaction(bigTmpData[head(names(bigTmpData))], sep = "")

largeObsCount <- 250
bigSampleData <- data.frame(
  X = sample(bigSampleItems, largeObsCount, replace = TRUE),
  Y = sample(bigSampleItems, largeObsCount, replace = TRUE)
)

bigRules <- apriori(bigSampleData, parameter = list(supp = 0.005, conf = 0.1, minlen = 2))

fuhrmanator commented 5 years ago

After all, I don't think this is a bug. I realize now that the number of observations affects the support values. By lowering the minimum support in the parameter list (supp=0.001), I was able to get some rules in my own test case. Sorry for the noise.
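For anyone landing here later, a rough back-of-the-envelope sketch of why the support threshold matters in the script above. It uses base R only. The helper names `min_count` and `expected_pair_count` are made up for illustration, and the arithmetic assumes X and Y are sampled uniformly and independently, as in the script. It is not a description of how arules rounds the threshold internally.

```r
# Smallest number of rows an itemset must appear in so that
# (rows containing it) / (total rows) >= supp.
# Hypothetical helper, not part of arules.
min_count <- function(supp, n_obs) ceiling(supp * n_obs)

# Expected number of rows containing one specific (X, Y) pair when both
# columns are drawn uniformly and independently from n_items values.
# Hypothetical helper, not part of arules.
expected_pair_count <- function(n_items, n_obs) n_obs / n_items^2

# Small case from the script: 24 items, 500 observations, supp = 0.005
small <- c(need = min_count(0.005, 500),
           expected = expected_pair_count(24, 500))

# Large case from the script: 578 items, 250 observations, supp = 0.005
big <- c(need = min_count(0.005, 250),
         expected = expected_pair_count(578, 250))

print(small)  # need is 3; a given pair is expected in a bit under 1 row
print(big)    # need is 2; a given pair is expected in far less than 1 row

# With supp = 0.001 and 250 observations, one row is already enough
print(min_count(0.001, 250))  # 1
```

In the small case there are only 576 possible pairs for 500 rows, so some pairs plausibly repeat three times by chance. In the large case there are 334,084 possible pairs for 250 rows, so a pair repeating even twice is very unlikely, which would be consistent with getting 0 rules at supp=0.005.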

Perhaps there should be a warning when there are no rules?

mhahsler commented 5 years ago

No worries. I thought about a warning before, but 0 rules is a legitimate output. Maybe I will add something to the man page.
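Since 0 rules is a legitimate result, a caller who wants a warning can add one on their side. Below is a minimal sketch of such a check. `warn_if_empty` is a hypothetical helper, not an arules function; it relies only on `length()`, which is assumed here to return the number of rules when applied to the object returned by `apriori()`.

```r
# Hypothetical helper: warn when a result is empty, then return it unchanged.
# Works on anything that has a length(), so it can be tried without arules.
warn_if_empty <- function(x, what = "rules") {
  if (length(x) == 0L) {
    warning("0 ", what, " found; consider lowering the support or ",
            "confidence threshold", call. = FALSE)
  }
  invisible(x)
}

# Stand-in demonstration with plain lists instead of a rules object
empty_result <- tryCatch(
  { warn_if_empty(list()); "no warning" },
  warning = function(w) "warned"
)
nonempty_result <- tryCatch(
  { warn_if_empty(list("a => b")); "no warning" },
  warning = function(w) "warned"
)
print(empty_result)     # "warned"
print(nonempty_result)  # "no warning"
```

With arules loaded, the same helper could wrap the call from the script, for example `bigRules <- warn_if_empty(apriori(bigSampleData, parameter = list(supp = 0.005, conf = 0.1, minlen = 2)))`.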