mhahsler / arules

Mining Association Rules and Frequent Itemsets with R
http://mhahsler.github.io/arules
GNU General Public License v3.0

find rule on the right side RHS apriori vs fpgrowth #78

Closed mytarmail closed 1 year ago

mytarmail commented 1 year ago

I'm trying to find rules that result in a single target event. Here is my toy data example:

set.seed(111)
# make transactions with target
li <- list()
for(i in 1:30){
  s <- sample(letters, 10, replace = TRUE)
  if(sample(c(TRUE, FALSE), 1, prob = c(1, 5))){
    # a set of transactions that lead to a target
    s <- c(s,    c("A","B","C","target") )
  }
  s <- make.unique(s)
  li[[i]] <- s
}
library(arules)
trans <- transactions(li)
supp <- 0.05
conf <- 0.1

apri <- apriori(trans, 
                 parameter=list(support = supp, 
                                confidence = conf,
                                minlen=2, 
                                maxlen=4), 
                 appearance = list(rhs="target"))

apri[!is.redundant(apri)] |>
     subset(subset = lift >= 10) |> 
     inspect()

Everything is fine here, but my dataset is huge, so I decided to try a more efficient algorithm:

fim4r()
fpgr <- fim4r(trans, 
           method = "fpgrowth", 
           target = "rules", 
           supp = supp,
           conf = conf,
           verbose = TRUE,
           appear = list(c("target"), c("c")))

fpgr[!is.redundant(fpgr)] |>
  subset(subset = lift >= 10) |> 
  inspect()

Here I get 181 times worse performance and a bunch of unnecessary rules.

1) What am I doing wrong? 2) What is the most efficient way to find rules for a target in huge datasets using this library?

mhahsler commented 1 year ago

appearance seems to be broken in the interface. I need to look into this.

mhahsler commented 1 year ago

Confirmed: appearance is broken in fim4r and can only be set for the antecedent. I will update the documentation.

mytarmail commented 1 year ago

> Confirmed: appearance is broken in fim4r and can only be set for the antecedent. I will update the documentation.

Hello! Can you please tell me when to expect a fix?

mhahsler commented 1 year ago

I do not maintain fim4r. I don't know what their release/fix schedule is.

See: https://borgelt.net/fim4r.html

mytarmail commented 1 year ago

The author of the package does not respond to my emails.

mytarmail commented 1 year ago

Hello! I looked at the example file from the author's site and managed to run the algorithm correctly, although I do not fully understand the settings. They are rather strange, and the documentation is not clear enough.

It looks something like this (the list li from my example above can be used as the data):

showrules <- function(rules) {                               # print found association rules
  for (i in seq_along(rules)) {                              # seq_along() is safe when no rules are found
    cat(rules[[i]][[1]], "<-", sort(rules[[i]][[2]]),
        sprintf("(%g,",  rules[[i]][[3]][[1]]),
        sprintf("%g)\n", rules[[i]][[3]][[2]]))
  }
  cat(sprintf("%d rule(s)\n", length(rules)))
}  

library(fim4r) 
apps <- list(c("","target"), c("a","c"))

rules <- fim4r.fpgrowth(li, 
                        target="r", 
                        supp=10,  
                        report="aC", 
                        appear=apps)
showrules(rules)

Here is the result:

target <- A (3, 100)
target <- A B (3, 100)
target <- B (3, 100)
target <- A C (3, 100)
target <- A B C (3, 100)
target <- B C (3, 100)
target <- C (3, 100)
7 rule(s)

So it's not a bug.
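Judging from the call above, `appear` seems to be a list of two parallel vectors: item names first, then one appearance code per item, with `""` standing for "all other items". A minimal sketch of a helper that builds such a list for one or more consequent items is shown below. `make_rhs_appear` is a hypothetical name used only for illustration (it is not part of arules or fim4r), and the meaning of the codes `"a"` and `"c"` is assumed from the examples in this thread.

```r
# Hypothetical helper, not part of arules or fim4r.
# Assumption: "" is the default for all items, "a" = antecedent only,
# "c" = consequent only.
make_rhs_appear <- function(rhs_items) {
  stopifnot(is.character(rhs_items), length(rhs_items) > 0)
  list(
    c("", rhs_items),                     # default entry, then the target items
    c("a", rep("c", length(rhs_items)))   # one appearance code per entry
  )
}

apps <- make_rhs_appear("target")
str(apps)
```

The resulting `apps` has the same shape as the list passed to `appear` in the call above.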

mhahsler commented 1 year ago

Thank you for this. I was able to fix my code, so you can now run code like:

library(arules)
data(Adult)

# Examples for how to use item appearance with apriori, eclat, 
#   fpgrowth in fim4r. We first mine all rules.
inspect(fim4r(Adult, method = "fpgrowth", 
  target = "rules", supp = .8))

# ignore item "capital-gain=None"
inspect(fim4r(Adult, method = "fpgrowth", 
  target = "rules", supp = .8,
  appear = list(c("capital-gain=None"), c("-"))))

# "capital-gain=None" cannot appear in consequent (antecedent only)
inspect(fim4r(Adult, method = "fpgrowth", 
  target = "rules", supp = .8,
  appear = list(c("capital-gain=None"), c("a"))))

# "capital-gain=None" cannot appear in the antecedent
inspect(fim4r(Adult, method = "fpgrowth", 
  target = "rules", supp = .8,
  appear = list(c("capital-gain=None"), c("c"))))

# restrict the consequent to the item "capital-gain=None".
# That is, "" = all items can only appear in the antecedent with the 
# exception that "capital-gain=None" can only appear in the consequent.
inspect(fim4r(Adult, method = "fpgrowth", 
  target = "rules", supp = .8,
  appear = list(c("", "capital-gain=None"), c("a", "c"))))

For now, you need to install the development version from GitHub. This will be released on CRAN with version 1.7-7. Let me know if something does not work.
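Applied to the toy data from the first post, the same pattern might look like the sketch below. It assumes an arules version that includes the fix and an installed fim4r package, so the mining step is guarded and simply skipped when either is missing. The `supp` and `conf` values are the ones used in the original question.

```r
# Rebuild the toy transactions from the first post.
set.seed(111)
li <- list()
for (i in 1:30) {
  s <- sample(letters, 10, replace = TRUE)
  if (sample(c(TRUE, FALSE), 1, prob = c(1, 5))) {
    # a set of items that lead to the target
    s <- c(s, c("A", "B", "C", "target"))
  }
  li[[i]] <- make.unique(s)
}

# "" = every other item may appear only in the antecedent;
# "target" may appear only in the consequent.
appear_spec <- list(c("", "target"), c("a", "c"))

# Assumption: arules (with the fix) and fim4r are both installed.
if (requireNamespace("arules", quietly = TRUE) &&
    requireNamespace("fim4r", quietly = TRUE)) {
  library(arules)
  trans <- transactions(li)
  fpgr <- fim4r(trans,
                method = "fpgrowth",
                target = "rules",
                supp = 0.05,
                conf = 0.1,
                appear = appear_spec)
  inspect(fpgr)
}
```

The only change relative to the call in the original question is the `appear` argument: restricting just "target" to the consequent leaves every other item free to appear there as well, which would explain the extra rules.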