Closed GauthierMagnin closed 2 years ago
If it helps, here are two more examples.
Both use 4 transactions and a minimum support of 0.25. The first contains 4 different itemsets and none is considered frequent. The second contains 3 different itemsets and all are considered frequent. Some itemsets appear in both examples with the same support, yet they are considered frequent in one and not in the other.
First example:
library(arules)  # provides encode(), eclat() and inspect()
data = list(t1 = "A",
t2 = "B",
t3 = "C",
t4 = "D")
labels = c("A", "B", "C", "D")
transact = as(encode(data, labels), "transactions")
inspect(eclat(transact, parameter = list(support = 0.25)))
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.25 1 10 frequent itemsets TRUE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 1
eclat - zero frequent items
set of 0 itemsets
Second example:
data = list(t1 = "A",
t2 = "B",
t3 = "C",
t4 = "A")
labels = c("A", "B", "C")
transact = as(encode(data, labels), "transactions")
inspect(eclat(transact, parameter = list(support = 0.25)))
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.25 1 10 frequent itemsets TRUE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 1
create itemset ...
set transactions ...[3 item(s), 4 transaction(s)] done [0.00s].
sorting and recoding items ... [3 item(s)] done [0.00s].
creating bit matrix ... [3 row(s), 4 column(s)] done [0.00s].
writing ... [3 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
items support count
[1] {A} 0.50 2
[2] {B} 0.25 1
[3] {C} 0.25 1
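For comparison, the result expected under an inclusive threshold (support >= minimum support) can be worked out without arules. The following is a minimal base-R sketch, not the arules implementation: frequent_items is a hypothetical helper that only handles single items, and its ceiling() step is exact here only because 0.25 * 4 carries no rounding error.

```r
# Hypothetical helper (not part of arules): single items whose support
# is >= min_support, decided on integer counts.
frequent_items <- function(transactions, min_support) {
  n <- length(transactions)
  # Count each item at most once per transaction.
  items <- unlist(lapply(transactions, unique), use.names = FALSE)
  counts <- vapply(split(items, items), length, integer(1))
  # Smallest count that satisfies the threshold (inclusive).
  min_count <- ceiling(min_support * n)
  counts[counts >= min_count] / n
}

first  <- list(t1 = "A", t2 = "B", t3 = "C", t4 = "D")
second <- list(t1 = "A", t2 = "B", t3 = "C", t4 = "A")

frequent_items(first, 0.25)   # all four items, support 0.25 each
frequent_items(second, 0.25)  # A = 0.50, B = 0.25, C = 0.25
```

Under this reading, both examples should report every item as frequent, which matches the second eclat output but not the first.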
Thank you for the detailed bug report. The issue with eclat is now (hopefully) fixed in the development version on GitHub. Let me know if the results are now not as expected.
The fix will be part of the next CRAN release. Please use the GitHub version till then.
Regards, -MFH
It seems to work as expected now. Thank you very much for the quick fix.
Hello,
I have a similar issue: the minimum support parameter triggers an error. I am using arules 1.7.5 with R 4.1.0. Specifically, the error occurs when the absolute minimum support count equals the highest transaction count of any item.
Here is an example:
nsamples_tot <- 2058
transactions <- matrix(0, nrow = 303, ncol = 179)
transactions[1:21,1] <- 1
transactions[10:20,2] <- 1
globalSupport <- 0.01
nsamples <- ceiling(globalSupport * nsamples_tot)
minSupport <- nsamples/nrow(transactions)
minSupport
[1] 0.06930693
eclat(transactions, parameter = list(support = minSupport))
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.06930693 1 10 frequent itemsets TRUE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 21
create itemset ...
set transactions ...[2 item(s), 303 transaction(s)] done [0.00s].
sorting and recoding items ... [0 item(s)] done [0.00s].
Error in eclat(transactions, parameter = list(support = minSupport)) :
no items or transactions to work on
However, passing the value 0.06930693 manually as the support works:
inspect(eclat(transactions, parameter = list(support = 0.06930693)))
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.06930693 1 10 frequent itemsets TRUE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 20
create itemset ...
set transactions ...[2 item(s), 303 transaction(s)] done [0.00s].
sorting and recoding items ... [1 item(s)] done [0.00s].
creating sparse bit matrix ... [1 row(s), 303 column(s)] done [0.00s].
writing ... [1 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
items support count
[1] {1} 0.06930693 21
But it returns an empty set when the support is rounded to two decimal places, although the absolute minimum support count is still reported as 21:
inspect(eclat(transactions, parameter = list(support = 0.07)))
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.07 1 10 frequent itemsets TRUE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 21
eclat - zero frequent items
If I understood correctly, the threshold should be inclusive (>=).
Thanks for your help!
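The expectation in this report can be stated on integer counts, which avoids the division altogether. This is a minimal base-R sketch using the data from the report. It does not call arules and says nothing about how eclat derives its count internally; min_count and item_counts are names introduced here for illustration.

```r
# Same data as in the report above.
nsamples_tot <- 2058
transactions <- matrix(0, nrow = 303, ncol = 179)
transactions[1:21, 1] <- 1
transactions[10:20, 2] <- 1

# Work with integer counts so no division is involved.
min_count <- ceiling(0.01 * nsamples_tot)   # 21
item_counts <- colSums(transactions)        # item 1: 21, item 2: 11

# Inclusive comparison: which items reach the minimum count?
which(item_counts >= min_count)             # item 1 only
```

With an inclusive threshold, item 1 (count 21) meets the minimum count of 21, so one frequent itemset is the expected result.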
Number representation is, unfortunately, complicated (binary representation, floating-point representation, and rounding). When you say
minSupport <- nsamples/nrow(transactions)
then the result of the division may be rounded up at the last representable digit, and that is why you find no results.
You need to manually make sure that you always round down.
minSupport <- nsamples/nrow(transactions)
minSupport
[1] 0.06930693
sprintf("%.100f", minSupport)
[1] "0.0693069306930693129764620152855059131979942321777343750000000000000000000000000000000000000000000000"
# round down with 6 digits
dig <- 6
minSupport_rounded_down <- round(minSupport - .5*10^(-dig), digits = dig)
sprintf("%.100f", minSupport_rounded_down)
[1] "0.0693060000000000064890315343291149474680423736572265625000000000000000000000000000000000000000000000"
eclat(transactions, parameter = list(support = minSupport_rounded_down))
Eclat
parameter specification:
tidLists support minlen maxlen target ext
FALSE 0.069306 1 10 frequent itemsets TRUE
algorithmic control:
sparse sort verbose
7 -2 TRUE
Absolute minimum support count: 20
create itemset ...
set transactions ...[2 item(s), 303 transaction(s)] done [0.00s].
sorting and recoding items ... [1 item(s)] done [0.00s].
creating sparse bit matrix ... [1 row(s), 303 column(s)] done [0.00s].
writing ... [1 set(s)] done [0.00s].
Creating S4 object ... done [0.00s].
set of 1 itemsets
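The exact value of 21/303 is the repeating decimal 0.069306930693069306..., while the stored double shown by sprintf above is 0.06930693069306931297..., which is slightly larger. An alternative way to round down, sketched here under the assumption that the support is positive, truncates with floor() instead of shifting and rounding. round_down is a hypothetical helper, not an arules function.

```r
n <- 303
minSupport <- 21 / n

# The stored double, printed with more digits than R shows by default.
sprintf("%.25f", minSupport)

# Hypothetical helper: truncate a positive value to `digits` decimal places.
round_down <- function(x, digits = 6) floor(x * 10^digits) / 10^digits

r <- round_down(minSupport)
r                # 0.069306
r <= minSupport  # TRUE
```

For this input the helper gives the same value as the round(minSupport - .5*10^(-dig), digits = dig) expression above, so either form can be passed as the support parameter.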
Thank you very much for the help!
I have now added code to eclat and apriori to round down automatically at the C level. This will prevent this unexpected behavior in the future. The addition will be part of the next release.
Thanks again for the comprehensive code that shows the behavior!
Hello,
When mining frequent itemsets, I read that the support parameter defines the minimum support required to consider an itemset frequent. In other words, an itemset must have a support >= the given threshold to be considered frequent.
In the following example, there are two transactions containing two different itemsets, each therefore having a support of 0.5. However, if a support threshold of 0.5 is used as the parameter, no itemset is considered frequent, whereas a threshold of 0.4 does consider the two itemsets frequent.
Here is the output of the first call:
Here is the output of the second call:
Is this lower bound supposed to be included or excluded? Is there an issue with how the threshold is applied (> instead of >=), or am I misinterpreting the documentation? If I am wrong, can we expect a parameter to be added in the near future to choose whether the threshold is included or excluded?