mhahsler / arules

Mining Association Rules and Frequent Itemsets with R
http://mhahsler.github.io/arules
GNU General Public License v3.0

Conversion of sparse count matrix to full matrix #30

Closed: NeferkareII closed this issue 7 years ago

NeferkareII commented 7 years ago

We are currently using the arules package to extract frequent item pairs for use in a PCA. We have approximately 50,000 "sets" (items), which means there are roughly 50000^2/2, or about 1.25 billion, potential pairs. We tried to convert the sparse matrix into a full (dense) matrix using the following code:

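library(arules)
# mine all frequent itemsets of exactly two items (pairs);
# `transactions` and `support` are defined elsewhere
apri.test <- apriori(transactions,
                     parameter = list(target = "frequent itemsets", supp = support,
                                      minlen = 2, maxlen = 2),
                     control = list(verbose = TRUE))
pairs.matrix <- as.matrix(apri.test@items@data)  # dense copy of the sparse data slot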

However, this gave us a memory error saying that 580 GB are needed to allocate the matrix. Our rough estimate (50000^2 * 40 / 1000000000 = 100 GB) was far below this value. Is there a more efficient way built into the package to extract this matrix, or are we attempting to extract the wrong matrix entirely?
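Both numbers can be reproduced with a few lines of base R. This is a back-of-the-envelope sketch under stated assumptions: the dense result is taken to be a logical matrix (which is what `as.matrix()` returns for the pattern matrix in the `data` slot), R stores each logical value in 4 bytes, `dense_logical_gb()` is a hypothetical helper, and the 1,000,000 frequent pairs are made up for illustration.

```r
# Unordered pairs among n items: n * (n - 1) / 2
n_items <- 50000
n_pairs <- choose(n_items, 2)   # about 1.25 billion

# Rough size in GB of a dense logical matrix: every cell is stored,
# and R uses 4 bytes per logical value.
dense_logical_gb <- function(n_rows, n_cols) {
  as.numeric(n_rows) * as.numeric(n_cols) * 4 / 1e9
}

# The data slot of an itemMatrix has one row per item and one column per
# itemset, so the dense size depends on how many frequent pairs were found.
# With the real mining result (not run here):
#   d <- dim(apri.test@items@data)
#   dense_logical_gb(d[1], d[2])

# Illustration only: 50,000 items x 1,000,000 hypothetical frequent pairs
dense_logical_gb(n_items, 1e6)   # 200 GB
```

The column count in this estimate is the number of frequent itemsets returned by `apriori()`, not the number of items, so the dense size is not simply 50000^2 times a constant.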

All the best from the WU.

mhahsler commented 7 years ago

The conversion should look like this:

as(items(apri.test), "matrix")

However, R will still allocate a ridiculously large amount of memory internally! A dense matrix stores a cell for every item/itemset combination, even though almost all of them are empty. If the data are sparse, you should avoid a dense matrix representation. You probably need to write custom code that works directly with the sparse matrix representation.
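One way to follow this advice for the PCA use case is to turn the frequent pairs into a sparse, symmetric item-by-item matrix of pair supports, which stores only the pairs that were actually found. The sketch below uses only the Matrix package; `pair_support_matrix()` is a hypothetical helper, it assumes each itemset contains exactly two items (as with `minlen = 2, maxlen = 2`), and a small hand-built `ngCMatrix` stands in for `apri.test@items@data` so that it runs without mining any data.

```r
library(Matrix)

# Build a sparse, symmetric item-by-item matrix of pair supports from a
# pattern matrix of frequent pairs (items as rows, itemsets as columns).
# Assumes every column contains exactly two items (minlen = maxlen = 2).
pair_support_matrix <- function(m, supp, item_labels = rownames(m)) {
  m <- as(m, "CsparseMatrix")
  stopifnot(all(diff(m@p) == 2L),       # exactly two items per itemset
            length(supp) == ncol(m))    # one support value per itemset
  # m@i holds 0-based row indices column by column; pair them up per itemset.
  # Row indices are sorted within a column, so these fall in the upper triangle.
  idx <- matrix(m@i + 1L, ncol = 2, byrow = TRUE)
  sparseMatrix(i = idx[, 1], j = idx[, 2], x = supp,
               dims = c(nrow(m), nrow(m)),
               dimnames = list(item_labels, item_labels),
               symmetric = TRUE)
}

# Toy stand-in for apri.test@items@data: 4 items, 3 frequent pairs
# with columns {a, b}, {a, c} and {c, d}
toy <- sparseMatrix(i = c(1, 2, 1, 3, 3, 4), j = c(1, 1, 2, 2, 3, 3),
                    dims = c(4, 3), dimnames = list(letters[1:4], NULL))

S <- pair_support_matrix(toy, supp = c(0.3, 0.2, 0.1))
S["b", "a"]   # same value as S["a", "b"], since S is symmetric
```

For the real result, a call such as `pair_support_matrix(apri.test@items@data, quality(apri.test)$support, itemLabels(items(apri.test)))` should work, assuming the row order of the data slot matches `itemLabels()`. Memory then grows with the number of frequent pairs rather than with the square of the number of items.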