Closed migueldft closed 2 years ago
Oh wow, that's a big dataset. Not sure if it will be possible to handle with the current implementation. However, you could try
fit_transform(X, sparse=True)
I.e.,
te = TransactionEncoder()
te_ary = te.fit_transform(list(user_items), sparse=True)
and then if that worked, maybe use fpmax
instead of apriori
The transform part now works perfectly!
But how can I proceed after that? Do I still need to convert it to a DataFrame to apply the apriori or fpmax algorithms? If not, could you provide an example?
Converting it back to a dense DataFrame leads me to the same storage error.
Ah, right. You can use a sparse DataFrame. E.g.,
df = pd.DataFrame.sparse.from_spmatrix(te_ary, columns=te.columns_)
frequent_itemsets = fpmax(df, min_support=0.6, use_colnames=True)
I should probably document that.
I am trying to use the apriori algorithm on a large e-commerce dataset. It has around 300k products and 2M orders.
My first step was making a list of products for each order:
user_items = df.groupby('sale_order_store_number')['sku_config'].apply(list)
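For context, the grouping step above can be sketched with a tiny made-up frame (column names are taken from the snippet; the data is invented):

```python
import pandas as pd

# Minimal stand-in for the order-lines table: one row per (order, SKU) pair
df = pd.DataFrame({
    'sale_order_store_number': [1, 1, 2, 2, 2, 3],
    'sku_config': ['A', 'B', 'A', 'C', 'D', 'B'],
})

# Collapse to one list of SKUs per order
user_items = df.groupby('sale_order_store_number')['sku_config'].apply(list)
print(list(user_items))  # [['A', 'B'], ['A', 'C', 'D'], ['B']]
```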
After that I tried to use the encoder,
which gives me the memory error: Unable to allocate 899. GiB for an array with shape (2724244, 354208) and data type bool
Is there anything I can do to avoid this kind of problem? Thanks in advance.