rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

Fpgrowth fails with only one transaction #1049

Open yhdelgado opened 1 year ago

yhdelgado commented 1 year ago

I have a large dataset of real data. After several attempts, I found that execution always fails on one particular transaction. I isolated that transaction and re-ran the algorithm on it alone, and it still fails every time. I can't understand why it fails at this point, even with the transaction isolated.

Example:

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns.fpgrowth import fpgrowth
import pandas as pd

transactions = [[
    114367, 116953, 123213, 125589, 128047, 128579, 130407, 132025, 132082,
    134190, 136097, 136098, 136181, 136357, 136656, 136658, 136659, 136992,
    137180, 137181, 137395, 138215, 139339, 139520, 139551, 140008, 140012,
    140021,
]]

def get_fpgrowth_associated_products(product_name):
    # Filter out transactions that don't include the target product
    filtered_transactions = [t for t in transactions if product_name in t]
    te = TransactionEncoder()
    te_ary = te.fit(filtered_transactions).transform(filtered_transactions)

    # Convert the one-hot encoded array into a pandas DataFrame
    df = pd.DataFrame(te_ary, columns=te.columns_)

    # Compute frequent itemsets using the FP-growth algorithm (min_support = 0.5)
    freq_itemsets = fpgrowth(df, min_support=0.5, use_colnames=True)

    itemsets = set(freq_itemsets.itemsets)

    # Find the sets that include the target product
    target_sets = [s for s in itemsets if product_name in s]

    # Combine the other items from those sets into a single set
    associated_items = set()
    for s in target_sets:
        associated_items |= s - {product_name}

    return list(associated_items)

get_fpgrowth_associated_products(136181)
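
One possible explanation, offered as an assumption rather than a confirmed diagnosis: when the input is a single transaction, every item has support 1.0, so every non-empty subset of its 28 items meets min_support=0.5. That is 2^28 - 1 frequent itemsets, which may be enough to exhaust memory. The sketch below uses only the standard library to count them; the helper names `count_itemsets` and `enumerate_itemsets` are hypothetical and not part of mlxtend.

```python
from itertools import combinations
from math import comb


def count_itemsets(n_items, max_len=None):
    """Count the non-empty subsets of n_items items with at most max_len items."""
    longest = n_items if max_len is None else min(max_len, n_items)
    return sum(comb(n_items, k) for k in range(1, longest + 1))


def enumerate_itemsets(items, max_len=None):
    """Brute-force listing of the same subsets; only practical for small inputs."""
    items = list(items)
    longest = len(items) if max_len is None else min(max_len, len(items))
    return [
        frozenset(c)
        for k in range(1, longest + 1)
        for c in combinations(items, k)
    ]


# Sanity check: the closed-form count matches brute force on a small case
assert count_itemsets(4) == len(enumerate_itemsets("abcd"))

print(count_itemsets(28))             # 268435455 itemsets for the transaction above
print(count_itemsets(28, max_len=2))  # 406 itemsets if only singles and pairs are kept
```

If this is indeed the cause, passing `max_len=2` to `fpgrowth` might be worth trying. Any larger frequent itemset that contains the target product also implies a frequent pair of the target with each of its other items, so pairs alone should yield the same set of associated items.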

Versions

MLxtend 0.22.0
Linux-5.19.0-43-generic-x86_64-with-glibc2.35
Python 3.8.16
Scikit-learn 1.2.2
NumPy 1.24.3
SciPy 1.9.3