rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.
https://rasbt.github.io/mlxtend/

KeyError raised when trying to generate the rules from the itemsets #696

Open Bushra-Aljbawi opened 4 years ago

Bushra-Aljbawi commented 4 years ago

Hi, thanks a lot for the amazing repository.

I'm using this to generate association rules. Extracting itemsets of different lengths is not a problem, and extracting the itemsets and using them directly to generate rules works. However, after saving the itemsets to a file, generating the rules from the reloaded file raises this error:

KeyError: 'frozenset({\'1\', \'z\', \'f\', \'a\', "\'", \')\', \'l\', \'(\', \'o\', \'k\', \'r\', \' \', \'b\', \'%\', \'e\', \'s\', \'m\', \'}\', \'{\', \'i\', \'u\', \'t\'})You are likely getting this error because the DataFrame is missing antecedent and/or consequent information. You can try using the support_only=True option'

I understand that it's because the code can't find the support of one of the antecedent/consequent items, but I have no idea how to solve it. I've read all the possible solutions in this thread: https://github.com/rasbt/mlxtend/issues/390, but none of them work. I also tried saving the itemsets to a CSV file separated by ; rather than , to avoid problems with special characters, but the error persists.

I'm using Python 3.7 and mlxtend 0.17.2.

Would appreciate any ideas. Thanks, Bushra.

Bushra-Aljbawi commented 4 years ago

In case it helps someone: saving the DataFrame of itemsets to a pickle file rather than a CSV solved the problem for me.
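A rough sketch of what I mean (the file name and thresholds are just placeholders, and frequent_itemsets is the DataFrame returned by apriori):

import pandas as pd
from mlxtend.frequent_patterns import association_rules

# pickling preserves the frozenset objects in the itemsets column
frequent_itemsets.to_pickle('frequent_itemsets.pkl')

# reload later and generate the rules as usual
loaded_itemsets = pd.read_pickle('frequent_itemsets.pkl')
rules = association_rules(loaded_itemsets, metric='confidence', min_threshold=0.7)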

rasbt commented 4 years ago

Glad you were able to solve the problem. Yeah, CSV files are not really able to store Python objects like frozensets. Here is a small example that writes the itemsets to a CSV and reads them back:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

frequent_itemsets.to_csv('my.csv', index=None)
frequent_itemsets.head()

[Screenshot: frequent_itemsets.head() output with frozenset itemsets]

df = pd.read_csv('my.csv')
df.head()

[Screenshot: df.head() output after reading the CSV back]
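A quick check makes the problem visible: after the round trip through CSV, the itemsets column contains plain strings rather than frozensets (the expected output is shown in the comments):

df = pd.read_csv('my.csv')

# the frozensets were written out as their string representation,
# so the column now holds plain text instead of set objects
print(type(df['itemsets'].iloc[0]))   # <class 'str'>
print(df['itemsets'].iloc[0])         # e.g. "frozenset({'Eggs'})"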

However, if you would like a CSV-friendlier representation, you could write a custom function to help with this, something along the lines of:

def frozenset_to_str(x):
    # turn a frozenset like frozenset({'Eggs', 'Onion'}) into the
    # plain string "'Eggs', 'Onion'" so it survives a CSV round trip
    x = list(x)
    x = str(x).lstrip('[').rstrip(']').strip()
    return x

frequent_itemsets['itemsets'] = frequent_itemsets['itemsets'].apply(frozenset_to_str)
frequent_itemsets

[Screenshot: frequent_itemsets with the itemsets column converted to strings]
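Going the other way (reloading such a CSV and restoring frozensets before calling association_rules) would need a small parser. A sketch of what that could look like, where str_to_frozenset is just an ad-hoc helper (not part of mlxtend) and my_str.csv is assumed to contain the frozenset_to_str output from above:

import pandas as pd
from mlxtend.frequent_patterns import association_rules

def str_to_frozenset(s):
    # inverse of frozenset_to_str above; note that this naive split
    # breaks if the item names themselves contain commas or quotes
    items = [item.strip().strip("'\"") for item in s.split(',')]
    return frozenset(items)

restored = pd.read_csv('my_str.csv')
restored['itemsets'] = restored['itemsets'].apply(str_to_frozenset)
rules = association_rules(restored, metric='confidence', min_threshold=0.7)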

I can reopen this issue to remind myself to add something to the documentation to clarify this / provide an example.

chenrocky commented 1 year ago

Hi, I'm also experiencing the same error:

You are likely getting this error because the DataFrame is missing antecedent and/or consequent information. You can try using the `support_only=True` option

I'm using mlxtend 0.23.0 and Python 3.8.13.

I'm only experiencing this error when running association_rules after using the fpmax algorithm to generate the frequent_itemsets. I don't see it when running association_rules after fpgrowth, hmine, or apriori. All else is equal (i.e., the frequent_itemsets are generated with min_support=0.5 and max_len=2, and association_rules is called with metric="lift" and min_threshold=1); a minimal sketch of the setup is below.
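This is only a sketch with a made-up toy dataset, not my actual data, but it shows the shape of the comparison:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, fpmax, association_rules

# toy transactions, just to illustrate the setup
dataset = [['Milk', 'Eggs', 'Bread'],
           ['Milk', 'Eggs'],
           ['Milk', 'Bread'],
           ['Eggs', 'Bread']]

te = TransactionEncoder()
df = pd.DataFrame(te.fit(dataset).transform(dataset), columns=te.columns_)

# runs without error after fpgrowth (same for hmine and apriori)
itemsets_growth = fpgrowth(df, min_support=0.5, max_len=2, use_colnames=True)
rules_growth = association_rules(itemsets_growth, metric="lift", min_threshold=1)

# raises the KeyError after fpmax with otherwise identical parameters
itemsets_max = fpmax(df, min_support=0.5, max_len=2, use_colnames=True)
rules_max = association_rules(itemsets_max, metric="lift", min_threshold=1)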

Like the OP, I understand that it's because the code can't find the support of one of the antecedent/consequent items. In the thread https://github.com/rasbt/mlxtend/issues/390, I saw @rasbt post:

[Screenshot: quoted explanation from @rasbt in #390]

And the logic re: "The support of at least one of the two is 0.253623, but the support for the other item might be higher." makes sense to me, but what doesn't make sense is how the item could be missing from frequent_itemsets if its support is at least min_support.

Would appreciate any help. Let me know if I am missing something or have a misunderstanding.

rasbt commented 1 year ago

Thanks for bringing that up. Unfortunately, I currently don't have the capacity to dive into the code and see what's going on (due to other projects and deadlines) but this is worth investigating.

essefi-ahlem commented 8 months ago

Experiencing the same issue with fpmax even when working with the documentation dataset.

To reproduce the error:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpmax, fpgrowth, association_rules

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)

frequent_itemsets = fpmax(df, min_support=0.1, use_colnames=True)
frequent_itemsets

returns this output:

[Screenshot: fpmax frequent itemsets]

association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

returns the below error:

[Screenshot: KeyError traceback]

cc @rasbt, @chenrocky: any updates on this error being raised with fpmax? Thanks!