ymoch / apyori

A simple implementation of the Apriori algorithm in Python.
MIT License

How to use the apyori package on external data in an IPython notebook? #32

Closed: jashshah closed this issue 6 years ago

jashshah commented 7 years ago

Hi, I am looking to use the apyori package to do some association rule mining on the attached data set. Can you please tell me how to use this package to load the data and work with it in an IPython notebook? The instructions provided are for the command-line interface, and I am unable to even load the data.

Can you tell me where I am going wrong?

Regards,
Jash

Attachment: data_demo.zip

surajiiitm commented 6 years ago

Hi, I have a list of products for each order, but some of the results contain only one item, together with a lift and a confidence. Why is that? I should have at least two items to find a correlation between them.

ymoch commented 6 years ago

> Can you please tell me how to use this package to load the data and work on an ipython notebook?

I'm very sorry for my late response. Using apyori as a Python package will probably help you.

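from apyori import load_transactions, apriori

with open('path/to/transaction/file') as f:
    # load_transactions() yields one transaction per row (tab-separated by default).
    transactions = load_transactions(f)
    # Materialize the results while the file is still open.
    results = list(apriori(transactions))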

You can find the available options (min_support, min_confidence, etc.) in the pydoc, for example by running help(apriori).
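In a notebook, the thing most likely to break loading is the delimiter: as far as I can tell from apyori's source, load_transactions splits rows on tabs unless you pass delimiter=','. apyori's README also shows apriori() taking a plain list of lists, so you can build the transactions yourself. A minimal sketch using only the standard csv module; read_transactions is a hypothetical helper, not part of apyori:

```python
import csv
import io


def read_transactions(input_file, delimiter=","):
    """Hypothetical helper: read one transaction per row from a file-like object.

    Empty cells are dropped so that trailing delimiters do not become items.
    """
    transactions = []
    for row in csv.reader(input_file, delimiter=delimiter):
        items = [cell.strip() for cell in row if cell.strip()]
        if items:  # skip blank lines entirely
            transactions.append(items)
    return transactions


# In a notebook the data can come from a file opened with open(...)
# or, as here, from an in-memory string wrapped in io.StringIO.
sample = "beer,nuts\nbeer,cheese,\n\nmilk\n"
transactions = read_transactions(io.StringIO(sample))
print(transactions)  # [['beer', 'nuts'], ['beer', 'cheese'], ['milk']]
```

The resulting list can then be passed as apriori(transactions).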

ymoch commented 6 years ago

> for some list of results i have only one item with lift and confidence. why is it?

Have you checked the min_support parameter? The Apriori algorithm is sped up by skipping itemsets whose support is below that threshold. Running with a smaller min_support may solve your problem (though it will take more time).
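To see why min_support matters: Apriori keeps only itemsets whose support (the fraction of transactions containing them) reaches the threshold, and only extends those to larger itemsets, since a superset can never be more frequent than its subsets. A minimal level-wise sketch of that pruning in plain Python; this is my own illustration, not apyori's implementation, and frequent_itemsets is a hypothetical name:

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Minimal level-wise Apriori sketch (not apyori's code).

    Returns {frozenset(itemset): support} for every itemset whose support,
    the fraction of transactions containing it, is at least min_support.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: single items that meet the threshold.
    items = {item for t in sets for item in t}
    current = {
        frozenset([i]) for i in items if support(frozenset([i])) >= min_support
    }
    result = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        result.update({s: support(s) for s in current})
        k += 1
    return result


transactions = [["a", "b"], ["a", "b"], ["a", "c"], ["d"]]
high = frequent_itemsets(transactions, min_support=0.5)
low = frequent_itemsets(transactions, min_support=0.25)
print(sorted(map(sorted, high)))  # [['a'], ['a', 'b'], ['b']]
print(len(low))                   # 6
```

Lowering min_support from 0.5 to 0.25 keeps {c}, {d} and {a, c} as well, at the cost of counting more candidates.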

surajiiitm commented 6 years ago

Thank you for the response. I decreased both the minimum support and the minimum confidence, but the result is the same. Here is a sample of my transactions file; each row is the list of product_ids in one transaction.

46200,46198,27624,8040,None,None,None,None,None,None
45857,None,None,None,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
26955,11916,11915,8040,44671,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
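One thing worth checking in this sample: every row is padded with the literal string None. A CSV reader keeps that as an ordinary cell, so None would be treated as a product present in every transaction. Assuming those cells are padding rather than real product IDs, a sketch of stripping them before mining (strip_padding and support are hypothetical helpers):

```python
import csv
import io

# First three rows of the sample above, one transaction per line.
SAMPLE = """46200,46198,27624,8040,None,None,None,None,None,None
45857,None,None,None,None,None,None,None,None,None
11916,11915,8040,44671,None,None,None,None,None,None
"""


def strip_padding(rows, placeholder="None"):
    """Hypothetical helper: drop placeholder and empty cells from each row."""
    return [[cell for cell in row if cell and cell != placeholder] for row in rows]


def support(item, transactions):
    """Fraction of transactions that contain item."""
    return sum(1 for t in transactions if item in t) / len(transactions)


raw = list(csv.reader(io.StringIO(SAMPLE)))
clean = strip_padding(raw)

print(support("None", raw))    # 1.0: the padding looks like an item in every order
print(clean[1])                # ['45857']: a single-product order, no pairs to mine
print(support("8040", clean))  # 8040 is in 2 of these 3 orders
```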

ymoch commented 6 years ago

Thank you for your sample! I ran apyori.py on it and got the correct answer.

python apyori.py < foo.csv

Now I wonder whether your question is "confidence and lift are based on only one item; is that correct?" Confidence and lift are calculated for a rule between one itemset X and another itemset Y (X => Y), and each of X and Y can contain just one item. In apyori, X and Y are exposed as items_base and items_add. When items_base has one item and items_add has one item, the rule describes a 2-item correlation. Does this make sense?
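The definitions behind those numbers, as a small plain-Python sketch (the function names are my own, not apyori's, and the transactions are made-up toy data). It also shows how a result can look like it has only one item: with an empty X, support(X) is 1.0 by definition, so confidence(X => Y) reduces to support(Y) and lift to 1.0. If the one-item results you see have an empty items_base, this is likely the case you are looking at.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t)) / len(transactions)


def confidence(x, y, transactions):
    """conf(X => Y) = supp(X | Y) / supp(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """lift(X => Y) = conf(X => Y) / supp(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)


# Toy transactions, made up for illustration.
transactions = [["a", "b", "c"], ["a", "b"], ["c"], ["d", "c"]]

# A 2-item rule: one item in X (items_base), one item in Y (items_add).
print(confidence(["a"], ["b"], transactions))  # 1.0
print(lift(["a"], ["b"], transactions))        # 2.0

# Empty X: every transaction contains the empty set, so support(X) is 1.0.
print(confidence([], ["c"], transactions))     # 0.75
print(lift([], ["c"], transactions))           # 1.0
```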

cf. https://en.wikipedia.org/wiki/Association_rule_learning#Useful_Concepts

0324063 commented 3 years ago

Hello, I'd like to ask: if the same product appears several times in one order, will this package still count it as appearing only once?
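This question was not answered in the thread. Under the standard definition, support counts the transactions that contain an item, not how many times it occurs, so a product repeated within one order should count once. I have not confirmed how apyori handles duplicates internally; a safe approach is to deduplicate each order yourself before mining. A minimal sketch contrasting transaction-based support with a naive occurrence count (dedupe_orders, support and occurrence_rate are hypothetical helpers):

```python
def dedupe_orders(orders):
    """Hypothetical helper: remove repeated items within each order, keeping first-seen order."""
    return [list(dict.fromkeys(order)) for order in orders]


def support(item, transactions):
    """Transaction-based support: fraction of orders containing item at least once."""
    return sum(1 for t in transactions if item in t) / len(transactions)


def occurrence_rate(item, transactions):
    """Occurrence-based count, for contrast: repeats inflate the number."""
    return sum(t.count(item) for t in transactions) / len(transactions)


orders = [["milk", "milk", "bread"], ["bread"], ["milk"]]
deduped = dedupe_orders(orders)
print(deduped)                             # [['milk', 'bread'], ['bread'], ['milk']]
print(support("milk", orders))             # milk is in 2 of 3 orders, repeats ignored
print(occurrence_rate("milk", orders))     # 1.0: the repeated milk is counted twice
print(occurrence_rate("milk", deduped))    # matches support once duplicates are gone
```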