mhahsler / arules

Mining Association Rules and Frequent Itemsets with R
http://mhahsler.github.io/arules
GNU General Public License v3.0
194 stars 42 forks source link

correct converting dataframe into transactions for arules in R #39

Closed jasperDD closed 6 years ago

jasperDD commented 6 years ago

I must performing association rules in R and i found the example here http://www.salemmarafi.com/code/market-basket-analysis-with-r/ In this example they work with data(Groceries) but they gave original dataset Groceries.csv

structure(list(chocolate = structure(c(9L, 13L, 1L, 8L, 16L, 
2L, 14L, 11L, 7L, 15L, 17L, 5L, 10L, 4L, 3L, 6L, 2L, 18L, 12L
), .Label = c("bottled water", "canned beer", "chicken,citrus fruit,tropical fruit,root vegetables,whole milk,frozen fish,rollsbuns", 
"chicken,pip fruit,other vegetables,whole milk,dessert,yogurt,whippedsour cream,rollsbuns,pasta,soda,waffles", 
"citrus fruit,pip fruit,root vegetables,other vegetables,whole milk,cream cheese ,domestic eggs,brown bread,margarine,baking powder,waffles", 
"frankfurter,citrus fruit,onions,other vegetables,whole milk,rollsbuns,sugar,soda", 
"frankfurter,rollsbuns,bottled water,fruitvegetable juice,hygiene articles", 
"frankfurter,sausage,butter,whippedsour cream,rollsbuns,margarine,spices", 
"fruitvegetable juice", "hamburger meat,other vegetables,whole milk,curd,yogurt,rollsbuns,pastry,semi-finished bread,margarine,bottled water,fruitvegetable juice", 
"meat,citrus fruit,berries,root vegetables,whole milk,soda", 
"packaged fruitvegetables,whole milk,curd,yogurt,domestic eggs,brown bread,mustard,pickled vegetables,bottled water,misc. beverages", 
"pickled vegetables,coffee", "root vegetables", "tropical fruit,margarine,rum", 
"tropical fruit,pip fruit,onions,other vegetables,whole milk,domestic eggs,sugar,soups,tea,soda,hygiene articles,napkins", 
"tropical fruit,root vegetables,herbs,whole milk,butter milk,whippedsour cream,flour,hygiene articles", 
"turkey,pip fruit,salad dressing,pastry"), class = "factor")), .Names = "chocolate", class = "data.frame", row.names = c(NA, 
-19L))

i load this data

g=read.csv("g.csv",sep=";")

so i must convert it to transactions like arule requires

#'@importClassesFrom arules transactions
trans = as(g, "transactions")

lets' examinate data(Groceries)

> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
  .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
  .. .. ..@ Dim     : int [1:2] 169 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 169 obs. of  3 variables:
  .. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
  .. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
  .. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
  ..@ itemsetInfo:'data.frame': 0 obs. of  0 variables
>

and my converted data from original csv

> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
  ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. ..@ i       : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
  .. .. ..@ p       : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. ..@ Dim     : int [1:2] 7011 9835
  .. .. ..@ Dimnames:List of 2
  .. .. .. ..$ : NULL
  .. .. .. ..$ : NULL
  .. .. ..@ factors : list()
  ..@ itemInfo   :'data.frame': 7011 obs. of  3 variables:
  .. ..$ labels   : chr [1:7011] "tr=abrasive cleaner" "tr=abrasive cleaner,napkins" "tr=artif. sweetener" "tr=artif. sweetener,coffee" ...
  .. ..$ variables: Factor w/ 1 level "tr": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ levels   : Factor w/ 7011 levels "abrasive cleaner",..: 1 2 3 4 5 6 7 8 9 10 ...
  ..@ itemsetInfo:'data.frame': 9835 obs. of  1 variable:
  .. ..$ transactionID: chr [1:9835] "1" "2" "3" "4" ...
> 

We see that in data(Groceries)

transactions in sparse format with
 9835 transactions (rows) and
 169 items (columns)

in my trans data

 9835 transactions (rows) and
 7011 items (columns)

i.e. i got 7011 columns from Groceries.csv, meanwhile in embedded example(169 columns)

Why it is so? How this file convert correct. I must understand it, cause, i can't work with my file

mhahsler commented 6 years ago

The data you show does not look like a valid CSV file. Have a look at ? read.transactions to learn how to read in files.

jasperDD commented 6 years ago

Hi, mhahsler, i found decision here trans <- read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ',') thank you