BhargavOE closed this issue 6 years ago
I assume you're referring to using a transactions list/matrix for the eclat(?) algorithm. I strongly suggest looking at the "zaki" dataset that is included with the arulesSequences package, as it demonstrates the expected form of the data (use something like system.file("misc", "zaki.txt", package = "arulesSequences") to locate the raw file).
This issue actually took me some time to track down, as well. You'll need to make sure that the "sid" is in ascending order, I believe, because the algorithm depends on pre-sorted input.
Good luck!
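The ordering requirement mentioned above can be sketched in base R as follows. This is only a minimal illustration: the toy data frame and its column names (sequenceID, eventID, SIZE, items) are assumptions, not the poster's real data. The conversion through as.character() is there because the reported warning says 'eventID' is a factor, and calling as.integer() directly on a factor returns level codes rather than the original values.

```r
# Hypothetical toy data, deliberately out of order
df <- data.frame(
  sequenceID = c(2, 1, 1, 2),
  eventID    = c(1, 2, 1, 2),
  SIZE       = c(1, 2, 1, 1),
  items      = c("C", "A B", "D", "E"),
  stringsAsFactors = FALSE
)

# Convert the IDs via character so that factor columns keep their real values
df$sequenceID <- as.integer(as.character(df$sequenceID))
df$eventID    <- as.integer(as.character(df$eventID))

# Sort by sequence first, then by event within each sequence
df_sorted <- df[order(df$sequenceID, df$eventID), ]

# Sanity check before writing the file: sequence IDs must be non-decreasing
stopifnot(!is.unsorted(df_sorted$sequenceID))
```

Writing df_sorted (rather than df) with write.table() should then give the downstream reader rows in the order it appears to expect.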
I have a dataframe df of the form
index | eventID | basket_size | sequenceID | items
1 | 1 | 1 | 1 | 006060696000,
2 | 2 | 2 | 1 | 005060036000, 006050035000,
3 | 3 | 1 | 1 | 018010451000,
4 | 4 | 1 | 1 | 019010133000,
5 | 5 | 1 | 1 | 016030302000,
6 | 6 | 1 | 1 | 028050039000,
7 | 7 | 1 | 1 | 032070264000,
8 | 8 | 1 | 1 | 032081045000,
9 | 9 | 1 | 1 | 018020189000,
10 | 10 | 1 | 1 | 009040304000,
I used the following commands to come up with the sequential association rules.
write.table(df, "transactions.csv", row.names = FALSE, col.names = FALSE, sep = ' ', quote = FALSE)
sim_tx <- read_baskets("transactions.csv", info = c("eventID", "size", "sequenceID"))
pander(head(as(sim_tx, "data.frame")))
sim_cspade <- cspade(sim_tx, parameter = list(support = 0.1, maxlen = 3, mingap = 2), control = list(verbose = TRUE, summary = TRUE, bfstype = TRUE))
I get the same error: 'sid' invalid (order).
Finally found the reason for the issue. Some of my eventIDs had a basket size of over 500, and those long rows were being wrapped onto the next line of the file. read_baskets then read the first items on the wrapped line as the eventID and sequenceID, which left the IDs out of order.
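One way to catch this kind of wrapped row before calling read_baskets is to compare the declared basket size on each line with the number of items actually present. The sketch below is a hedged illustration only: the file contents are made up, and it assumes the column order eventID, size, sequenceID, items used in the commands above.

```r
# Hypothetical file contents: the third line is a wrapped continuation of the second
lines <- c(
  "1 1 1 006060696000",
  "2 3 1 005060036000 006050035000",
  "018010451000",
  "3 1 1 019010133000"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Split each line on spaces and/or commas
fields <- strsplit(readLines(path), "[ ,]+")

# Second field is the declared basket size; everything after the third field is an item
declared <- suppressWarnings(
  as.integer(vapply(fields, function(f) f[2], character(1)))
)
actual <- lengths(fields) - 3L

# Flag lines whose declared size is missing or does not match the item count
bad <- which(is.na(declared) | declared != actual)
print(bad)
```

Any line number reported in bad is either a truncated row or the continuation of one, and is worth inspecting in the raw file.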
I was trying to run the code on a different dataset and encountered the following error:
preprocessing ...
Error in makebin(data, file) : 'sid' invalid (order)
In addition: Warning message:
In makebin(data, file) : 'eventID' is a factor
It looks like other people on Stack Overflow have hit the same issue, and it has not been solved there. Any idea why this is happening?