mustafaascha / mustafaascha.github.io

MIT License

Error in makebin(data, file) : 'sid' invalid (order) In addition: Warning message: In makebin(data, file) : 'eventID' is a factor #1

Closed: BhargavOE closed this issue 6 years ago

BhargavOE commented 6 years ago

I was trying to run the code on a different dataset and encountered this error:

preprocessing ...
Error in makebin(data, file) : 'sid' invalid (order)
In addition: Warning message:
In makebin(data, file) : 'eventID' is a factor

It looks like other people on Stack Overflow have hit the same issue, and it has not been solved there. Any idea why this is happening?

mustafaascha commented 6 years ago

I assume you're referring to using a transactions list/matrix for the eclat(?) algorithm. I strongly suggest looking at the "zaki" dataset included with the arulesSequences package, as it demonstrates the expected form of the data (use something like system.file("misc", "zaki.txt", package = "arulesSequences") to locate the example file).

This issue took me some time to track down as well. I believe you'll need to make sure that the "sid" (the sequence ID) is in ascending order, because the algorithm depends on pre-sorted input.
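A minimal base-R sketch of that sorting step, run before writing the basket file. It assumes the column names from the data frame posted later in this thread (sequenceID, eventID) and additionally sorts event IDs within each sequence, which is an assumption beyond the "sid" check quoted in the error. The helper name sort_for_cspade is hypothetical.

```r
# Hypothetical helper (assumption): order events by sequenceID, then by
# eventID within each sequence, before writing them out for read_baskets().
sort_for_cspade <- function(df) {
  # Coerce IDs to integers so ordering is numeric, not lexicographic
  # (as character or factor, "10" would sort before "2").
  df$sequenceID <- as.integer(as.character(df$sequenceID))
  df$eventID <- as.integer(as.character(df$eventID))
  df[order(df$sequenceID, df$eventID), , drop = FALSE]
}

# Usage: a tiny out-of-order example
events <- data.frame(
  sequenceID = c(2, 1, 1),
  eventID    = c(1, 2, 1),
  items      = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
sorted <- sort_for_cspade(events)
print(sorted)
```

Converting through as.character() first means the helper also handles IDs that were read in as factors, where as.integer() alone would return the factor's internal level codes rather than the IDs themselves.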

Good luck!

BhargavOE commented 6 years ago

I have a dataframe df of the form:

index | eventID | basket_size | sequenceID | items
1 | 1 | 1 | 1 | 006060696000
2 | 2 | 2 | 1 | 005060036000, 006050035000
3 | 3 | 1 | 1 | 018010451000
4 | 4 | 1 | 1 | 019010133000
5 | 5 | 1 | 1 | 016030302000
6 | 6 | 1 | 1 | 028050039000
7 | 7 | 1 | 1 | 032070264000
8 | 8 | 1 | 1 | 032081045000
9 | 9 | 1 | 1 | 018020189000
10 | 10 | 1 | 1 | 009040304000

I used the following commands to come up with the sequential association rules.

library(arulesSequences)
library(pander)

write.table(df, "transactions.csv", row.names = FALSE, col.names = FALSE, sep = ' ', quote = FALSE)

sim_tx <- read_baskets("transactions.csv", info = c("eventID", "size", "sequenceID"))

pander(head(as(sim_tx, "data.frame")))

sim_cspade <- cspade(sim_tx, parameter = list(support = 0.1, maxlen = 3, mingap = 2), control = list(verbose = TRUE, summary = TRUE, bfstype = TRUE))

I get the error quoted above.
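One way to control the file layout directly is to build each line yourself rather than relying on write.table(). This is a minimal sketch, assuming the items for each event are held as a character vector in a list column (an assumption; the thread does not show how items are stored) and the "eventID size sequenceID item1 item2 ..." layout that matches info = c("eventID", "size", "sequenceID") in the read_baskets() call above. write_baskets is a hypothetical helper name; the example rows are taken from the first three rows of the table above.

```r
# Hypothetical helper (assumption): write one event per line as
# "eventID size sequenceID item1 item2 ...", space-separated.
write_baskets <- function(events, path) {
  lines <- vapply(seq_len(nrow(events)), function(i) {
    items <- events$items[[i]]
    # as.integer() keeps IDs out of scientific notation (e.g. "1e+05");
    # the size field is computed from the items actually written.
    paste(c(as.integer(events$eventID[i]), length(items),
            as.integer(events$sequenceID[i]), items),
          collapse = " ")
  }, character(1))
  # writeLines() ends each element with a single newline,
  # so each event occupies exactly one line of the file.
  writeLines(lines, path)
  invisible(lines)
}

# Usage with the first three rows from the table above
events <- data.frame(eventID = 1:3, sequenceID = c(1L, 1L, 1L))
events$items <- list("006060696000",
                     c("005060036000", "006050035000"),
                     "018010451000")
path <- tempfile(fileext = ".txt")
write_baskets(events, path)
print(readLines(path))
```

Keeping items as character strings preserves their leading zeros, and deriving the size field from length(items) keeps it consistent with the number of items on the line.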

BhargavOE commented 6 years ago

I finally found the cause. Some of my events had basket sizes over 500, and their items were being written onto the next line. read_baskets then read those items as eventID and sequenceID values, which is why the IDs were no longer in order.
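A quick pre-flight check on the written file can catch this kind of spill before cspade() runs. This is a minimal sketch, assuming whitespace-separated fields in the "eventID size sequenceID items..." layout used in this thread; check_baskets is a hypothetical helper, and it only checks that sequence IDs never decrease, mirroring the "sid" ordering requirement discussed above.

```r
# Hypothetical helper (assumption): report lines of a basket file that
# would confuse read_baskets()/cspade(). Returns character(0) if none.
check_baskets <- function(path) {
  fields <- strsplit(trimws(readLines(path)), "[[:space:]]+")
  problems <- character(0)
  prev_sid <- -Inf
  for (i in seq_along(fields)) {
    f <- fields[[i]]
    # The first three fields must be integer IDs; a spilled item such as
    # "018010451000" does not fit in an R integer and becomes NA here.
    ids <- suppressWarnings(as.integer(f[1:3]))
    if (length(f) < 4 || anyNA(ids)) {
      problems <- c(problems, sprintf("line %d: malformed header fields", i))
      next
    }
    # The declared basket size must match the number of items on the line.
    if (ids[2] != length(f) - 3) {
      problems <- c(problems, sprintf("line %d: size %d but %d items",
                                      i, ids[2], length(f) - 3))
    }
    # Sequence IDs must be non-decreasing down the file.
    if (ids[3] < prev_sid) {
      problems <- c(problems, sprintf("line %d: sequenceID %d out of order",
                                      i, ids[3]))
    }
    prev_sid <- ids[3]
  }
  problems
}

# Usage: a file where event 2's third item spilled onto its own line
bad <- tempfile()
writeLines(c("1 1 1 006060696000",
             "2 3 1 005060036000 006050035000",
             "018010451000"), bad)
print(check_baskets(bad))
```

Running a check like this on the file produced by write.table() (or any other writer) before calling read_baskets() surfaces the offending line numbers directly, instead of the less specific makebin() error.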