stanford-futuredata / macrobase

MacroBase: A Search Engine for Fast Data
http://macrobase.stanford.edu/
Apache License 2.0
661 stars 126 forks source link

Time Complexity of the Algorithm? #250

Open r-luo opened 6 years ago

r-luo commented 6 years ago

Hi, I'm trying out macrobase on a dataset with ~15 million rows and ~30 columns. I selected three columns to try out first but it's taking forever to run. So I'm wondering if there's any information about time complexity for the algorithm.

fabuzaid21 commented 6 years ago

Hi @madcarrot, thanks for using MacroBase! Could you give us some information on:

MacroBase uses a variant of the APriori algorithm in its query engine. The time complexity is linear in the number of rows; while it can be combinatorial in the number of columns, we do a lot of pruning by ignoring low-frequency combinations of column values during execution of the algorithm.

If you only selected three columns to run on, then I'm surprised that it's taking so long to run; that's why I'm wondering if maybe it's simply an old version of the code. Let us know, and we'll figure out what's going on.