simlaudato / asterixdb

Automatically exported from code.google.com/p/asterixdb

Default configuration parameters may need revision #807

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
During one of the customer interactions, I tried loading a ~700 MB CSV dataset 
from an external dataset into an internal one, with a trivial transformation 
between the two. With default settings, this worked poorly or not at all: on a 
MacBook Air with 4 GB RAM, it took about 1754s vs. 148s for a bulk load. On my 
laptop, the default settings would not even finish. Tweaking the config to 
allow the memory component to grow past 32 MB allowed the workload to finish, 
but it was still about 6.5x slower (~823s vs. 127s). 
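
For reference, the slowdown factors implied by the timings above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `slowdown` is made up for illustration and the numbers are the ones quoted in this report.

```python
def slowdown(insert_seconds: float, bulk_load_seconds: float) -> float:
    """Return how many times slower the insert path was than the bulk load."""
    return insert_seconds / bulk_load_seconds


# Timings quoted in this report (seconds).
macbook_air = slowdown(1754, 148)      # default settings, 4 GB MacBook Air
laptop_tweaked = slowdown(823, 127)    # memory component allowed past 32 MB

print(f"MacBook Air, defaults: {macbook_air:.1f}x slower")
print(f"Laptop, tweaked:       {laptop_tweaked:.1f}x slower")
```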

The default settings let the memory component be about 32 MB per index. I 
believe it was said that this might not be an optimal choice. I also recall 
that all indices share the same settings, so there is an issue with making it 
too big. However, if it is too small, it seems one can rather easily get into 
a situation where more time is spent merging components than actually 
inserting data (or at least this is how it seems). I do have a YourKit snapshot 
of both of these scenarios, but they are too big to attach to the issue. If 
they are of interest, please let me know and I can email them. 
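
To illustrate why a small memory component could make merging dominate, here is a rough back-of-the-envelope model. It is a sketch under loud assumptions, not a description of AsterixDB's actual behavior: it assumes each flush writes one disk component the size of the memory budget, and it assumes a deliberately pessimistic, hypothetical merge policy that rewrites all on-disk data into a single component after every flush. The function names and the 256 MB comparison budget are made up for illustration.

```python
def flush_sizes(data_mb: int, budget_mb: int) -> list[int]:
    """Sizes (MB) of the disk components produced by flushing a full memory
    component every time it reaches budget_mb, plus a final partial flush."""
    full, remainder = divmod(data_mb, budget_mb)
    return [budget_mb] * full + ([remainder] if remainder else [])


def merge_all_write_volume(sizes: list[int]) -> int:
    """Total MB written to disk if, after every flush, everything on disk is
    merged into one component (hypothetical worst-case policy, an assumption)."""
    total_written = 0
    on_disk = 0
    for size in sizes:
        on_disk += size           # new flush joins the merged component
        total_written += on_disk  # the whole merged component is rewritten
    return total_written


DATA_MB = 700  # approximate dataset size from this report

for budget in (32, 256):  # 32 MB is the default mentioned above; 256 is arbitrary
    sizes = flush_sizes(DATA_MB, budget)
    written = merge_all_write_volume(sizes)
    print(f"budget={budget} MB: {len(sizes)} flushes, "
          f"{written} MB written, amplification {written / DATA_MB:.1f}x")
```

Under these assumptions the 32 MB budget produces 22 flushes and rewrites the data many times over, while a larger budget needs only a handful of flushes. A real merge policy will be less extreme, so this only shows the direction of the effect, not its size.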

Original issue reported on code.google.com by ima...@uci.edu on 11 Oct 2014 at 12:09

GoogleCodeExporter commented 8 years ago

Original comment by ima...@uci.edu on 14 Oct 2014 at 9:33

GoogleCodeExporter commented 8 years ago

Original comment by dtab...@gmail.com on 17 Oct 2014 at 7:01