heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0
2.94k stars 445 forks

Without GPU why it took so much time??? #68

Closed fleapapa closed 7 years ago

fleapapa commented 7 years ago

My host has no GPU. I've loaded ~170M NYC Taxi trip records into MapD and tested some simple queries. The queries did not complete very quickly. For example, this query:

$ time echo "select avg(trip_time_in_secs) from trips;" | mapdql mapd

User mapd connected to database mapd
EXPR$0
811.999942
User mapd disconnected from database mapd

real 2m22.165s
user 0m0.004s
sys 0m0.016s

During the query, I observed a ~20 MB/s disk read rate. Assuming the 170M rows of 'trip_time_in_secs' values take ~300 MB after compression, it shouldn't take ~2.5 minutes to finish the query above.
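A quick back-of-envelope check of those numbers (a sketch; the 300 MB figure is the poster's own estimate above, not a measured size):

```python
# Expected read time if the query only had to stream the one column
# at the observed disk rate.
column_mb = 300            # assumed compressed size of trip_time_in_secs (estimate above)
read_rate_mb_s = 20        # observed read rate during the query
expected_s = column_mb / read_rate_mb_s
observed_s = 2 * 60 + 22.165    # "real 2m22.165s"
print(expected_s, observed_s)   # 15.0 s expected vs ~142 s observed, roughly a 9x gap
```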

Anything i can do to improve the performance?

asuhan commented 7 years ago

Is this from cold cache? What happens if you run the query again?

fleapapa commented 7 years ago

Wow! Thanks for the prompt response! Yes, the 2.5 min was from a cold cache. If the cache is warm, the same query finishes in about a second. Why does it take MapD so long to read 170M integers from disk at a 20 MB/s rate?

asuhan commented 7 years ago

I don't know; 20 MB/s does sound pretty low indeed. We see much better throughput on our servers. Is this a spinning disk, by any chance? That being said, the assumption of the product is that the workload fits in memory; once it's warmed up, queries will be fast.

dwayneberry commented 7 years ago

How did you load the data into the system?

Did you use a bulk load COPY FROM statement?

Could you share the logs of the DB startup plus the first query?

fleapapa commented 7 years ago

@asuhan I take your point, but if I toggle between querying 2 or more different columns I always get >20 s response times. That doesn't seem practical for interactive (<3 s) queries; for example, a web page may issue 10+ queries on different columns.

My host has 1 GB of available memory, and I'd guess 170M rows x 2 columns should take less than that amount of memory.

fleapapa commented 7 years ago

@dwayneberry Yes, I used COPY FROM. It took a few tens of minutes to load the data.

Here is the startup log:

E0908 12:11:42.807757 13196 MapDHandler.cpp:169] This build isn't CUDA enabled, will run on CPU
Thrift: Fri Sep 8 12:11:42 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
Thrift: Fri Sep 8 12:11:43 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
Thrift: Fri Sep 8 12:11:43 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
Thrift: Fri Sep 8 12:11:43 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
Thrift: Fri Sep 8 12:11:43 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
Thrift: Fri Sep 8 12:11:43 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
Thrift: Fri Sep 8 12:11:43 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
Thrift: Fri Sep 8 12:11:43 2017 TSocket::open() connect() <Host: localhost Port: 9093>Connection refused
E0908 12:11:43.788535 13196 MapDHandler.cpp:201] No GPUs detected, falling back to CPU mode
Received 1

asuhan commented 7 years ago

1GB is way, way too little memory. We don't optimize for that case. The columns do take less than that amount of memory, but the margin is extremely thin, and there are other things, both in the system and in the executable, which use memory as well.

fleapapa commented 7 years ago

By 1 GB free memory I mean the memory left after loading mapd-server. I would guess that is sufficient to hold two 170M-row columns in memory without thrashing :)

My use case is simply to compare simple queries across columnar databases such as MapD and ClickHouse.

asuhan commented 7 years ago

The reason it's using more than 1GB is probably the way our pooled memory allocation works. It's fairly configurable and we could expose an option to make it work under this type of memory pressure. That doesn't mean we should, though; running on a system which has 1GB of memory left is not a goal of the project.

fleapapa commented 7 years ago

Got it. Thanks!

fleapapa commented 7 years ago

BTW, FYI: on the same host and environment, WITHOUT data cached, ClickHouse finished the same query in ~20 secs. WITH data cached, it took <1 s, like MapD does.

Still, MapD is doing great without an index. Keep up the excellent project!

cdessanti commented 7 years ago

My gaming desktop gets 500 MB/s of disk throughput on a single SATA SSD.


fleapapa commented 7 years ago

@asuhan,

The reason it's using more than 1GB is probably the way our pooled memory allocation works. It's fairly configurable and we could expose an option to make it work under this type of memory pressure. That doesn't mean we should, though; running on a system which has 1GB of memory left is not a goal of the project.

In my opinion, the issue (at least in my test case) seems more related to how MapD loads and processes cold data than to memory pressure, because after I added another 1 GB of memory to my (VM) host, the same query still took 2m50s. After the query, free memory was 700+ MB, so memory still seems ample :)

I know my disk is slow, but my test is a comparison against two other databases, one columnar and one row-based, in an identical h/w environment.

Sorry to comment after this issue is closed.

dwayneberry commented 7 years ago

@fleapapa

Could you please include a complete log of the server startup, including the execution of the first query? This will help us identify the delay you are experiencing.

thanks

fleapapa commented 7 years ago

@dwayneberry, where can I find the complete log other than the one I posted previously? My machine is an Ubuntu 16.04 VM hosted on VirtualBox.

dwayneberry commented 7 years ago

@fleapapa The log you sent earlier is the correct one; it just only contained the first few lines of the startup. Restart the server, run your query, stop the server, then send us the mapd_server.INFO log.

fleapapa commented 7 years ago

@dwayneberry I did it again, but the log shown on my screen looks the same as the previous one. I can't find any file named mapd_server.INFO.log on my filesystem. I started the mapd server using the following command: mapd/bin/mapd_server --data /mnt/trip/mapd/data

dwayneberry commented 7 years ago

@fleapapa The file is just mapd_server.INFO; it is a symbolic link to the latest log. In your case it should be at /mnt/trip/mapd/data/mapd_log/mapd_server.INFO

fleapapa commented 7 years ago

@dwayneberry, here you go:

cat /mnt/trip/mapd/data/mapd_log/mapd_server.INFO
Log file created at: 2017/09/08 17:22:15
Running on machine: osboxes
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0908 17:22:15.659196 4637 MapDServer.cpp:602] MapD started with data directory at '/mnt/trip/mapd/data'
I0908 17:22:15.705051 4637 MapDServer.cpp:609] Watchdog is set to 1
I0908 17:22:15.705080 4637 MapDServer.cpp:631] cuda block size 0
I0908 17:22:15.705093 4637 MapDServer.cpp:632] cuda grid size 0
I0908 17:22:15.705106 4637 MapDServer.cpp:633] calcite JVM max memory 1024
I0908 17:22:15.705118 4637 MapDServer.cpp:634] MapD Server Port 9091
I0908 17:22:15.705132 4637 MapDServer.cpp:635] MapD Calcite Port 9093
I0908 17:22:15.705180 4637 MapDHandler.cpp:162] MapD Server 3.2.1-20170826-327e7ee
E0908 17:22:15.705200 4637 MapDHandler.cpp:169] This build isn't CUDA enabled, will run on CPU
I0908 17:22:15.706996 4637 DataMgr.cpp:120] cpuSlabSize is 1908.94M
I0908 17:22:15.715643 4637 FileMgr.cpp:116] Read table metadata, Epoch is 0 for table data at '/mnt/trip/mapd/data/mapd_data/table_0_0/'
I0908 17:22:15.716774 4637 Calcite.cpp:156] Creating Calcite Handler, Calcite Port is 9093 base data dir is /mnt/trip/mapd/data
I0908 17:22:15.716797 4637 Calcite.cpp:95] Running calcite server as a daemon
I0908 17:22:16.558720 4637 Calcite.cpp:124] Calcite server start took 800 ms
I0908 17:22:16.558768 4637 Calcite.cpp:125] ping took 30 ms
I0908 17:22:16.571445 4637 Calcite.cpp:281]
[{"name":"Tan","ret":"double","args":["double"]},{"name":"Truncate","ret":"float","args":["float","i32"]},{"name":"ln","ret":"double","args":["double"]},{"name":"Floor2","ret":"i32","args":["i32"]},{"name":"Floor3","ret":"i64","args":["i64"]},{"name":"rect_pixel_bin","ret":"float","args":["float","float","float","i32","i32"]},{"name":"Floor1","ret":"i16","args":["i16"]},{"name":"degrees","ret":"double","args":["double"]},{"name":"approx_distance_in_meters","ret":"double","args":["float","float","float","float"]},{"name":"Ceil2","ret":"i32","args":["i32"]},{"name":"Ceil1","ret":"i16","args":["i16"]},{"name":"Ceil3","ret":"i64","args":["i64"]},{"name":"Log10","ret":"double","args":["double"]},{"name":"Log","ret":"double","args":["double"]},{"name":"distance_in_meters","ret":"double","args":["double","double","double","double"]},{"name":"round_to_digit","ret":"double","args":["double","i32"]},{"name":"Atan2","ret":"double","args":["double","double"]},{"name":"Acos","ret":"double","args":["double"]},{"name":"Sin","ret":"double","args":["double"]},{"name":"Ceil","ret":"double","args":["double"]},{"name":"distance_in_meters","ret":"double","args":["float","float","float","float"]},{"name":"Floor","ret":"float","args":["float"]},{"name":"Truncate","ret":"double","args":["double","i32"]},{"name":"radians","ret":"double","args":["double"]},{"name":"Ceil","ret":"float","args":["float"]},{"name":"ln","ret":"double","args":["float"]},{"name":"reg_hex_vert_pixel_bin_y","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"reg_hex_vert_pixel_bin_x","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"Round","ret":"double","args":["double"]},{"name":"power","ret":"double","args":["double","double"]},{"name":"conv_4326_900913_x","ret":"double","args":["double"]},{"name":"reg_hex_horiz_pixel_bin_y","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"conv_4326_900913_y","ret":"double","args":["double"]},{"name":"Atan","ret":"double","args":["double"]},{"name":"reg_hex_horiz_pixel_bin_x","ret":"float","args":["float","float","float","float","float","float","float","float","float","float","i32","i32"]},{"name":"Floor","ret":"double","args":["double"]},{"name":"Truncate1","ret":"i16","args":["i16","i32"]},{"name":"Truncate2","ret":"i32","args":["i32","i32"]},{"name":"Truncate3","ret":"i64","args":["i64","i32"]},{"name":"Cos","ret":"double","args":["double"]},{"name":"Log","ret":"double","args":["float"]},{"name":"Log10","ret":"double","args":["float"]},{"name":"Asin","ret":"double","args":["double"]},{"name":"Cot","ret":"double","args":["double"]},{"name":"Tan__","ret":"double","args":["float"]},{"name":"rect_pixel_bin_x","ret":"float","args":["float","float","float","float","float","i32"]},{"name":"rect_pixel_bin_y","ret":"float","args":["float","float","float","float","float","i32"]},{"name":"pi","ret":"double","args":[]},{"name":"Exp","ret":"double","args":["double"]}]
E0908 17:22:16.572005 4637 MapDHandler.cpp:201] No GPUs detected, falling back to CPU mode
I0908 17:22:16.572693 4637 MapDHandler.cpp:210] Started in CPU mode
I0908 17:22:28.388641 4652 MapDHandler.cpp:359] User mapd connected to database mapd
I0908 17:22:28.392401 4652 MapDHandler.cpp:555] sql_execute :vl1WlQUyxzSUB9t7HYzzcdzAwoB7CZ8z:query_str:select avg(trip_distance) from trips;
I0908 17:22:28.393018 4652 Calcite.cpp:247] User mapd catalog mapd sql 'select avg(trip_distance) from trips;'
I0908 17:22:30.168512 4654 FileMgr.cpp:116] Read table metadata, Epoch is 310 for table data at '/mnt/trip/mapd/data/mapd_data/table_1_1/'
I0908 17:24:15.675556 4652 Calcite.cpp:260] Time in Thrift 14 (ms), Time in Java Calcite server 107268 (ms)
I0908 17:24:15.708613 4669 BufferMgr.cpp:266] ALLOCATION slab of 3909516 pages (2001672192B) created in 0 ms CPU_MGR:0
I0908 17:25:15.353220 4652 MapDHandler.cpp:567] sql_execute-COMPLETED Total: 166960 (ms), Execution: 166956 (ms)
I0908 17:25:15.354565 4652 MapDHandler.cpp:369] User mapd disconnected from database mapd

dwayneberry commented 7 years ago

@fleapapa Thanks for the log

The issue does appear to stem from your (extremely) slow disk access speeds. It is not directly related to your memory footprint (though there could be knock-on memory effects that slow disk processing down, due to OS caching and flushing, etc.).

Accesses that take 0.8 secs on a regular machine are taking 107 secs on yours.

Here is an equivalent log from a machine with a 180M-row taxi dataset, during the first query of that table and column.

I0908 20:55:12.880648 32350 MapDHandler.cpp:2302] User mapd sets CPU mode.
I0908 20:55:16.445469 32350 MapDHandler.cpp:555] sql_execute :s0XKrGNswtuPTNpg9U0IWHeGnX8KTcH9:query_str:select avg(trip_distance) from nyc_taxi_nolion_2014;
I0908 20:55:16.445830 32350 Calcite.cpp:247] User mapd catalog mapd sql 'select avg(trip_distance) from nyc_taxi_nolion_2014;'
I0908 20:55:16.956212 32344 FileMgr.cpp:116] Read table metadata, Epoch is 697 for table data at '/raidStorage/prod/mapd-storage/data/mapd_data/table_1_8/'
I0908 20:55:17.293797 32350 Calcite.cpp:260] Time in Thrift 5 (ms), Time in Java Calcite server 842 (ms)
I0908 20:55:17.317417 32355 BufferMgr.cpp:266] ALLOCATION slab of 8388608 pages (4294967296B) created in 0 ms CPU_MGR:0
I0908 20:55:17.687477 32350 MapDHandler.cpp:567] sql_execute-COMPLETED Total: 1241 (ms), Execution: 1241 (ms)

the time between this line

I0908 20:55:16.956212 32344 FileMgr.cpp:116] Read table metadata, Epoch is 697 for table data at '/raidStorage/prod/mapd-storage/data/mapd_data/table_1_8/'

and this line

I0908 20:55:17.293797 32350 Calcite.cpp:260] Time in Thrift 5 (ms), Time in Java Calcite server 842 (ms)

is the time to read the metadata for the file

In a more normal case, like this one, it occurs the first time from cold in about 800 ms.

In your case it is taking over 107 secs.

During this time MapD brings in all the metadata for all the columns of the table, regardless of what the specific query actually touches, so it is a one-time cost for the entire table. These can be a large number of fairly small reads, which makes it worse in your bad-IO situation. We have discussed options in this area to reduce the granularity of these reads, which would significantly improve your situation.
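The gap can be reproduced by diffing the glog timestamps quoted from the two logs (a sketch; only the time-of-day field is parsed, assuming both lines of each pair fall on the same day):

```python
from datetime import datetime

def glog_time(line):
    # glog lines look like "I0908 17:22:30.168512 <tid> <file:line>] msg";
    # the second token is the wall-clock time.
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")

# From fleapapa's log: metadata read start -> Calcite timing line.
slow = (glog_time("I0908 17:24:15.675556 4652 Calcite.cpp:260] ...")
        - glog_time("I0908 17:22:30.168512 4654 FileMgr.cpp:116] ...")).total_seconds()

# Same pair of lines from the healthy machine's log.
fast = (glog_time("I0908 20:55:17.293797 32350 Calcite.cpp:260] ...")
        - glog_time("I0908 20:55:16.956212 32344 FileMgr.cpp:116] ...")).total_seconds()

print(slow, fast)   # ~105.5 s vs ~0.34 s between the same two log lines
```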

I can't speak to the other platforms but perhaps they amortize this across the different queries depending on how they manage their on disk representations and metadata (so perhaps the next column you touch will take another 20 secs etc).

In the second part of the process, when we bring in the real data, assuming trip_distance is a decimal(14,2), there is about 170M x 8 bytes of data to load, so at 20MB/s there is about 68 seconds of transfer time; the additional 60 seconds in your log lines up appropriately.

I0908 17:24:15.708613 4669 BufferMgr.cpp:266] ALLOCATION slab of 3909516 pages (2001672192B) created in 0 ms CPU_MGR:0
I0908 17:25:15.353220 4652 MapDHandler.cpp:567] sql_execute-COMPLETED Total: 166960 (ms), 

and basically something less than a second to do the actual query on the loaded data.

So no real mystery here, other than a slow disk and a different model of when and how disk data is transferred to the system, I think. The positive takeaway with MapD is that once this one-time cost at server startup has been paid, the rest of your queries will remain fast, as we try not to touch the disk any more than we absolutely have to (a real benefit in your case).
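The data-phase arithmetic above can be checked directly (the 8-byte value width is the decimal(14,2) assumption stated in the comment, not a measured figure):

```python
rows = 170_000_000
bytes_per_value = 8             # assumed storage width of a decimal(14,2) column
observed_rate_mb_s = 20         # disk read rate observed on the VM

volume_mb = rows * bytes_per_value / 1e6
transfer_s = volume_mb / observed_rate_mb_s
print(volume_mb, transfer_s)    # 1360.0 MB -> 68.0 s of pure transfer time
```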

fleapapa commented 7 years ago

@dwayneberry, thanks a lot for the analysis. I really appreciate it! First of all, my disclaimer: I am not an employee of, or affiliated with, any of the companies that own the databases I'm testing. For my current company project, I'm simply trying to find an OLAP database best suited for hundreds of TBs of network metrics (from reports by millions of endpoints per minute, each report about 10 KB in size) stored on general-purpose bare-metal storage and servers hosted in ordinary data centers. To beat competitors, our product needs to be the most cost-effective (so having lots of GPUs is unlikely).

Quoted:

I can't speak to the other platforms but perhaps they amortize this across the different queries depending on how they manage their on disk representations and metadata (so perhaps the next column you touch will take another 20 secs etc). In the second part of the process when we bring in the real data assuming the trip_distance is a decimal(14,2) there is about 170M x 8 bytes of data to load, so at 20MB/s there is about 68 seconds of transfer time to so the additional 60 seconds in you log lines up appropriately.

Just did a fresh reboot of my VM, as well as of the other database I'm testing. The table I test has only 14 columns. Here is the part of the schema relevant to the queries I'm testing for comparison.

_c7    tinyint,              # passenger_count
_c8    integer,              # trip_time_in_secs
_c9    float,                # trip_distance

Here are the three queries submitted immediately after system/database boot-up.

:) set max_threads=4;
Ok. 0 rows in set. Elapsed: 0.007 sec.

:) select avg(_c9) from trip;
1 rows in set. Elapsed: 25.058 sec. Processed 172.04 million rows, 688.15 MB (6.87 million rows/s., 27.46 MB/s.)

:) select avg(_c8) from trip;
1 rows in set. Elapsed: 14.164 sec. Processed 172.04 million rows, 344.07 MB (12.15 million rows/s., 24.29 MB/s.)

:) select avg(_c7) from trip;
1 rows in set. Elapsed: 4.639 sec. Processed 172.04 million rows, 172.04 MB (37.08 million rows/s., 37.08 MB/s.)

Finally, my only point has been that MapD's cold-data access management has some room to improve.

Thanks again for deeply investigating my case. I really appreciate it!

fleapapa commented 7 years ago

By the way, I'm going to set up another VM with a faster disk and larger memory for the same comparison test. Will update later.

dwayneberry commented 7 years ago

@fleapapa

Thanks for your detailed response. We are always trying to improve all of MapD. We are not currently focused on cold load times specifically, but we certainly have some places we know we can improve, so your feedback and experience is much appreciated.

Based on your use case, I'm curious why cold start time is such a big issue. It seems steady state with constant inserts would be more your key area of concern/focus.

As you are running on CPU, may I ask how many cores are available in the VM you are running? With a dataset of only 170M our defaults would only be using 6 cores regardless of how many are available.
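For context: MapD splits a table into fixed-size row fragments, and fragments are the unit of intra-query parallelism. Assuming the default fragment size of that era was 32M rows (per the documentation; an assumption, it is not stated in this thread), the "6 cores" figure falls out directly:

```python
import math

rows = 170_000_000
default_fragment_size = 32_000_000   # assumed MapD default fragment size, in rows

fragments = math.ceil(rows / default_fragment_size)
print(fragments)                     # 6 fragments -> at most 6 worker cores busy

# To keep, say, 12 cores busy on the same table, a smaller fragment_size
# would be needed at table-creation time.
for_12_cores = math.ceil(rows / 12)
print(for_12_cores)                  # ~14.17M rows per fragment
```

If the MapD docs of the time apply, fragment_size is a per-table setting chosen at creation, e.g. CREATE TABLE trips (...) WITH (fragment_size = 14166667).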

As a curiosity for myself, I would be interested to see the times from the other database (I'm guessing ClickHouse from the prompt :) ) when you include some kind of filter in the query. Simply run: select avg(_c9) from trip where _c8 = 2;

fleapapa commented 7 years ago

@dwayneberry, thanks again for the response.

Quote:

Based on your use case I'm curious why cold start time is such a big issue? Its seems steady state with constant insert would be more your area of key concern/focus.

For columnar databases, heavy constant insert doesn't seem to be a big issue. That's one of the reasons I proposed forking the input data stream to a columnar database and gradually moving away from our existing non-columnar databases. Interactive query is another (big) reason for doing so.

Currently we're using a document-based NoSQL database to store JSON documents of network metrics, each of which has deeply nested subdocuments. Our dashboard needs to show from a few hours up to 90 days of historical charts of various reported metrics, for each or all devices of a user (who dials in a service call), within 3 seconds. Due to our data volume, using all SSD incurs high cost, so a big portion of the data (e.g., 7+ days old) is stored on spinning disks. The document database has no problem quickly locating and loading a user's cold data into memory, but with our denormalized schema the database's aggregation pipeline incurs very high runtime overhead on the database servers, even for a simple query on one metric, so the system does not perform well when serving a high volume of customer service calls and simultaneous mobile app accesses.

That's why I tried to migrate the system to an MPP or columnar database. With most columnar databases discouraging indexing and favoring all-in-memory operation, cold start time is critical to the response time of our dashboard, for the following reason:

We can't make any customer (representative) wait a few minutes before starting to diagnose a network problem, right? :)

As you are running on CPU, may I ask how many cores are available in the VM you are running? With a dataset of only 170M our defaults would only be using 6 cores regardless of how many are available.

The VM used in the previous tests has only 2 cores. I am setting up another VM which will have 6 cores. (I have access to a couple of big EC2 instances with big SSDs, but those cannot be our final production environment :)

As a curiosity for myself I would be interested to see the times from the other database (I'm guessing clickhouse from the prompt :) ) when you include some kind of filter in the query? Simply run a select avg(_c9) from trip where _c8 = 2;

On my 2-core VM (with a slow disk), the time is:

:) select avg(_c9) from trip where _c8 = 2;
1 rows in set. Elapsed: 26.338 sec. Processed 172.04 million rows, 689.79 MB (6.53 million rows/s., 26.19 MB/s.)

But the (cold start) time is much shorter if the column in the filter is part of the primary index:

:) select avg(_c8) from trip where _c5>'2013-01-01 00:00:00' and _c5<'2013-02-01 23:00:00'
1 rows in set. Elapsed: 1.332 sec. Processed 15.07 million rows, 90.44 MB (11.31 million rows/s., 67.89 MB/s.)

I will update you soon with the times (and comparisons :) on the 6-core VM with the faster disk you're interested in. Please stay tuned :)

fleapapa commented 7 years ago

@dwayneberry, here are the "cold start" results from the other database after boot-up on a 6-core VM with 5 GB free memory and a SATA 6 Gb/s 7200 RPM hard drive:

:) select avg(_c8) from trip;
1 rows in set. Elapsed: 3.627 sec. Processed 172.09 million rows, 344.17 MB (47.44 million rows/s., 94.88 MB/s.)

With an indexed column in filter condition, the time is:

:) select avg(_c8) from trip where _c5>'2013-01-01 00:00:00' and _c5<'2013-02-01 23:00:00'
1 rows in set. Elapsed: 0.707 sec. Processed 15.32 million rows, 91.90 MB (21.67 million rows/s., 130.02 MB/s.)

The query that you're interested:

:) select avg(_c9) from trip where _c8 = 2;
1 rows in set. Elapsed: 9.217 sec. Processed 172.09 million rows, 705.06 MB (18.67 million rows/s., 76.50 MB/s.)

Will post results from MapD tomorrow, after installing it on this VM...

fleapapa commented 7 years ago

@dwayneberry, here are the results of the queries with MapD, each run right after MapD boot-up (the VM has 6 cores, 5 GB free memory, and a SATA 6 Gb/s 7200 RPM hard drive).

Aggregation w/o a filter:

$ time echo "select avg(trip_distance) from trips;" | mapd/bin/mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
EXPR$0
10.433180
User mapd disconnected from database mapd

real 1m6.061s
user 0m0.000s
sys 0m0.008s

Reboot MapD and query with a filter:

$ time echo "select avg(trip_distance) from trips where trip_time_in_secs=2;" | mapd/bin/mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
EXPR$0
5.187741
User mapd disconnected from database mapd

real 1m14.031s
user 0m0.004s
sys 0m0.000s

Reboot MapD and query with another filter:

$ time echo "select avg(trip_distance) from trips where pickup_datetime>'2013-01-01 00:00:00' and pickup_datetime<'2013-02-01 23:00:00';" | mapd/bin/mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
EXPR$0
3.400610
User mapd disconnected from database mapd

real 1m8.699s
user 0m0.008s
sys 0m0.000s

The results seem consistent.

I also redid the test on the other columnar DB and updated the results in my previous comment. The system cache was cleared and the DB restarted before each query was submitted.

Comparing the results of the range query, my takeaway is that an index does help a lot, especially for a fast cold start on a small range of data, which is the usual pattern in interactive web/mobile applications.

fleapapa commented 7 years ago

By the way, in my test both databases perform almost equally well once the cold data has been loaded into memory.

With MapD:

$ time echo "select avg(trip_distance) from trips;" | mapd/bin/mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
EXPR$0
10.433180
User mapd disconnected from database mapd

real 0m0.213s
user 0m0.000s
sys 0m0.000s

$ time echo "select avg(trip_distance) from trips;" | mapd/bin/mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
EXPR$0
10.433180
User mapd disconnected from database mapd

real 0m0.210s
user 0m0.004s
sys 0m0.000s

With the other database:

:) select avg(_c9) from trip;

┌──────────avg(_c9)─┐
│ 8.338322625856835 │
└───────────────────┘

1 rows in set. Elapsed: 0.321 sec. Processed 172.09 million rows, 688.35 MB (420.63 million rows/s., 1.68 GB/s.)

:) select avg(_c9) from trip;

┌──────────avg(_c9)─┐
│ 8.338322625855596 │
└───────────────────┘

1 rows in set. Elapsed: 0.322 sec. Processed 172.09 million rows, 688.35 MB (533.73 million rows/s., 2.13 GB/s.)

Considering the overhead of bash forking a process, I guess MapD is a bit faster in this case. From the source code, it seems MapD dynamically generates LLVM IR code while the other database does not. I doubt whether the small performance gain really justifies the extra complexity and, IMO, reduced flexibility of IR code generation. By flexibility I mean, roughly, the development cost of modifying or creating the related IR-generation source code when changing or adding a SQL query. Implementing a SQL query in C++ is much easier than in IR, I guess :)

tmostak commented 7 years ago

@fleapapa I think some of the initial cold load slowness is because we by default reserve the CPU buffer pool in 4GB chunks, which means that you are swapping.

You can control the CPU buffer pool size with the --cpu-buffer-mem-bytes on mapd_server .

(Note: just saw that you increased the RAM allocated to the VM, still lowering the buffer pool size could help some).

In general, users can get close to saturating SSD read bandwidth (500 MB/s-1 GB/s), and we've seen figures north of 2 GB/s on RAIDed systems. We haven't spent much time tuning for spinning disk, since most of our users use SSDs.

With the right memory sizing (and making sure fragment_size is set correctly to spread the data across all available threads) I think you'll find MapD on CPU to be significantly faster than ClickHouse. For example, in the benchmarks here (http://tech.marksblogg.com/benchmarks.html) we can still run the first query in 200ms on CPU. A lot of this speedup is from dynamic compilation of query plans.

tmostak commented 7 years ago

I should also add that many of the benefits of query compilation emerge with more complicated queries, as compilation means you don't have to materialize intermediate results after each operation.
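A toy illustration of that point (plain Python, nothing to do with MapD's actual codegen): an operator-at-a-time engine materializes an intermediate column per operator, while a compiled plan fuses the operators into a single pass over the data.

```python
trip_distance = [1.0, 2.5, 4.0, 0.5]
trip_time = [600, 1200, 1800, 300]

# Operator-at-a-time: the filter materializes a full intermediate column
# before the aggregate runs over it.
filtered = [d for d, t in zip(trip_distance, trip_time) if t > 500]
avg_materialized = sum(filtered) / len(filtered)

# Fused (what query compilation enables): one loop, no intermediate column.
total = count = 0.0
for d, t in zip(trip_distance, trip_time):
    if t > 500:
        total += d
        count += 1
avg_fused = total / count

print(avg_materialized, avg_fused)   # same answer, very different memory traffic
```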

fleapapa commented 7 years ago

@tmostak, thank you for the comment. I changed the --cpu-buffer-mem-bytes setting, cleared the file cache, and restarted the mapd server:

mapd/bin/mapd_server --data /mnt/trip/mapd/data --cpu-buffer-mem-bytes 1073741824

but the query still took a similar time.

$ time echo "select avg(trip_distance) from trips;" | mapd/bin/mapdql mapd -u $MAPD_USERNAME -p $MAPD_PASSWORD
User mapd connected to database mapd
EXPR$0
10.433180
User mapd disconnected from database mapd

real 1m6.457s
user 0m0.004s
sys 0m0.000s

I'm sure my VM has 4+ GB of free memory after mapd starts. A 1 GB CPU buffer chunk should be more than enough to hold 170M values of a column.

The time taken seems much longer than the "ideal" I/O time for 170M values of a column at my SATA hard drive's transfer rate.

Could you please elaborate on what I should add to the command line to tune fragment_size for my 6-core VM?

By the way, as I described previously, depending on the retention period our (ISP) customers require, our data size can range from hundreds of TB to a few PB, with data from the last 7 days frequently accessed and older data rarely so. However, (small) range queries from our dashboard need to span any datetime interval. Currently we periodically move cold data from SSD to spinning disks for ~10x better cost effectiveness. Having once lost a bid due to a higher price tag makes us cost-sensitive.