
HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0

Unable to cap the overall process memory usage, and the initial data load IO is too low #672

Open LiYade opened 3 years ago

LiYade commented 3 years ago

Hi: We encountered the following two problems when using omniscidb:

1. We want to limit CPU memory usage, so we set cpu-buffer-mem-bytes=37580963840 (35 GB). Executing omnisql>\memory_summary shows that CPU memory usage is successfully capped below 35 GB. However, during use we found that the process RES reaches 154.5 GB, which far exceeds our limit. What does the extra memory hold? Is there a way to limit the overall memory usage of the process?

2. After we restart the machine or the process, when we load data from disk into memory for the first time, the disk IO read speed is only about 80 MB/s (SSD). This is much slower than we expected. Can we increase the data loading speed through parameter configuration or other means?

a. Version used: omniscidb-5.5.2
b. Data volume: 500 million rows
c. Single row size: 0.42 KB
d. Disk usage: 203 GB
e. cpu-buffer-mem-bytes=37580963840 (35 GB)
f. CPU memory usage per \memory_summary: 34.6 GB
g. Process memory usage per the top command: 154.5 GB
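For reference, here is a minimal omnisci.conf sketch reflecting the settings above (the port and data path are illustrative defaults, not taken from our actual setup):

    # omnisci.conf (sketch; only cpu-buffer-mem-bytes is from the report above)
    port = 6274
    data = "/var/lib/omnisci/data"
    # Cap the CPU buffer pool used for caching table data at 35 GB.
    # Note: this limits the buffer pool only, not total process memory.
    cpu-buffer-mem-bytes = 37580963840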

cdessanti commented 3 years ago

Hi @LiYade,

The cpu-buffer-mem-bytes parameter is intended to limit the memory allocated for caching data in the CPU-cache subsystem. But, as you noticed, it's not going to limit the overall memory consumption of the process. In fact, if needed, more memory will be allocated to perform steps required by the query (a big projection of data, sorts, and so on), but just for the duration of the query. Anyway, the process RSS can be misleading in the case of heavily multi-threaded software like Omnisci. Do you see those figures all the time or just while running queries?

We are far from perfect when loading data from disk into the CPU or GPU caches, but your IO looks painfully slow. I did some experiments with the num-threads parameter and got a minimum throughput of 3.4 GB/s on NVMe disks with a low number of threads, topping out at 5.2 GB/s with 48 threads. You can check all the discussions here
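As a sketch of how such a setting would sit in the config file; the num-threads spelling is as mentioned above, so please verify the exact flag for your build (omnisci_server --help), since thread-related options have varied across versions:

    # omnisci.conf (sketch; confirm the flag name for your 5.x build)
    # More worker threads can raise read throughput from fast NVMe storage.
    num-threads = 48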

If you want to preload some data at server startup, you can use the --db-query-list parameter; server startup will be slower, but users won't have to wait for the disk reads on their first queries. You can check some examples of the parameter usage here
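A hypothetical example of such a warm-up setup; the path, the query, and the exact file layout are illustrative, so check the documentation for the precise db-query-list file format in your version:

    # omnisci.conf (sketch)
    db-query-list = "/var/lib/omnisci/warmup.sql"

    -- warmup.sql (hypothetical contents): a cheap scan that pulls the hot
    -- columns from disk into the CPU buffer at startup
    SELECT COUNT(*) FROM my_big_table;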

Looking at your statistics, you seem constrained on CPU buffer memory; have you tried increasing it?

How many cores does your server have? It might be useful to change some table parameters to increase performance.

Regards, Candido

LiYade commented 3 years ago

Hi Candido:

Thank you very much for your answer:

The figures above were observed both during queries and after a period of steady use. We want to control the memory limit of each process on the server; in the end, we may consider adding more RAM modules to work around the memory limit. We used omnisql to do some interactive testing with omnisciDB.

What concerned us is that, in addition to the large CPU footprint of the data cache, the dictionary turned out to consume a lot of memory for variable-length fields. In our actual application, we have many varchar fields, and these fields are basically of fixed length, so:

  1. Can we enforce a process memory limit by turning off the dictionary? Does the (TEXT ENCODING NONE) type prevent fields from being aggregated?
  2. How is StringDictionary implemented, and what is it for?

Regards, LiYade

cdessanti commented 3 years ago

Hi @LiYade,

Usually, dictionary encoding helps save memory, especially in scenarios where you have quite large strings with low to medium cardinality.

e.g., you have a customer table with 200 million customers from all over the world; the column containing the city, assuming 30,000 cities and 15 bytes per city_name, would take 200M × 15 bytes = 3 GB without dictionary encoding. With dictionary encoding, you need 200M × the size of the index (in this case 2 bytes), so 400 MB for the table data, plus the size of the dictionary, about 450 KB. In this case you are saving 2.6 GB of RAM (and disk) and a lot of CPU/GPU bandwidth.
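A minimal DDL sketch of that example (the table and column names are made up); DICT(16) stores a 2-byte index per row, which matches the arithmetic above:

    -- hypothetical customer table; a 16-bit dictionary index is enough
    -- for ~30,000 distinct city names
    CREATE TABLE customers (
      customer_id BIGINT,
      city TEXT ENCODING DICT(16));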

Of course, the bigger the strings and the lower the cardinality, the bigger the savings; but with small strings of extremely high cardinality you can end up consuming more memory. e.g., 1 billion distinct strings of 6 bytes each over 200M records: without encoding that's 200M × 6 bytes = 1.2 GB, while with encoding it's 200M × 4 bytes of indexes in the table (a dictionary that large needs 32-bit indexes) plus 1B × (6 + 2) bytes in the dictionary.

Is this your case? Have you tried running the table optimization from within omnisql?

It's the \o switch: it suggests an optimized DDL for the table, so if you are using a 32-bit encoding but have just 46 distinct strings, it will suggest an 8-bit dictionary that saves 3 bytes per row in the database.
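A sketch of the workflow (the table name and the suggested DDL shown here are invented for illustration):

    omnisql> \o customers
    -- hypothetical output: the optimizer noticed the low cardinality
    CREATE TABLE customers (
      customer_id BIGINT,
      city TEXT ENCODING DICT(8));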

1) You can't turn off the dictionary entirely, but you can use TEXT ENCODING NONE; as you said, though, you won't be able to aggregate (or count distinct values) on a text-encoding-none field
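For illustration, both declarations side by side (the column names are made up); the NONE variant keeps the raw strings and gives up grouping on that column:

    CREATE TABLE events (
      tag_dict TEXT ENCODING DICT(32), -- dictionary-encoded: 4-byte index, usable in GROUP BY
      tag_raw  TEXT ENCODING NONE);    -- raw text: no dictionary memory, no GROUP BY / COUNT(DISTINCT)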

2) The string dictionary stores the distinct values, so it takes the size of the distinct strings plus an overhead of two bytes for each distinct string. The index value substitutes for the text in the table; depending on what was specified during table creation, this value can be 1, 2, or 4 bytes (8, 16, or 32 bits). When an aggregation runs, the index value is used, and during projection it is decoded back into the corresponding string.
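And the three index widths in DDL form (table and column names illustrative); the width bounds how many distinct values the dictionary can hold:

    CREATE TABLE encodings_demo (
      s8  TEXT ENCODING DICT(8),   -- 1-byte index, for very low cardinality
      s16 TEXT ENCODING DICT(16),  -- 2-byte index, like the 30,000-city example above
      s32 TEXT ENCODING DICT(32)); -- 4-byte index, the default for TEXT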

Can you give us an example of your fixed-length fields? (length, cardinality, etc.)

Regards, Candido