The 4GB memory usage is consistent with what I observed here, yes. I haven't looked for a way to control that yet.
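(If it turns out to be GC headroom rather than live data, one knob worth trying would be the OCAMLRUNPARAM environment variable that the OCaml runtime reads, e.g. OCAMLRUNPARAM='o=20' to lower the GC's space_overhead setting and trade some CPU time for a smaller heap. Untested here, just a guess.)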
The difference could be due to a difference in single-threaded CPU performance. Which CPU is that?
Can you try running wrk without a limit on the requests/s to see how many RPS it can serve on your machine? If that number is too close to 10K, you may be observing throughput-induced latency increases.
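For example, something like wrk --latency -c 99 -t 3 -d 30 'http://localhost:8080' (the same flags as the runs below, just without wrk2's -R rate limit) should show the peak RPS the server can sustain.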
Edit: also, many of the apps have worse initial latency (while the hash table is growing) than steady-state latency (where items are both added and removed), so that could be a factor; a sketch of one possible mitigation follows.
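For instance (a minimal sketch, not the app's actual code), pre-sizing the table avoids the repeated grow-and-rehash pauses that otherwise occur while the first few hundred thousand entries are inserted:

```ocaml
(* Hypothetical key/value types; the point is the initial-size argument.
   Hashtbl.create n pre-allocates roughly n buckets, so inserting the
   first ~250k entries triggers no resizing during warmup. *)
let table : (int, string) Hashtbl.t = Hashtbl.create 250_000
```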
4GB is roughly 16 times the size of the application's live data set (250Mb). For comparison, the Go version uses 580Mb and the Haskell version uses 810Mb. Maybe there is a bug in the OCaml version of the app? Or some memory-hungry data structure is being used?
Unfortunately, my laptop has only 4GB (time to upgrade? ;), so I cannot run the original test. However, when I reduce the maximum map size to 125,000 I get the following results (wrk2 --latency -c 99 -t 3 -d 300 -R9000 'http://localhost:8080'):
```
Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%    1.81ms
 75.000%    2.51ms
 90.000%    3.25ms
 99.000%    6.18ms
 99.900%   13.85ms
 99.990%   23.89ms
 99.999%   31.53ms
100.000%   37.47ms
```
Reducing the number of requests per second to 4000 gives:
```
 50.000%    1.51ms
 75.000%    2.07ms
 90.000%    2.59ms
 99.000%    4.22ms
 99.900%    7.73ms
 99.990%   21.07ms
 99.999%   32.08ms
100.000%   34.85ms
```
Finally, this is the comparison of the Go (250k), Haskell (250k), and OCaml (125k) implementations that I get on my machine (Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz):
I am aware that this is not a very valid comparison, but I have probably already spent more time analyzing this benchmark than I should have ;)
I found the issue with the memory usage. It turns out OCaml stores each array element as a machine word, which is 64 bits on most modern machines, so an array of chars takes 8 bytes per char. This means the program had been creating 8KB buffers instead of 1KB ones the whole time.
Switching from an `Array` of 1K chars to a `Buffer` resulted in a significant drop in memory usage. You should be able to reproduce the benchmark now.
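To make the size difference concrete, here is a minimal standalone sketch (not the benchmark's actual code; Obj.reachable_words requires OCaml 4.04+):

```ocaml
(* On a 64-bit machine, a char array stores one 8-byte word per element,
   while Buffer packs chars one byte each into an internal Bytes value. *)
let () =
  (* ~1025 words (~8KB): 1024 one-word elements plus the block header *)
  let arr = Array.make 1024 '\000' in
  (* ~130 words (~1KB): 1024 chars packed 8 per word, plus small headers *)
  let buf = Buffer.create 1024 in
  Printf.printf "array:  %d words\n" (Obj.reachable_words (Obj.repr arr));
  Printf.printf "buffer: %d words\n" (Obj.reachable_words (Obj.repr buf))
```

A plain `Bytes.create 1024` would give the same 1KB packing if a fixed-size buffer is enough.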
I used the following commands to build the ocaml-reason test program:
The resulting program, however, uses 4GB of memory during warmup. Is that expected? How much RAM is required to run the test program?
I cannot run the full test on my laptop, but here is the output from wrk2 --latency -c 99 -t 3 -d 30 -R10000 'http://localhost:8080': https://gist.github.com/kostya-sh/f5a925b139a55517ccd2b72eb1544469

OS: Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux
CPU: 4 cores