pmem / CacheLib

Pluggable in-process caching engine to build and scale high performance services
https://www.cachelib.org
Apache License 2.0
5 stars 13 forks source link

Added per tier pool class rolling average latency #96

Closed guptask closed 2 years ago

guptask commented 2 years ago

This change is Reviewable

guptask commented 2 years ago

cachelib/common/RollingStats.h line 1 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
"Also, have you measured how much performance impact this latency tracking have? Could you please add an option to config to enable/disable gathering this latency info? Ans: I'm working on a compile type flag option. " Will you add this option to this PR?

I'm going to measure the performance impact first. If the impact is negligible, I'm in favor of not making it a compile time option. I'm waiting for my machine to be free; it is under use for background eviction data collection.

guptask commented 2 years ago

cachelib/common/RollingStats.h line 40 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
I think it would be better to return double here. I know you need uint64_t later but this class might also be used for different purposes where double might be more appropriate

ok. I'll update it to double

guptask commented 2 years ago

cachelib/common/RollingStats.h line 71 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
didn't you say those are unncessary? (i.e. can't we delete move ctor and assignment operator?)

I mentioned that about RollingStats. I have taken those out. I kept the assignment operator for RollingLatencyTracker based on comparison to the LatencyTracker (used by PercentileStats). I don't see it being used currently.

guptask commented 2 years ago

Command used :

./bin/cachebench --json_test_config test_configs/hit_ratio/graph_cache_leader_fbobj/config-8GB-DRAM.json
     -report_memory_usage_stats -progress 1 -progress_stats_file <per_sec_stats_txt> &> <overall_stats_txt>

The data on the right is with RollingLatencyTracker commented out. The rolling latency print statement has an incorrect % sign at end. I'll remove it.

== Test Results ==                                              == Test Results ==
== Allocator Stats ==                                           == Allocator Stats ==
Items in RAM  : 17,016,761                                    | Items in RAM  : 17,036,539
Items in NVM  : 0                                               Items in NVM  : 0
Alloc Attempts: 68,161,566 Success: 100.00%                   | Alloc Attempts: 68,117,977 Success: 100.00%
RAM Evictions : 51,144,741                                    | RAM Evictions : 51,081,395
tid 0 free slabs : 0.00%                                        tid 0 free slabs : 0.00%
tid 0 pid 0 cid   0    64.00B memorySize:     0.00B             tid 0 pid 0 cid   0    64.00B memorySize:     0.00B
tid 0 pid 0 cid   1    96.00B memorySize:    76.00MB            tid 0 pid 0 cid   1    96.00B memorySize:    76.00MB
tid 0 pid 0 cid   2   144.00B memorySize:   248.00MB            tid 0 pid 0 cid   2   144.00B memorySize:   248.00MB
tid 0 pid 0 cid   3   216.00B memorySize:   544.00MB            tid 0 pid 0 cid   3   216.00B memorySize:   544.00MB
tid 0 pid 0 cid   4   328.00B memorySize:     1.47GB            tid 0 pid 0 cid   4   328.00B memorySize:     1.47GB
tid 0 pid 0 cid   5   496.00B memorySize:     1.93GB            tid 0 pid 0 cid   5   496.00B memorySize:     1.93GB
tid 0 pid 0 cid   6   744.00B memorySize:  1016.00MB            tid 0 pid 0 cid   6   744.00B memorySize:  1016.00MB
tid 0 pid 0 cid   7     1.09KB memorySize:   836.00MB         | tid 0 pid 0 cid   7     1.09KB memorySize:   840.00MB
tid 0 pid 0 cid   8     1.64KB memorySize:   416.00MB         | tid 0 pid 0 cid   8     1.64KB memorySize:   420.00MB
tid 0 pid 0 cid   9     2.46KB memorySize:   268.00MB         | tid 0 pid 0 cid   9     2.46KB memorySize:   272.00MB
tid 0 pid 0 cid  10     3.70KB memorySize:   304.00MB           tid 0 pid 0 cid  10     3.70KB memorySize:   304.00MB
tid 0 pid 0 cid  11     5.55KB memorySize:   268.00MB         | tid 0 pid 0 cid  11     5.55KB memorySize:   264.00MB
tid 0 pid 0 cid  12     8.32KB memorySize:    68.00MB           tid 0 pid 0 cid  12     8.32KB memorySize:    68.00MB
tid 0 pid 0 cid  13    12.48KB memorySize:    36.00MB           tid 0 pid 0 cid  13    12.48KB memorySize:    36.00MB
tid 0 pid 0 cid  14    18.78KB memorySize:    44.00MB           tid 0 pid 0 cid  14    18.78KB memorySize:    44.00MB
tid 0 pid 0 cid  15    28.24KB memorySize:    60.00MB         | tid 0 pid 0 cid  15    28.24KB memorySize:    64.00MB
tid 0 pid 0 cid  16    42.66KB memorySize:   412.00MB         | tid 0 pid 0 cid  16    42.66KB memorySize:   404.00MB
tid 0 pid 0 cid  17    64.00KB memorySize:    72.00MB         | tid 0 pid 0 cid  17    64.00KB memorySize:    64.00MB
tid 0 pid 0 cid  18    97.52KB memorySize:    28.00MB           tid 0 pid 0 cid  18    97.52KB memorySize:    28.00MB
tid 0 pid 0 cid  19   151.70KB memorySize:     8.00MB           tid 0 pid 0 cid  19   151.70KB memorySize:     8.00MB
tid 0 pid 0 cid  20   227.55KB memorySize:     0.00B            tid 0 pid 0 cid  20   227.55KB memorySize:     0.00B
tid 0 pid 0 cid  21   372.36KB memorySize:     0.00B            tid 0 pid 0 cid  21   372.36KB memorySize:     0.00B
tid 0 pid 0 cid  22   585.14KB memorySize:     0.00B            tid 0 pid 0 cid  22   585.14KB memorySize:     0.00B
tid 0 pid 0 cid  23     1.00MB memorySize:     0.00B            tid 0 pid 0 cid  23     1.00MB memorySize:     0.00B
tid 0 pid 0 cid   0    64.00B free: 100.00%                     tid 0 pid 0 cid   0    64.00B free: 100.00%
tid 0 pid 0 cid   1    96.00B free: 0.00%                       tid 0 pid 0 cid   1    96.00B free: 0.00%
tid 0 pid 0 cid   2   144.00B free: 0.00%                       tid 0 pid 0 cid   2   144.00B free: 0.00%
tid 0 pid 0 cid   3   216.00B free: 0.00%                       tid 0 pid 0 cid   3   216.00B free: 0.00%
tid 0 pid 0 cid   4   328.00B free: 0.00%                       tid 0 pid 0 cid   4   328.00B free: 0.00%
tid 0 pid 0 cid   5   496.00B free: 0.00%                       tid 0 pid 0 cid   5   496.00B free: 0.00%
tid 0 pid 0 cid   6   744.00B free: 0.01%                       tid 0 pid 0 cid   6   744.00B free: 0.01%
tid 0 pid 0 cid   7     1.09KB free: 0.02%                      tid 0 pid 0 cid   7     1.09KB free: 0.02%
tid 0 pid 0 cid   8     1.64KB free: 0.02%                      tid 0 pid 0 cid   8     1.64KB free: 0.02%
tid 0 pid 0 cid   9     2.46KB free: 0.02%                      tid 0 pid 0 cid   9     2.46KB free: 0.02%
tid 0 pid 0 cid  10     3.70KB free: 0.04%                      tid 0 pid 0 cid  10     3.70KB free: 0.04%
tid 0 pid 0 cid  11     5.55KB free: 0.06%                      tid 0 pid 0 cid  11     5.55KB free: 0.06%
tid 0 pid 0 cid  12     8.32KB free: 0.06%                      tid 0 pid 0 cid  12     8.32KB free: 0.06%
tid 0 pid 0 cid  13    12.48KB free: 0.03%                      tid 0 pid 0 cid  13    12.48KB free: 0.03%
tid 0 pid 0 cid  14    18.78KB free: 0.04%                      tid 0 pid 0 cid  14    18.78KB free: 0.04%
tid 0 pid 0 cid  15    28.24KB free: 0.02%                      tid 0 pid 0 cid  15    28.24KB free: 0.02%
tid 0 pid 0 cid  16    42.66KB free: 0.01%                      tid 0 pid 0 cid  16    42.66KB free: 0.01%
tid 0 pid 0 cid  17    64.00KB free: 0.00%                      tid 0 pid 0 cid  17    64.00KB free: 0.00%
tid 0 pid 0 cid  18    97.52KB free: 0.00%                      tid 0 pid 0 cid  18    97.52KB free: 0.00%
tid 0 pid 0 cid  19   151.70KB free: 0.00%                      tid 0 pid 0 cid  19   151.70KB free: 0.00%
tid 0 pid 0 cid  20   227.55KB free: 100.00%                    tid 0 pid 0 cid  20   227.55KB free: 100.00%
tid 0 pid 0 cid  21   372.36KB free: 100.00%                    tid 0 pid 0 cid  21   372.36KB free: 100.00%
tid 0 pid 0 cid  22   585.14KB free: 100.00%                    tid 0 pid 0 cid  22   585.14KB free: 100.00%
tid 0 pid 0 cid  23     1.00MB free: 100.00%                    tid 0 pid 0 cid  23     1.00MB free: 100.00%
tid 0 pid 0 cid   0    64.00B latency:     0.00%                tid 0 pid 0 cid   0    64.00B latency:     0.00%
tid 0 pid 0 cid   1    96.00B latency: 11586.37%              | tid 0 pid 0 cid   1    96.00B latency:     0.00%
tid 0 pid 0 cid   2   144.00B latency: 12463.21%              | tid 0 pid 0 cid   2   144.00B latency:     0.00%
tid 0 pid 0 cid   3   216.00B latency: 13428.45%              | tid 0 pid 0 cid   3   216.00B latency:     0.00%
tid 0 pid 0 cid   4   328.00B latency: 19003.28%              | tid 0 pid 0 cid   4   328.00B latency:     0.00%
tid 0 pid 0 cid   5   496.00B latency: 16499.70%              | tid 0 pid 0 cid   5   496.00B latency:     0.00%
tid 0 pid 0 cid   6   744.00B latency: 12304.09%              | tid 0 pid 0 cid   6   744.00B latency:     0.00%
tid 0 pid 0 cid   7     1.09KB latency: 11744.12%             | tid 0 pid 0 cid   7     1.09KB latency:     0.00%
tid 0 pid 0 cid   8     1.64KB latency: 11291.19%             | tid 0 pid 0 cid   8     1.64KB latency:     0.00%
tid 0 pid 0 cid   9     2.46KB latency: 11218.06%             | tid 0 pid 0 cid   9     2.46KB latency:     0.00%
tid 0 pid 0 cid  10     3.70KB latency: 11231.27%             | tid 0 pid 0 cid  10     3.70KB latency:     0.00%
tid 0 pid 0 cid  11     5.55KB latency: 11290.93%             | tid 0 pid 0 cid  11     5.55KB latency:     0.00%
tid 0 pid 0 cid  12     8.32KB latency: 11351.12%             | tid 0 pid 0 cid  12     8.32KB latency:     0.00%
tid 0 pid 0 cid  13    12.48KB latency: 11304.93%             | tid 0 pid 0 cid  13    12.48KB latency:     0.00%
tid 0 pid 0 cid  14    18.78KB latency: 11432.36%             | tid 0 pid 0 cid  14    18.78KB latency:     0.00%
tid 0 pid 0 cid  15    28.24KB latency: 11395.97%             | tid 0 pid 0 cid  15    28.24KB latency:     0.00%
tid 0 pid 0 cid  16    42.66KB latency: 11821.42%             | tid 0 pid 0 cid  16    42.66KB latency:     0.00%
tid 0 pid 0 cid  17    64.00KB latency: 11578.09%             | tid 0 pid 0 cid  17    64.00KB latency:     0.00%
tid 0 pid 0 cid  18    97.52KB latency: 11896.95%             | tid 0 pid 0 cid  18    97.52KB latency:     0.00%
tid 0 pid 0 cid  19   151.70KB latency:  8816.12%             | tid 0 pid 0 cid  19   151.70KB latency:     0.00%
tid 0 pid 0 cid  20   227.55KB latency: 23360.75%             | tid 0 pid 0 cid  20   227.55KB latency:     0.00%
tid 0 pid 0 cid  21   372.36KB latency:     0.00%               tid 0 pid 0 cid  21   372.36KB latency:     0.00%
tid 0 pid 0 cid  22   585.14KB latency:     0.00%               tid 0 pid 0 cid  22   585.14KB latency:     0.00%
tid 0 pid 0 cid  23     1.00MB latency:     0.00%               tid 0 pid 0 cid  23     1.00MB latency:     0.00%
Cache Gets    : 120,000,000                                     Cache Gets    : 120,000,000
Hit Ratio     :  43.20%                                       | Hit Ratio     :  43.24%
NVM Gets      :               0, Coalesced : 100.00%            NVM Gets      :               0, Coalesced : 100.00%
NVM Puts      :               0, Success   :   0.00%, Clean     NVM Puts      :               0, Success   :   0.00%, Clean  
NVM Evicts    :               0, Clean     : 100.00%, Unclean   NVM Evicts    :               0, Clean     : 100.00%, Unclean
NVM Deletes   :               0 Skipped Deletes: 100.00%        NVM Deletes   :               0 Skipped Deletes: 100.00%

== Throughput for  ==                                           == Throughput for  ==
Total Ops : 120.00 million                                      Total Ops : 120.00 million
Total sets: 68,161,566                                        | Total sets: 68,117,977
get       :   926,544/s, success   :  43.20%                  | get       :   934,599/s, success   :  43.24%
set       :   526,289/s, success   : 100.00%                  | set       :   530,525/s, success   : 100.00%
del       :         0/s, found     :   0.00%                    del       :         0/s, found     :   0.00%
guptask commented 2 years ago

cachelib/common/RollingStats.h line 1 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote… I'm going to measure the performance impact first. If the impact is negligible, I'm in favor of not making it a compile time option. I'm waiting for my machine to be free; it is under use for background eviction data collection.

Please refer to the results shared in a comment below. Based on these results, I think the performance footprint of rolling stats is quite lot.

guptask commented 2 years ago

cachelib/cachebench/cache/CacheStats.h line 162 at r3 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
Please extend the description - what latency is being reported?

changed it to rollingAvgLatency

guptask commented 2 years ago

cachelib/common/RollingStats.h line 1 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
Can you run it a few times and average the results? The difference is small enough to be just a run-to-run variance

I ran it 3 times. The difference is small enough to be considered a runtime variance.

guptask commented 2 years ago

cachelib/common/RollingStats.h line 35 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
Can you elaborate? I don;t understand "cnt_ is getting in". Even if you increase cnt_ you can cause it overflow (when it's equal to std::numeric_limits::max()) to 0.

Sorry I accidentally pressed send before finishing the sentence. I added the full sentence in another comment which is not showing up in the Reviewable mode. I wrote: "cnt is getting incremented before this line. cnt is currently uint64_t. Only scenario is can be 0 is if the value rolls around from 2^64-1."

You're talking about the same thing. So I added a clause that stops calculating rolling latency once cnt_ reaches numerical limit.

guptask commented 2 years ago

cachelib/cachebench/cache/CacheStats.h line 162 at r3 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
I meant "latency of what"? You should specify that it's find latency (unless it is already displayed somewhere?)

Done.

guptask commented 2 years ago

cachelib/common/RollingStats.h line 35 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…
I guess you could just start from 0 in that case. Instead of stopping, just set cnt_ to 0 and move on. You don't need this idel variable in that case as well.

done