Closed guptask closed 2 years ago
cachelib/common/RollingStats.h
line 1 at r1 (raw file):
"Also, have you measured how much performance impact this latency tracking have? Could you please add an option to config to enable/disable gathering this latency info? Ans: I'm working on a compile type flag option. " Will you add this option to this PR?
I'm going to measure the performance impact first. If the impact is negligible, I'm in favor of not making it a compile time option. I'm waiting for my machine to be free; it is under use for background eviction data collection.
cachelib/common/RollingStats.h
line 40 at r1 (raw file):
I think it would be better to return double here. I know you need uint64_t later but this class might also be used for different purposes where double might be more appropriate
ok. I'll update it to double
cachelib/common/RollingStats.h
line 71 at r1 (raw file):
didn't you say those are unncessary? (i.e. can't we delete move ctor and assignment operator?)
I mentioned that about RollingStats. I have taken those out. I kept the assignment operator for RollingLatencyTracker based on comparison to the LatencyTracker (used by PercentileStats). I don't see it being used currently.
Command used :
./bin/cachebench --json_test_config test_configs/hit_ratio/graph_cache_leader_fbobj/config-8GB-DRAM.json
-report_memory_usage_stats -progress 1 -progress_stats_file <per_sec_stats_txt> &> <overall_stats_txt>
The data on the right is with RollingLatencyTracker
commented out.
The rolling latency print statement has an incorrect % sign at end. I'll remove it.
== Test Results == == Test Results ==
== Allocator Stats == == Allocator Stats ==
Items in RAM : 17,016,761 | Items in RAM : 17,036,539
Items in NVM : 0 Items in NVM : 0
Alloc Attempts: 68,161,566 Success: 100.00% | Alloc Attempts: 68,117,977 Success: 100.00%
RAM Evictions : 51,144,741 | RAM Evictions : 51,081,395
tid 0 free slabs : 0.00% tid 0 free slabs : 0.00%
tid 0 pid 0 cid 0 64.00B memorySize: 0.00B tid 0 pid 0 cid 0 64.00B memorySize: 0.00B
tid 0 pid 0 cid 1 96.00B memorySize: 76.00MB tid 0 pid 0 cid 1 96.00B memorySize: 76.00MB
tid 0 pid 0 cid 2 144.00B memorySize: 248.00MB tid 0 pid 0 cid 2 144.00B memorySize: 248.00MB
tid 0 pid 0 cid 3 216.00B memorySize: 544.00MB tid 0 pid 0 cid 3 216.00B memorySize: 544.00MB
tid 0 pid 0 cid 4 328.00B memorySize: 1.47GB tid 0 pid 0 cid 4 328.00B memorySize: 1.47GB
tid 0 pid 0 cid 5 496.00B memorySize: 1.93GB tid 0 pid 0 cid 5 496.00B memorySize: 1.93GB
tid 0 pid 0 cid 6 744.00B memorySize: 1016.00MB tid 0 pid 0 cid 6 744.00B memorySize: 1016.00MB
tid 0 pid 0 cid 7 1.09KB memorySize: 836.00MB | tid 0 pid 0 cid 7 1.09KB memorySize: 840.00MB
tid 0 pid 0 cid 8 1.64KB memorySize: 416.00MB | tid 0 pid 0 cid 8 1.64KB memorySize: 420.00MB
tid 0 pid 0 cid 9 2.46KB memorySize: 268.00MB | tid 0 pid 0 cid 9 2.46KB memorySize: 272.00MB
tid 0 pid 0 cid 10 3.70KB memorySize: 304.00MB tid 0 pid 0 cid 10 3.70KB memorySize: 304.00MB
tid 0 pid 0 cid 11 5.55KB memorySize: 268.00MB | tid 0 pid 0 cid 11 5.55KB memorySize: 264.00MB
tid 0 pid 0 cid 12 8.32KB memorySize: 68.00MB tid 0 pid 0 cid 12 8.32KB memorySize: 68.00MB
tid 0 pid 0 cid 13 12.48KB memorySize: 36.00MB tid 0 pid 0 cid 13 12.48KB memorySize: 36.00MB
tid 0 pid 0 cid 14 18.78KB memorySize: 44.00MB tid 0 pid 0 cid 14 18.78KB memorySize: 44.00MB
tid 0 pid 0 cid 15 28.24KB memorySize: 60.00MB | tid 0 pid 0 cid 15 28.24KB memorySize: 64.00MB
tid 0 pid 0 cid 16 42.66KB memorySize: 412.00MB | tid 0 pid 0 cid 16 42.66KB memorySize: 404.00MB
tid 0 pid 0 cid 17 64.00KB memorySize: 72.00MB | tid 0 pid 0 cid 17 64.00KB memorySize: 64.00MB
tid 0 pid 0 cid 18 97.52KB memorySize: 28.00MB tid 0 pid 0 cid 18 97.52KB memorySize: 28.00MB
tid 0 pid 0 cid 19 151.70KB memorySize: 8.00MB tid 0 pid 0 cid 19 151.70KB memorySize: 8.00MB
tid 0 pid 0 cid 20 227.55KB memorySize: 0.00B tid 0 pid 0 cid 20 227.55KB memorySize: 0.00B
tid 0 pid 0 cid 21 372.36KB memorySize: 0.00B tid 0 pid 0 cid 21 372.36KB memorySize: 0.00B
tid 0 pid 0 cid 22 585.14KB memorySize: 0.00B tid 0 pid 0 cid 22 585.14KB memorySize: 0.00B
tid 0 pid 0 cid 23 1.00MB memorySize: 0.00B tid 0 pid 0 cid 23 1.00MB memorySize: 0.00B
tid 0 pid 0 cid 0 64.00B free: 100.00% tid 0 pid 0 cid 0 64.00B free: 100.00%
tid 0 pid 0 cid 1 96.00B free: 0.00% tid 0 pid 0 cid 1 96.00B free: 0.00%
tid 0 pid 0 cid 2 144.00B free: 0.00% tid 0 pid 0 cid 2 144.00B free: 0.00%
tid 0 pid 0 cid 3 216.00B free: 0.00% tid 0 pid 0 cid 3 216.00B free: 0.00%
tid 0 pid 0 cid 4 328.00B free: 0.00% tid 0 pid 0 cid 4 328.00B free: 0.00%
tid 0 pid 0 cid 5 496.00B free: 0.00% tid 0 pid 0 cid 5 496.00B free: 0.00%
tid 0 pid 0 cid 6 744.00B free: 0.01% tid 0 pid 0 cid 6 744.00B free: 0.01%
tid 0 pid 0 cid 7 1.09KB free: 0.02% tid 0 pid 0 cid 7 1.09KB free: 0.02%
tid 0 pid 0 cid 8 1.64KB free: 0.02% tid 0 pid 0 cid 8 1.64KB free: 0.02%
tid 0 pid 0 cid 9 2.46KB free: 0.02% tid 0 pid 0 cid 9 2.46KB free: 0.02%
tid 0 pid 0 cid 10 3.70KB free: 0.04% tid 0 pid 0 cid 10 3.70KB free: 0.04%
tid 0 pid 0 cid 11 5.55KB free: 0.06% tid 0 pid 0 cid 11 5.55KB free: 0.06%
tid 0 pid 0 cid 12 8.32KB free: 0.06% tid 0 pid 0 cid 12 8.32KB free: 0.06%
tid 0 pid 0 cid 13 12.48KB free: 0.03% tid 0 pid 0 cid 13 12.48KB free: 0.03%
tid 0 pid 0 cid 14 18.78KB free: 0.04% tid 0 pid 0 cid 14 18.78KB free: 0.04%
tid 0 pid 0 cid 15 28.24KB free: 0.02% tid 0 pid 0 cid 15 28.24KB free: 0.02%
tid 0 pid 0 cid 16 42.66KB free: 0.01% tid 0 pid 0 cid 16 42.66KB free: 0.01%
tid 0 pid 0 cid 17 64.00KB free: 0.00% tid 0 pid 0 cid 17 64.00KB free: 0.00%
tid 0 pid 0 cid 18 97.52KB free: 0.00% tid 0 pid 0 cid 18 97.52KB free: 0.00%
tid 0 pid 0 cid 19 151.70KB free: 0.00% tid 0 pid 0 cid 19 151.70KB free: 0.00%
tid 0 pid 0 cid 20 227.55KB free: 100.00% tid 0 pid 0 cid 20 227.55KB free: 100.00%
tid 0 pid 0 cid 21 372.36KB free: 100.00% tid 0 pid 0 cid 21 372.36KB free: 100.00%
tid 0 pid 0 cid 22 585.14KB free: 100.00% tid 0 pid 0 cid 22 585.14KB free: 100.00%
tid 0 pid 0 cid 23 1.00MB free: 100.00% tid 0 pid 0 cid 23 1.00MB free: 100.00%
tid 0 pid 0 cid 0 64.00B latency: 0.00% tid 0 pid 0 cid 0 64.00B latency: 0.00%
tid 0 pid 0 cid 1 96.00B latency: 11586.37% | tid 0 pid 0 cid 1 96.00B latency: 0.00%
tid 0 pid 0 cid 2 144.00B latency: 12463.21% | tid 0 pid 0 cid 2 144.00B latency: 0.00%
tid 0 pid 0 cid 3 216.00B latency: 13428.45% | tid 0 pid 0 cid 3 216.00B latency: 0.00%
tid 0 pid 0 cid 4 328.00B latency: 19003.28% | tid 0 pid 0 cid 4 328.00B latency: 0.00%
tid 0 pid 0 cid 5 496.00B latency: 16499.70% | tid 0 pid 0 cid 5 496.00B latency: 0.00%
tid 0 pid 0 cid 6 744.00B latency: 12304.09% | tid 0 pid 0 cid 6 744.00B latency: 0.00%
tid 0 pid 0 cid 7 1.09KB latency: 11744.12% | tid 0 pid 0 cid 7 1.09KB latency: 0.00%
tid 0 pid 0 cid 8 1.64KB latency: 11291.19% | tid 0 pid 0 cid 8 1.64KB latency: 0.00%
tid 0 pid 0 cid 9 2.46KB latency: 11218.06% | tid 0 pid 0 cid 9 2.46KB latency: 0.00%
tid 0 pid 0 cid 10 3.70KB latency: 11231.27% | tid 0 pid 0 cid 10 3.70KB latency: 0.00%
tid 0 pid 0 cid 11 5.55KB latency: 11290.93% | tid 0 pid 0 cid 11 5.55KB latency: 0.00%
tid 0 pid 0 cid 12 8.32KB latency: 11351.12% | tid 0 pid 0 cid 12 8.32KB latency: 0.00%
tid 0 pid 0 cid 13 12.48KB latency: 11304.93% | tid 0 pid 0 cid 13 12.48KB latency: 0.00%
tid 0 pid 0 cid 14 18.78KB latency: 11432.36% | tid 0 pid 0 cid 14 18.78KB latency: 0.00%
tid 0 pid 0 cid 15 28.24KB latency: 11395.97% | tid 0 pid 0 cid 15 28.24KB latency: 0.00%
tid 0 pid 0 cid 16 42.66KB latency: 11821.42% | tid 0 pid 0 cid 16 42.66KB latency: 0.00%
tid 0 pid 0 cid 17 64.00KB latency: 11578.09% | tid 0 pid 0 cid 17 64.00KB latency: 0.00%
tid 0 pid 0 cid 18 97.52KB latency: 11896.95% | tid 0 pid 0 cid 18 97.52KB latency: 0.00%
tid 0 pid 0 cid 19 151.70KB latency: 8816.12% | tid 0 pid 0 cid 19 151.70KB latency: 0.00%
tid 0 pid 0 cid 20 227.55KB latency: 23360.75% | tid 0 pid 0 cid 20 227.55KB latency: 0.00%
tid 0 pid 0 cid 21 372.36KB latency: 0.00% tid 0 pid 0 cid 21 372.36KB latency: 0.00%
tid 0 pid 0 cid 22 585.14KB latency: 0.00% tid 0 pid 0 cid 22 585.14KB latency: 0.00%
tid 0 pid 0 cid 23 1.00MB latency: 0.00% tid 0 pid 0 cid 23 1.00MB latency: 0.00%
Cache Gets : 120,000,000 Cache Gets : 120,000,000
Hit Ratio : 43.20% | Hit Ratio : 43.24%
NVM Gets : 0, Coalesced : 100.00% NVM Gets : 0, Coalesced : 100.00%
NVM Puts : 0, Success : 0.00%, Clean NVM Puts : 0, Success : 0.00%, Clean
NVM Evicts : 0, Clean : 100.00%, Unclean NVM Evicts : 0, Clean : 100.00%, Unclean
NVM Deletes : 0 Skipped Deletes: 100.00% NVM Deletes : 0 Skipped Deletes: 100.00%
== Throughput for == == Throughput for ==
Total Ops : 120.00 million Total Ops : 120.00 million
Total sets: 68,161,566 | Total sets: 68,117,977
get : 926,544/s, success : 43.20% | get : 934,599/s, success : 43.24%
set : 526,289/s, success : 100.00% | set : 530,525/s, success : 100.00%
del : 0/s, found : 0.00% del : 0/s, found : 0.00%
cachelib/common/RollingStats.h
line 1 at r1 (raw file):Previously, igchor (Igor Chorążewicz) wrote… I'm going to measure the performance impact first. If the impact is negligible, I'm in favor of not making it a compile time option. I'm waiting for my machine to be free; it is under use for background eviction data collection.
Please refer to the results shared in a comment below. Based on these results, I think the performance footprint of rolling stats is quite lot.
cachelib/cachebench/cache/CacheStats.h
line 162 at r3 (raw file):
Please extend the description - what latency is being reported?
changed it to rollingAvgLatency
cachelib/common/RollingStats.h
line 1 at r1 (raw file):
Can you run it a few times and average the results? The difference is small enough to be just a run-to-run variance
I ran it 3 times. The difference is small enough to be considered a runtime variance.
cachelib/common/RollingStats.h
line 35 at r1 (raw file):
Can you elaborate? I don;t understand "cnt_ is getting in". Even if you increase cnt_ you can cause it overflow (when it's equal to std::numeric_limits::max()) to 0.
Sorry I accidentally pressed send before finishing the sentence. I added the full sentence in another comment which is not showing up in the Reviewable mode. I wrote: "cnt is getting incremented before this line. cnt is currently uint64_t. Only scenario is can be 0 is if the value rolls around from 2^64-1."
You're talking about the same thing. So I added a clause that stops calculating rolling latency once cnt_ reaches numerical limit.
cachelib/cachebench/cache/CacheStats.h
line 162 at r3 (raw file):
I meant "latency of what"? You should specify that it's find latency (unless it is already displayed somewhere?)
Done.
cachelib/common/RollingStats.h
line 35 at r1 (raw file):
I guess you could just start from 0 in that case. Instead of stopping, just set cnt_ to 0 and move on. You don't need this idel variable in that case as well.
done
This change is