Open pohaoc opened 1 month ago
Hi pohaoc, thank you for your interest.
From the information you shared, I think you are doing it right. A quick answer to your question would be:
# Launch the daemon and the application as you have already done...
# Then generate memory pressure on the system. You can launch another application to consume available memory, or leverage the built-in script to simulate memory pressure by setting a hard limit on how much memory Midas can harvest. Here is a concrete example:
./scripts/set_memory_limit.sh <memory limit in MB, e.g., 1024>
You can verify the memory usage with tools like `top` or `htop`, or check the delta of stats by reading the `/proc/meminfo` file. If you are using `top` or `htop`, you can also pay attention to the `SHR` memory usage of the client application, which indicates the amount of soft memory it has been granted.
Throughput may or may not be a good indicator of whether memory reclamation happens, because Midas reclaims cold memory first, and application throughput may not drop until hot objects get reclaimed. For the synthetic application, limiting its soft memory budget to a small value (1GB) should produce a visible throughput drop.
Detailed explanations of some other questions, in case you are interested:
- You can set the log level to `kInfo` or even lower for more detailed logging information. The client will connect to the daemon during its initialization, so as long as it runs, it is registered with the daemon.
- Although `koord` achieves the best reclamation throughput, `koord` depends on the Linux kernel to compile, which makes it a little hard to set up. In fact, the current Midas daemon can run purely in userspace and communicates with clients via userspace-controlled shared memory. We chose to open source this version first for easier adoption. So feel free to skip setting up `koord` on Linux 5.0 and run the daemon directly. Technically, the daemon should still work.
I hope this is helpful. Feel free to follow up if you have additional questions.
Best, Yifan
Thanks for the clarification! As a follow-up: for an array that takes N bytes, to make K% of it soft state, I should set the cache size (i.e., `pool->update_limit(K% * N)`). If I want to make sure K% of the array is reclaimed, can I trigger this by limiting the memory to (100-K)% of the array?
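For concreteness, the arithmetic above can be sketched as below; the helper name `soft_budget` is mine for illustration, not a Midas API, and the result would be what gets passed to something like `pool->update_limit(...)`:

```cpp
#include <cassert>
#include <cstddef>

// Soft-memory budget for an N-byte array when K% of it should be soft state.
constexpr std::size_t soft_budget(std::size_t n_bytes, unsigned k_percent) {
  return n_bytes * k_percent / 100;
}
```

For example, a 10GB array with K = 20 yields a 2GB cache limit; correspondingly, capping global memory near (100-K)% = 80% of the array size would put the remaining K% under pressure.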
I notice in the logs it reports "[Error] float-value 0". I am wondering what this message means?
Hi pohaoc,
Yes, your understanding of the setup is correct. Regarding the error, do you have the complete log and instructions to reproduce it?
I believe the error messages occur if the limit passed to `./scripts/set_memory_limit.sh <mb>` is sufficiently low. This can be reproduced using the synthetic benchmark in this repository (default settings and `kCacheRatio = 0.2`) by setting the memory limit to 2000MB.
# in the daemon program
[Warning] Client 6058000000681641 is dead!
[Error] 1007.49 0
[Error] 1289.2 0
[Error] 1106.5 0
[Error] 16957.3 0
.....
What I would like to achieve is to define some number of objects in a data structure (using `pool->update_limit`) that can leverage soft state, and always force reconstruction by lowering the global memory limit, in order to measure throughput under worst-case scenarios. Is this the right way to do it?
The 2000MB memory limit sounds large enough and should not run into any errors. I tried to reproduce the issue. On our machine (the same one reported in the paper), 50MB of memory is sufficient to run synthetic smoothly (although the throughput is not as good due to the memory limit). Could you please provide more details on any modifications to the program, the full instructions you used, and log files so I can take a closer look? Thank you!
Regarding your question on enforcing reconstruction, I would suggest slightly modifying the program to bypass the cache and invoke reconstructions manually. With a cache, there is no guarantee that a cache miss and the subsequent reconstruction are always triggered.
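A minimal sketch of that measurement pattern follows. The `reconstruct()` function here is a stand-in for whatever rebuilds your soft object in the application, not a Midas API:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for an application-level reconstruction routine.
std::vector<std::uint8_t> reconstruct(std::size_t n_bytes) {
  return std::vector<std::uint8_t>(n_bytes, 0xab);
}

// Time N forced reconstructions directly, never consulting a cache,
// to get a worst-case (always-miss) throughput estimate.
double worst_case_ops_per_sec(int iters, std::size_t obj_bytes) {
  auto start = std::chrono::steady_clock::now();
  std::size_t sink = 0;
  for (int i = 0; i < iters; ++i)
    sink += reconstruct(obj_bytes).size();  // cache bypassed on every op
  double secs = std::chrono::duration<double>(
                    std::chrono::steady_clock::now() - start)
                    .count();
  assert(sink == static_cast<std::size_t>(iters) * obj_bytes);
  return iters / secs;
}
```

Calling `reconstruct()` unconditionally on every access removes the dependence on the cache's hit/miss behavior, so the measured throughput is the worst case by construction.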
Thanks for the reply! The program is unmodified, and this was tested on Ubuntu 18.04 with kernel version 5.0.0, using xl170 instances on CloudLab (64GB of DRAM). Here are the full steps I followed.
git clone https://github.com/uclasystem/midas.git
# with g++-9
./scripts/build.sh
./scripts/set_memory_limit.sh 200
./scripts/run_daemon.sh
cd apps/synthetic/
make -j
./synthetic
# From the Daemon
[Error] 26790.9 0
[Error] 52708.6 0
[Error] 62409.5 0
[Error] 64874.6 0
[Error] 58199.8 0
....
Although these messages show up in the daemon log, the main program itself seems to execute fine. Do you know how I can interpret these error messages?
Great. I think that means the program is running fine. I made some minor updates on the `lower-log-level` branch, so feel free to try it out. It contains a total of 2 commits and 3 LOC of changes.
I double-checked the log and it turned out those should be debug information rather than error info. They were printed when the policy function ran and had no impact on program execution. I disabled them in this commit: https://github.com/uclasystem/midas/commit/b8a481eb08be2c298d67e48a037d88809a95efc7
Given that you are running the synthetic application with limited memory, it may take a long while to finish one profile iteration. I would suggest printing out real-time throughput (every 5 seconds) within the Midas perf tools. This is also enabled on the `lower-log-level` branch (see this commit for details: https://github.com/uclasystem/midas/commit/175d638409489e7bb3d79a61f5204f671e4bf619). With this patch, the synthetic app should print out a throughput value in KOPS every 5 seconds.
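For reference, the per-interval KOPS figure can be derived as below. This is just a sketch of the arithmetic with a configurable interval, not the actual perf-tool code:

```cpp
#include <cassert>
#include <cstdint>

// Convert an operation count over a reporting interval into KOPS
// (thousands of operations per second).
double kops(std::uint64_t ops, double interval_secs) {
  return ops / interval_secs / 1000.0;
}
```

So an app that completes 500,000 operations in a 5-second window would report 100 KOPS for that window.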
Please feel free to try it out and let me know if you have further questions.
Hi,
I am following the steps from the README to reproduce the benchmark result.
However, the daemon doesn't seem to register the client, as no message is shown. I also notice that the application throughput is constant regardless of the cache ratio; I assume this is because the daemon never signaled koord to unmap pages.
This was tested on Ubuntu 20.04 (Linux 5.4) and 22.04 (Linux 5.15). I also tried it on 18.04 (Linux 5.0) but I think the koord module won't compile with the older kernel version.
Appreciate any help with this! Thanks!