uclasystem / midas

Midas is a memory management system that efficiently and safely harvests idle memory for applications' soft state.
Apache License 2.0

How to get the client to register with daemon? #2

Open pohaoc opened 1 month ago

pohaoc commented 1 month ago

Hi,

I am following the steps from the README to reproduce the benchmark result.

./scripts/build.sh

# this part was excluded from the steps
cd ./koord; make -j; sudo ./setup

# start daemon
./scripts/run_daemon.sh

# finally, run an application
./apps/synthetic/synthetic 

However, the daemon doesn't seem to register the client as no message is shown. I notice the application throughput is also constant regardless of the cache ratio. I assume this is because the daemon never signaled koord to unmap pages.

This was tested on Ubuntu 20.04 (Linux 5.4) and 22.04 (Linux 5.15). I also tried it on 18.04 (Linux 5.0) but I think the koord module won't compile with the older kernel version.

Appreciate any help with this! Thanks!

ivanium commented 1 month ago

Hi pohaoc, thank you for your interest.

From the information you shared, I think you are actually doing it right. A quick answer to your question would be:

# Launching the daemon and the application as you have already done...
# Then generating memory pressure on the system. You can launch another application to consume available memory, or leverage the built-in script to simulate memory pressure by setting a hard limit on how much memory Midas can harvest. Here is a concrete example:
./scripts/set_memory_limit.sh <memory limit in MB, e.g., 1024, which is 1024MB>

You can verify the memory usage with tools like top or htop, or check the delta of stats by reading the /proc/meminfo file. If you are using top or htop, you can also pay attention to the SHR memory usage of the client application, which stands for the amount of soft memory it is granted.
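As a rough sketch of the /proc/meminfo check described above, the helpers below read one field and compute the change between two readings. The function names `meminfo_kb` and `delta_mb` are hypothetical and not part of Midas, and which field is most informative (for example `MemAvailable` or `Shmem`) is an assumption that may depend on your setup.

```shell
# Hypothetical helper (not part of Midas): print one /proc/meminfo field in kB.
# $1 = field name without the colon, $2 = optional file (defaults to /proc/meminfo)
meminfo_kb() {
    # Lines look like "MemAvailable:   12345678 kB"; match the key exactly.
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Hypothetical helper: change between two kB readings, in MB (integer division).
# $1 = reading before, $2 = reading after
delta_mb() {
    echo $(( ($2 - $1) / 1024 ))
}

# Example use (commented out so that sourcing this file has no side effects):
#   before=$(meminfo_kb MemAvailable)
#   ./scripts/set_memory_limit.sh 1024
#   sleep 5
#   after=$(meminfo_kb MemAvailable)
#   echo "MemAvailable changed by $(delta_mb "$before" "$after") MB"
```

Taking both readings a few seconds apart, once before and once after changing the limit, makes the delta easier to attribute to reclamation than a single snapshot.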

Throughput may or may not be a good indicator to check if memory reclamation happens, because Midas reclaims cold memory first and application throughput may not drop until hot objects get reclaimed. For the synthetic application, limiting its soft memory budget to a small value (1GB) should generate a visible throughput drop.

Detailed explanations of some other questions, in case you are interested:

  1. The daemon may not print logging messages to the terminal now, because we set a higher logging verbosity level (https://github.com/uclasystem/midas/blob/main/inc/logging.hpp#L21). Feel free to change it to kInfo or even lower for more detailed logging information. The client will connect to the daemon during its initialization, so as long as it runs, it is registered to the daemon.
  2. While the daemon with koord will achieve the best reclamation throughput, koord depends on the Linux kernel to compile, which makes it a little hard to set up. In fact, the current Midas daemon is able to run purely in userspace, and it communicates with clients via userspace-controlled shared memory. We chose to open source this version first for easier adoption. So feel free to skip setting up koord on Linux 5.0 and run the daemon directly. Technically, this daemon should still work.

I hope this is helpful. Feel free to follow up if you have additional questions.

Best, Yifan

pohaoc commented 1 month ago

Thanks for the clarification! As a follow-up: for an array that takes N bytes, to make K% of it soft state, I should set the cache size accordingly (i.e., pool->update_limit(K% * N)). If I want to make sure K% of the array is reclaimed, can I trigger this by limiting the memory to (100-K)% of the array?
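The arithmetic in this question can be sketched as follows. This is only an illustration of the two quantities mentioned (K% of N for the pool limit, and (100-K)% of N for the memory limit); the helper names `soft_bytes` and `limit_mb` are hypothetical, and treating 1 MB as 1048576 bytes for ./scripts/set_memory_limit.sh is an assumption.

```shell
# Hypothetical helper (not part of Midas): K percent of N bytes, rounded down.
# $1 = array size N in bytes, $2 = soft-state percentage K (integer, 0-100)
soft_bytes() {
    echo $(( $1 * $2 / 100 ))
}

# Hypothetical helper: the remaining (100-K) percent of N, in whole MB, rounded down.
# Assumes 1 MB = 1048576 bytes.
limit_mb() {
    echo $(( $1 * (100 - $2) / 100 / 1048576 ))
}

# Example: a 1000 MB array (1048576000 bytes) with K = 20:
#   soft_bytes 1048576000 20   -> value to pass to pool->update_limit
#   limit_mb   1048576000 20   -> value to pass to ./scripts/set_memory_limit.sh
```

With N = 1048576000 and K = 20, `soft_bytes` gives 209715200 bytes and `limit_mb` gives 800.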

I also notice that the logs report "[Error] float-value 0". What does this message mean?

ivanium commented 3 weeks ago

Hi pohaoc,

Yes, your understanding of the setup is correct. Regarding the error, do you have the complete log and instructions to reproduce it?

pohaoc commented 3 weeks ago

I believe the error messages occur if the limit passed to ./scripts/set_memory_limit.sh <mb> is sufficiently low.

This can be reproduced using the synthetic benchmark in this repository (default setting and kCacheRatio = 0.2) by setting the memory limit to 2000MB.

# in the daemon program
[Warning] Client 6058000000681641 is dead!
[Error] 1007.49 0
[Error] 1289.2 0
[Error] 1106.5 0
[Error] 16957.3 0
.....  

What I would like to achieve is to designate some number of objects in a data structure (using pool->update_limit) as soft state, and then always force reconstruction by lowering the global memory limit, so that I can measure throughput under worst-case scenarios. Is this the right way to do it?

ivanium commented 1 week ago

The 2000MB memory limit sounds large enough and should not run into any errors. I tried to reproduce the issue. On our machine (the same one as reported in the paper), 50MB memory is sufficient to run synthetic smoothly (although the throughput is not as good due to the memory limit). Could you please provide more detailed information on all modifications to the program, full instructions you used, and log files so I can take a closer look? Thank you!

Regarding your question on enforcing reconstruction, I would suggest slightly modifying the program to bypass the cache and invoke reconstructions manually. With a cache, there is no guarantee that a cache miss and reconstruction are always triggered.

pohaoc commented 1 week ago

Thanks for the reply! The program is unmodified, and this was tested on Ubuntu 18.04 with kernel version 5.0.0, using xl170 instances on CloudLab (64GB of DRAM). Here are the full steps that I followed.

git clone https://github.com/uclasystem/midas.git
# with g++-9
./scripts/build.sh 

./scripts/set_memory_limit.sh 200
./scripts/run_daemon.sh

cd apps/synthetic/
make -j
./synthetic

# From the Daemon
[Error] 26790.9 0
[Error] 52708.6 0
[Error] 62409.5 0
[Error] 64874.6 0
[Error] 58199.8 0
....

Although these messages show up in the daemon log, the main program itself seems to execute fine. Do you know how I can interpret these error messages?

ivanium commented 1 day ago

Great. I think that means the program is running fine. I made some minor updates on branch lower-log-level so feel free to try it out. It contains a total of 2 commits and 3 LOC changes.

I double-checked the log and it turned out those should be debug information rather than error info. They were printed when the policy function ran and had no impact on program execution. I disabled them in this commit: https://github.com/uclasystem/midas/commit/b8a481eb08be2c298d67e48a037d88809a95efc7
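On builds that still print these lines, one way to keep the daemon output readable is to filter them out. The sketch below assumes the lines always have the form "[Error] <number> <number>", as in the logs quoted in this thread; the function name `drop_policy_debug` is hypothetical and not part of Midas.

```shell
# Hypothetical filter (not part of Midas): drop lines of the form
# "[Error] <number> <number>" and pass everything else through unchanged,
# so warnings and other [Error] messages stay visible.
drop_policy_debug() {
    grep -Ev '^\[Error\] [0-9]+(\.[0-9]+)? [0-9]+(\.[0-9]+)?[[:space:]]*$'
}

# Example use (commented out so that sourcing this file has no side effects):
#   ./scripts/run_daemon.sh 2>&1 | drop_policy_debug
```

The pattern is deliberately narrow, so an [Error] line with any other content is still shown.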

Given that you are running the synthetic application with limited memory, it may take a long while for it to finish one profile iteration. I suggest printing out real-time throughput (every 5 seconds) within the Midas perf tools. This is also enabled in the lower-log-level branch (see this commit for details: https://github.com/uclasystem/midas/commit/175d638409489e7bb3d79a61f5204f671e4bf619). With this patch, the synthetic app should print out a throughput value in KOPS every 5 seconds.

Please feel free to try it out and let me know if you have further questions.