gsauthof / cgmemtime

cgmemtime measures the high-water RSS+CACHE memory usage of a process and its descendant processes.

Cache memory #1

Open sarum9in opened 11 years ago

sarum9in commented 11 years ago

https://www.kernel.org/doc/Documentation/cgroups/memory.txt , section 5.2 for definitions.

Note that usage_in_bytes reports an RSS+CACHE value, not RSS. According to the documentation above (see the second note in section 5.2), the real RSS is rss + file_mapped from memory.stat.
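To make the difference between the two sums concrete, here is a minimal Python sketch that parses the text of a cgroup-v1 memory.stat file and computes both. The helper names and the sample numbers are made up for illustration (they are not measurements), and it assumes the mapped-file counter appears in memory.stat under the name `mapped_file`.

```python
def parse_memory_stat(text):
    """Parse 'key value' lines of a cgroup-v1 memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def resident_bytes(stats):
    """Resident set size per the note in section 5.2: rss + mapped file pages."""
    return stats.get("rss", 0) + stats.get("mapped_file", 0)


def rss_plus_cache_bytes(stats):
    """The RSS+CACHE sum that usage_in_bytes approximates."""
    return stats.get("rss", 0) + stats.get("cache", 0)


# Illustrative values only (assumption): 512 MiB of page cache, 128 KiB of rss.
SAMPLE = """cache 536870912
rss 131072
mapped_file 4096
"""

stats = parse_memory_stat(SAMPLE)
print("rss + mapped_file:", resident_bytes(stats))
print("rss + cache      :", rss_plus_cache_bytes(stats))
```

With a large file copy, the second sum is dominated by the cache term, which is the effect the example shows.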

Example: consider a file "test" of roughly 512 MiB.

(1) $ sudo ./cgmemtime dd if=test  | sudo dd of=out
1036288+0 records in
1036288+0 records out
530579456 bytes (531 MB) copied, 8.18783 s, 64.8 MB/s
Child user:    0.240 s
Child sys :    1.280 s
Child wall:    8.189 s
Child high-water RSS                    :        696 KiB
Recursive and accumulated high-water RSS:        128 KiB
1036288+0 records in
1036288+0 records out
530579456 bytes (531 MB) copied, 11.1497 s, 47.6 MB/s
(2) $ sudo ./cgmemtime dd if=test of=out
1036288+0 records in
1036288+0 records out
530579456 bytes (531 MB) copied, 10.6179 s, 50.0 MB/s
Child user:    0.220 s
Child sys :    4.480 s
Child wall:   10.721 s
Child high-water RSS                    :        688 KiB
Recursive and accumulated high-water RSS:     516480 KiB

As you can see, the recursive RSS value (derived from max_usage_in_bytes) is affected by the file cache. In the first example, the cache was kept out of the measurement by sending the output to a pipe, which is not cached; the last dd, which wrote the file, ran outside the cgroup and was not accounted. In the second example, dd wrote the file itself, so the cache was accounted, giving an incorrect value.

P.S. I am currently working on a memory accounting problem with the help of cgroups. If you have any ideas on how to implement real RSS high-water accounting without CACHE, please contact me or post them in a reply.
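One possible approximation, sketched here in Python under stated assumptions, is to poll memory.stat while the child runs and keep the maximum of rss + mapped file pages. This is not how cgmemtime works, and sampling can miss short-lived peaks, so it is only a rough stand-in for a kernel-maintained high-water mark. The function names, the polling interval, and the idea of passing in `read_stat` and `is_running` callables are all hypothetical; the field name `mapped_file` is assumed to be the memory.stat spelling.

```python
import time


def rss_from_stat(text):
    """Return rss + mapped_file (in bytes) from memory.stat content."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields.get("rss", 0) + fields.get("mapped_file", 0)


def file_reader(path):
    """Build a callable that returns the current content of `path`."""
    def read():
        with open(path) as f:
            return f.read()
    return read


def sample_high_water(read_stat, is_running, interval=0.01):
    """Poll until is_running() is false; return the largest sample seen.

    One extra sample is taken after the process is reported as finished.
    """
    peak = 0
    while True:
        running = is_running()
        peak = max(peak, rss_from_stat(read_stat()))
        if not running:
            return peak
        time.sleep(interval)
```

In use, `read_stat` could be `file_reader(...)` pointed at the memory.stat file of the cgroup the child was placed in, and `is_running` could wrap `subprocess.Popen.poll`; the exact cgroup path depends on the system setup.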

gsauthof commented 11 years ago

I can reproduce this issue.

Looking at memory.txt, I currently don't see a way to exclude the cache accounting from the high-water value (memory.max_usage_in_bytes).

At least I'll change the program output/README so that this value is not reported as just RSS.

gsauthof commented 8 years ago

Output/README was changed in:

https://github.com/gsauthof/cgmemtime/commit/8098f9bdce1af4e85cbf5d7f3d2a06083e1bb3ed

beatmax commented 8 years ago

Any updates on this? Any hope of improvement with cgroup-v2?

This issue affects the stability of our performance measurements. When the OS has been under heavy stress and caches have been dropped, we get completely inconsistent measurements (e.g. 10x above normal).

It can be reproduced with any simple program that reads data from disk, before and after dropping caches with "sync; echo 3 > /proc/sys/vm/drop_caches".
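A reader program for this reproduction could be as small as the following Python sketch. `read_all` and the 1 MiB chunk size are illustrative choices, not part of cgmemtime.

```python
def read_all(path, chunk_size=1 << 20):
    """Read the file at `path` sequentially; return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object signals end of file
                break
            total += len(chunk)
    return total
```

Calling `read_all` on a large file from a script run under cgmemtime, once with a warm cache and once right after dropping caches, would be expected to show the difference: on a cold cache the pages read from disk are charged to the measured cgroup.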

gsauthof commented 8 years ago

@beatmax, no, there is no update so far. Unfortunately, the cgroup-v2 interface doesn't provide any high-water information (cf. Section 5.2, Memory). That means it doesn't even provide the high-water information that cgroup-v1 does.