root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern
Other
2.63k stars 1.25k forks source link

gSystem->GetMemInfo reports wrong used RAM memory #7196

Open ferdymercury opened 3 years ago

ferdymercury commented 3 years ago

Describe the bug

On Linux, ROOT reports most of the RAM is full, even if it is free. Probably because it reports:

used = total - free instead of used = total - free - cached = total - available

See: https://github.com/root-project/root/blob/4d2a8650ed0f47fbffed33488426aaa6b6dfc73b/core/unix/src/TUnixSystem.cxx#L5153

/opt/root_bld/tutorials/gui $ root -l
root [0] MemInfo_t memInfo; gSystem->GetMemInfo(&memInfo);
root [1] cout << memInfo.fMemTotal << " " << memInfo.fMemUsed << " " << memInfo.fMemFree << endl;
15933 15381 552
root [2] .q
/opt/root_bld/tutorials/gui $ free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        5.6G        658M        439M        9.3G        9.2G
Swap:          1.0G        135M        888M

Expected behavior

It should report the actual free memory, or add a new variable inside the MemInfo_t struct that specifies the 9.3 'cached' memory, which is actually free to use.

To Reproduce

MemInfo_t memInfo; gSystem->GetMemInfo(&memInfo);
cout << memInfo.fMemTotal << " " << memInfo.fMemUsed << " " << memInfo.fMemFree << endl;

or /opt/root_bld/tutorials/gui $ root -l CPUMeter.C

Setup

  1. ROOT version: 6.23/01
  2. Operating system: Ubuntu 18
  3. Self-built

Additional context

https://www.thegeekdiary.com/understanding-proc-meminfo-file-analyzing-memory-utilization-in-linux/

My suggestion to solve this would be to do:

  if (s.BeginsWith("MemAvailable")) {
     TPRegexp("^.+: *([^ ]+).*").Substitute(s, "$1");
     meminfo->fMemAvailable = (s.Atoi() / 1024);
  }

meminfo->fMemUsed = meminfo->fMemTotal - meminfo->fMemAvailable;

and then in https://root.cern/doc/master/structMemInfo__t.html#a62b63e5e24efd2e2b5d0f60456e239d6:

Int_t fMemAvailable; // available RAM in MB

For the SWAP it's probably not necessary.

Not sure if the same issue happens for other OS.

pcanal commented 3 years ago

or add a new variable inside the MemInfo_t struct that specifies the 9.3 'cached' memory

I would go for this solutions (add missing information).

This would requires update in GetLinuxMemInfo and GetDarwinMemInfo in TUnixSystem.cxx and GetWinNTMemInfo in TWinNTSystem.cxx

ferdymercury commented 3 years ago

or add a new variable inside the MemInfo_t struct that specifies the 9.3 'cached' memory

I would go for this solutions (add missing information).

Sounds good. And also document well what Used means in the MemInfo struct (as it is ambiguous and it does not correspond to any value in /proc/meminfo).

pcanal commented 3 years ago

@ferdymercury Would you be able to provide a PR?

ferdymercury commented 3 years ago

If it were only Linux, I would do it, but I am not familiar with OSx or WinNT...

I have put some code snippets above for the Linux version (MemAvailable), I tested it already and it seems to work.

AdvaitDhingra commented 3 years ago

@ferdymercury Would you be able to provide a PR?

I have attempted to fix this. I linked this issue in my PR.

aaronj0 commented 7 months ago

This issue is very case-dependent. When trying to calculate memory usage, normally the actual free memory on the system is free + buffer + cached

If shared memory usage is high (e.g mmaping a big cache) the calculation is slightly different: (free + buffer/cache) - shared

The author of htop mentions on this question that htop uses this convention(in 2016):

Now htop (managed by htop-dev) uses this (on https://github.com/htop-dev/htop/blob/main/linux/LinuxMachine.c#L210):

const memory_t usedDiff = freeMem + cachedMem + sreclaimableMem + buffersMem;
host->usedMem = (totalMem >= usedDiff) ? totalMem - usedDiff : totalMem - freeMem;

This issue intends to merge the total used memory to include cached instead of just free which isn't strictly a correct solution.

I have implemented the current approach utilised by htop which includes the following fields to calculate MemUsed, MemAvailable and SwapUsed:

Int_t     fMemAvailable; // available RAM in MB
Int_t     fMemCached; // cached RAM in MB
Int_t     fMemBuffer; // buffer RAM in MB
Int_t     fMemShared; // shared RAM in MB
Int_t     fSwapCached; // cached swap in MB
Int_t     fSReclaimable // slab that might be reclaimed

Fixed in

ferdymercury commented 3 weeks ago

@silverweed To fully close this issue, I guess that GetWinNTInfo should also be modified, as suggested by https://github.com/root-project/root/issues/7196#issuecomment-778310928 ? (Or we open a separate "Issue" for the Windows case). Thanks a lot for the merged PR in any case!

silverweed commented 3 weeks ago

@silverweed To fully close this issue, I guess that GetWinNTInfo should also be modified, as suggested by #7196 (comment) ? (Or we open a separate "Issue" for the Windows case). Thanks a lot for the merged PR in any case!

Yes, let's keep it open until it's fully working on all platforms. For Windows do we already know if this is an issue? I.e. what's the expected output and what's the actual output? Maybe @bellenot can help with this

ferdymercury commented 3 weeks ago

Nope, I do not know if it is an issue on Windows.

I think we could just make sure that the ROOT code does the same as: https://github.com/gsass1/NTop/blob/3ceac499d7c3bfc5f7a073df6e0bd7aa1babf790/ntop.c#L907

See also https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-globalmemorystatusex

ferdymercury commented 3 weeks ago

Actually, I am not sure the ntop implementation is correct. This seems more accurate: https://stackoverflow.com/questions/14371521/how-to-find-the-current-system-cache-size-on-windows

bellenot commented 3 weeks ago

Isn't the current implementation good enough? https://github.com/root-project/root/blob/master/core/winnt/src/TWinNTSystem.cxx#L6037

ferdymercury commented 3 weeks ago

On Linux, this type of implementation was not correct when there was a lot of 'cached' RAM memory. We had to subtract cache by hand.

But it might be that in Windows, this subtraction is already done behind the scenes.

bellenot commented 3 weeks ago

OK, I'll check how to do it properly on Windows