pnlbwh / diskusage-logging

Log disk usage automatically

Mismatch in total size between logger and manual df #14

Closed tashrifbillah closed 3 months ago

tashrifbillah commented 3 months ago

Hi @cjennings, in PR #13, we have worked only on appearance so far. Now, let's talk about the algorithm. Below is the report for the /data/predict1/data_from_nda/Pronet and /data/predict1/data_from_nda/Prescient folders. The 15 TB and 9 TB sizes appear in two places in report-data-20240629.html.

image

But a manual df -h --si reports much bigger sizes:

image

Given that the same command is used in the diskusage-logging program, can you look into the mismatch? Is your addition algorithm missing something?

tashrifbillah commented 3 months ago

Keep in mind the depth factor. Data inside the two folders are many levels deep. If the diskusage logger is reporting lower sizes because we asked it to explore only up to a limited depth (i.e., fewer files), I would like you to investigate and confirm.

colinjennings commented 3 months ago

In logdirsizes, the du command is used as such: du --time -b --max-depth $depth $dir

I ran this command and it produced output similar to the table inside report-data-20240629.html. I will read up on the du command to try to diagnose this discrepancy.
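
On the depth question: GNU du documents --max-depth as limiting only which directories are printed, while each printed total still includes everything nested below it. A minimal sketch to check this on a throwaway tree, assuming GNU coreutils; the helper name top_total and the temporary paths are hypothetical, not part of logdirsizes:

```shell
# Print the total that du reports for the top-level directory at a given depth.
# du lists directories in post-order, so the argument itself is the last line.
top_total() {
    # $1: directory, $2: value passed to --max-depth
    du -b --max-depth "$2" "$1" | tail -n 1 | cut -f1
}

tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"
printf '12345' > "$tmp/a/b/c/deep.txt"   # 5 bytes, three levels down

# Both calls should print the same number: the deep file is counted
# in the top-level total even when its directory is not listed.
top_total "$tmp" 0
top_total "$tmp" 3

rm -rf "$tmp"
```

If the two numbers agree, a lower depth alone would not explain a smaller total for the top-level folder.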

tashrifbillah commented 3 months ago

The difference is usually caused by powers of 1024 vs. powers of 1000. The command I used uses powers of 1000.
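
To put a number on the unit convention, here is a small sketch that prints one byte count in both conventions. The helper name bytes_to_units is hypothetical, and the 15 TB input is only an illustrative value, not a measurement from the report:

```shell
# Convert a byte count to terabytes (powers of 1000) and tebibytes (powers of 1024).
bytes_to_units() {
    # $1: size in bytes
    awk -v b="$1" 'BEGIN { printf "%.2f TB (10^12)  %.2f TiB (2^40)\n", b / 1e12, b / 2^40 }'
}

bytes_to_units 15000000000000
# prints: 15.00 TB (10^12)  13.64 TiB (2^40)
```

At the terabyte scale the two conventions differ by roughly 10%, so a larger gap than that would need another explanation.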

tashrifbillah commented 3 months ago

Posting for record:

       -b, --bytes
              equivalent to '--apparent-size --block-size=1'

       --apparent-size
              print apparent sizes, rather than disk usage; although the apparent size is
              usually smaller, it may be larger due to holes in ('sparse') files, internal
              fragmentation, indirect blocks, and the like

       --si   like -h, but use powers of 1000 not 1024

       -h, --human-readable
              print sizes in human readable format (e.g., 1K 234M 2G)

tashrifbillah commented 3 months ago

I read the documentation further. It seems that --apparent-size accounts for the difference between my output and the diskusage-logger's output. Also, I remember that long ago we adopted du --si -sh to match ERIS' billing.
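
For the record, a minimal sketch of how apparent size and allocated disk usage can diverge in either direction, assuming GNU coreutils (du, truncate). The helper name size_pair and the file names are hypothetical, and the allocated numbers depend on the filesystem:

```shell
# Print "<apparent bytes> <allocated bytes>" for one file.
size_pair() {
    # $1: path to a file
    apparent=$(du -b "$1" | cut -f1)               # -b = --apparent-size --block-size=1
    on_disk=$(du --block-size=1 "$1" | cut -f1)    # allocated blocks, in bytes
    echo "$apparent $on_disk"
}

tmp=$(mktemp -d)
printf 'x' > "$tmp/tiny.txt"         # 1 byte of content, usually occupies a whole block
truncate -s 10M "$tmp/sparse.bin"    # 10 MiB apparent size, no data written

size_pair "$tmp/tiny.txt"      # apparent is 1; allocated is typically one block
size_pair "$tmp/sparse.bin"    # apparent is 10485760; allocated is typically far less

rm -rf "$tmp"
```

A tree with many small files tends to look like the first case, where du -b reports less than du without --apparent-size.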