Closed nbehrnd closed 2 years ago
Thanks for the issue!
The computer science here gets complicated beyond my understanding quite quickly, but I'll make a start on things.
From the man pages, we discover that ls -s
displays size in blocks.
-s, --size print the allocated size of each file, in blocks
Ok, so what's a block? The best explanation I have found is here on reddit which basically says a block is the smallest writable section of storage on a disk, and that if your file is smaller than the block size, the remaining storage capacity of the block gets blanked out.
So why is it different on your system? Well it appears that the block size is system-dependent. https://askubuntu.com/questions/641900/how-file-system-block-size-works
FWIW the file size is shown as 13 bytes on my system too:
$ ls -lh exercise-data/
total 28K
drwxrwxr-x 2 mbexegc2 mbexegc2 4.0K Sep 16 2021 animal-counts
drwxrwxr-x 2 mbexegc2 mbexegc2 4.0K Sep 16 2021 creatures
-rw-rw-r-- 1 mbexegc2 mbexegc2 13 Sep 16 2021 numbers.txt
drwxrwxr-x 2 mbexegc2 mbexegc2 4.0K Sep 16 2021 proteins
drwxrwxr-x 2 mbexegc2 mbexegc2 4.0K Sep 28 2021 writing
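The difference between apparent size (bytes) and allocated size (blocks) can also be checked programmatically. A minimal Python sketch, with an illustrative 13-byte temporary file standing in for numbers.txt (on ext4 the smallest allocation is typically one 4096-byte block; `st_blocks` is POSIX-only):

```python
import os
import tempfile

# Create a 13-byte file, then compare its apparent size with the
# space actually allocated on disk. os.stat reports the allocation
# in st_blocks, which counts 512-byte units regardless of the
# filesystem's own block size.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"x" * 13)  # 13 bytes, like numbers.txt
    path = handle.name

info = os.stat(path)
print("apparent size: ", info.st_size, "bytes")           # 13
print("allocated size:", info.st_blocks * 512, "bytes")   # typically 4096 on ext4
os.remove(path)
```

So `ls -l` reports the first number and `ls -s` a rounded form of the second, which is why they disagree for files smaller than one block.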
Thank you. When addressing the question I thought in terms of the amount of information rather than in blocks (individual storage containers, each of fixed capacity), over which that information may be split (with the last block on occasion only partly filled).
For me, this is a twofold reminder. For one, of how hash sums like md5sum and sha256sum are computed: there, if a block is not filled to the brim with original data to hash, the algorithm pads the final block up to the block boundary. That is why even submitting the single character A yields a digest of fixed length, 64 hexadecimal characters (the 65 counted below includes the trailing newline):
$ echo A | sha256sum
06f961b802bc46ee168555f066d28f4f0e9afdf3f88174c1ee6f9de004fc30a0 -
$ echo A | sha256sum | cut -d " " -f 1 | wc -c
65
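The fixed digest length is easy to verify with Python's hashlib: whatever the input, SHA-256 processes it in 512-bit blocks, pads the final block, and always emits a 256-bit digest. A small sketch (the inputs are illustrative; `b"A\n"` mirrors what `echo A` pipes into sha256sum):

```python
import hashlib

# SHA-256 pads the final input block, so the digest length never
# depends on the input length: always 32 bytes, i.e. 64 hex digits.
for data in (b"A\n", b"A" * 1000):
    digest = hashlib.sha256(data).hexdigest()
    print(len(digest), digest[:12] + "...")  # length is always 64
```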
For two, thinking in blocks bears similarities to how, e.g., Fortran vectorizes operations once data has passed from RAM via the caches into the registers of a CPU. For the addition of two one-dimensional arrays, for example, instead of individually computing on a(1) and b(1) to yield c(1) in one processing cycle (blue frame), the vectorized approach picks up multiple elements of array a and the corresponding elements of array b (red frame) for multiple computations performed simultaneously. The pick for the next cycle of computation then advances not by one pair of elements (blue frame), but by the width of the wider block (red frame):
The width of these «blocks of advancement» depends on the processor's architecture (the size/capacity of the registers available) and the floating-point representation used (single, double, or quadruple precision) during the computation. Yet overall, this block-wise progression may accelerate the computation beyond the one-by-one approach, independently of parameters like processor frequency, the use of multiple cores per CPU, etc.
Both images are (now edited) screenshots taken during the class Fortran for Scientific Computing led by Geert Jan Bex, Mag Selwa and Jan Ooghe.
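The block-wise traversal described above can be sketched in plain Python (the block width of 4 is illustrative, e.g. four double-precision values filling a 256-bit register; real vectorization is performed by the compiler and CPU, not by list slicing):

```python
# Scalar approach: one pair of elements per cycle (the "blue frame").
# Vectorized approach: a whole block of elements per cycle (the "red frame").
a = list(range(8))
b = list(range(8))
width = 4            # e.g. four doubles per SIMD register (illustrative)
c = [0] * len(a)

for start in range(0, len(a), width):
    stop = start + width   # advance by the block width, not by one element
    c[start:stop] = [x + y for x, y in zip(a[start:stop], b[start:stop])]

print(c)  # → [0, 2, 4, 6, 8, 10, 12, 14]
```

Each pass through the loop body corresponds to one "cycle" handling a whole red frame at once, rather than eight cycles handling one blue frame each.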
observation:
This question refers to running
ls -s
as introduced in episode 2 on the pristine, uncompressed test data fetched from here (again attached below). In contrast to the output the page currently reports, I observe an output where the size reported for
numbers.txt
differs. Interestingly, the thunar file manager reports a size of 13 bytes for this very file.
question:
Is the difference due to a different file
numbers.txt
, and/or an update of bash released after the lecture was designed?
system used:
The observation refers to bash (GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)) as provided from the repositories of Linux Debian 12/bookworm (branch testing) on an
ext4
partition, including thunar 4.16.10 (Xfce 4.16). The most recent system upgrade ran earlier today, 2022-05-27.
ls --version
reports
ls (GNU coreutils) 8.32
of 2020.
shell-lesson-data.zip