helloSystem / ISO

helloSystem Live and installation ISO
https://github.com/helloSystem/
BSD 3-Clause "New" or "Revised" License

ISOs are too large #191

Open probonopd opened 3 years ago

probonopd commented 3 years ago

Describe the bug: The ISO is too large. GitHub Releases has a maximum of 2 GB per file.

To Reproduce: Build an ISO.

Expected behavior: The ISO is ~1.5 GB.

Additional information: We should run Filelight on the Live ISO to check what is eating up so much space.
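
A package-level view can complement a graphical disk usage tool. The following is a minimal sketch, assuming pkg(8) is available on the live system; `top_by_size` is a hypothetical helper name, and `%sb` (flat size in bytes) and `%n` (package name) are pkg-query(8) format fields.

```shell
#!/bin/sh
# Sketch: rank packages by installed (flat) size.
# top_by_size is a hypothetical helper name, not part of pkg.

# Read lines of the form "<bytes> <name> [...]" on stdin and print the
# N largest entries as "<MiB> MiB  <name>" (MiB truncated to an integer).
top_by_size() {
    sort -rn | head -n "${1:-10}" |
        awk '{ printf "%d MiB  %s\n", $1 / 1048576, $2 }'
}

# On a system with pkg(8), feed it real data.
if command -v pkg >/dev/null 2>&1; then
    pkg query '%sb %n' | top_by_size 10
fi
```

Because the helper only reads standard input, it can also be fed a saved listing from a different machine or an earlier build for comparison.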

grahamperrin commented 3 years ago

Produced by JDiskReport, archived for GitHub:

root.jdr.tar.gz

… PS after first seeing the red crosses, I guessed that liveuser can't access parts of the filesystem. I ran another report with sudo jdiskreport and still I see red crosses. Maybe they're negligible; at a glance, the measurements are the same:

image

The second report:

root, run as sudo.jdr.tar.gz

Ignore space taken by the following installations:

FreeBSD% grep pkg /var/log/messages
Mar 30 22:58:02 FreeBSD pkg.real[1763]: javavmwrapper-2.7.6 installed
Mar 30 22:58:03 FreeBSD pkg.real[1763]: java-zoneinfo-2020.a installed
Mar 30 22:58:14 FreeBSD pkg.real[1763]: openjdk8-8.265.01.1 installed
Mar 30 22:58:15 FreeBSD pkg.real[1763]: jdiskreport-1.4.1 installed
Mar 30 23:00:30 FreeBSD pkg.real[2053]: qtermwidget-0.15.0 installed
Mar 30 23:00:32 FreeBSD pkg.real[2053]: liberation-fonts-ttf-2.1.1,2 installed
Mar 30 23:00:32 FreeBSD pkg.real[2053]: qterminal-0.15.0 installed
Mar 30 23:03:06 FreeBSD pkg.real[2121]: firefox-81.0.1,2 installed
FreeBSD%

probonopd commented 3 years ago

Worst offenders which I would like to remove:

677045085   llvm10      10.0.1_4    LLVM and Clang
208688681   gcc9        9.3.0_2     GNU Compiler Collection 9
189883003   binutils    2.33.1_4,1  GNU binary tools
62249170    perl5       5.32.1_1    Practical Extraction and Report Language
51448358    db5         5.3.28_7    Oracle Berkeley DB, revision 5.3
36417351    pkg         1.16.3      Package manager

probonopd commented 3 years ago

What is drawing in compilers like llvm10 and gcc9? We don't want to ship those on the ISO. Developers are most likely connected to the internet and can download them as needed.


LLVM

FreeBSD% sudo pkg remove llvm10
Checking integrity... done (0 conflicting)
Deinstallation has been requested for the following 13 packages (of 0 packages in the universe):

Installed packages to be REMOVED:
        llvm10: 10.0.1_4
        mesa-dri: 20.2.3_1
        mesa-gallium-xa: 20.2.3
        slim: 1.3.6_21
        xf86-input-evdev: 2.10.6_6
        xf86-input-keyboard: 1.9.0_4
        xf86-input-libinput: 0.30.0_1
        xf86-input-mouse: 1.9.3_3
        xf86-video-ati: 19.1.0_3,1
        xf86-video-cirrus: 1.5.3_4
        xf86-video-scfb: 0.0.5_2
        xf86-video-vesa: 2.5.0
        xorg-server: 1.20.9_1,1

Why on earth would one need a compiler just to run Xorg?

According to Wikipedia,

Mesa (...) is an open source software implementation of OpenGL, Vulkan, and other graphics API specifications. Mesa translates these specifications to vendor-specific graphics hardware drivers. (...) Mesa also contains an implementation of software rendering called swrast that allows shaders to run on the CPU as a fallback when no graphics hardware accelerators are present. The Gallium software rasterizer is known as softpipe or when built with support for LLVM llvmpipe, which generates CPU code at runtime.

So I guess we are looking for a way to use softpipe rather than llvmpipe so that we can get rid of the LLVM dependency.


GCC (solved)

FreeBSD% sudo pkg remove gcc9
Checking integrity... done (0 conflicting)
Deinstallation has been requested for the following 2 packages (of 0 packages in the universe):

Installed packages to be REMOVED:
        fusefs-lkl: 4.16.g20180628_3
        gcc9: 9.3.0_2

Number of packages to be removed: 2

The operation will free 283 MiB.

Why on earth would mounting Linux filesystems (1) be so huge, (2) require a compiler at runtime, and (3) not use LLVM like the above?

Removed it in https://github.com/helloSystem/ISO/commit/3e1bd004e8906decbac831b3d06308d40eb5c744, especially since ext filesystems can also be mounted using https://www.freshports.org/sysutils/fusefs-ext2.


Why is binutils so large and what draws it in? Turns out this was also drawn in by fusefs-lkl, so that should be resolved as well.


Perl

What draws in perl5?


db5 (solved)

What draws in db5?

webcamoid -> jackit -> db5
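
Chains like this one can be traced by walking reverse dependencies. The following is a minimal sketch, assuming pkg(8) is installed; `rdeps` and `why_installed` are hypothetical helper names, and `%rn` is the pkg-query(8) field for the names of packages that require the given package.

```shell
#!/bin/sh
# Sketch: print every chain of installed packages that pulls in a target.
# rdeps and why_installed are hypothetical helper names.

# List the installed packages that depend directly on $1.
# Redefine rdeps to exercise the logic without pkg installed.
rdeps() {
    pkg query '%rn' "$1" 2>/dev/null
}

# why_installed PKG [CHAIN]: walk reverse dependencies depth-first and
# print one line per chain, starting at a package nothing else requires.
# The body runs in a subshell so that variables stay local to each call.
why_installed() (
    chain="${2:-$1}"
    parents=$(rdeps "$1")
    if [ -z "$parents" ]; then
        echo "$chain"
    else
        for p in $parents; do
            why_installed "$p" "$p -> $chain"
        done
    fi
)
```

Usage would be along the lines of `why_installed perl5` or `why_installed llvm10`; each output line is one path from a top-level package down to the target.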


pkg would be a great candidate. Anyone who uses pkg is most likely connected to the internet and can bootstrap it.

To get rid of pkg we would probably need a lightweight replacement for pkg remove and pkg delete since we need those:

probonopd commented 3 years ago

We probably need some way to create empty dummy packages that we can install instead of the real ones, so that dependencies are "satisfied" for the package manager but no space is used for packages that are declared as dependencies but that we don't want to ship on the ISO. E.g., all icon sets other than the one we are using, spidermonkey78, all fonts other than the ones we are using, and possibly some of the "worst offenders" above.
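
One possible starting point is to generate a manifest with no file list and hand it to `pkg create`. This is only a sketch under assumptions: `write_dummy_manifest` is a hypothetical helper, the set of manifest fields is my reading of pkg-create(8) and may need adjusting, and the maintainer and www values are placeholders. Whether the package manager accepts such a placeholder for every dependent has not been tested here.

```shell
#!/bin/sh
# Sketch: write a manifest for an empty placeholder package.
# write_dummy_manifest is a hypothetical helper; the field list is an
# assumption based on pkg-create(8) and may need adjusting.
write_dummy_manifest() {
    # $1 = package name, $2 = version, $3 = origin, $4 = output file
    cat > "$4" <<EOF
name: "$1"
version: "$2"
origin: "$3"
comment: "Empty placeholder for $1"
desc: "Satisfies dependencies on $1 without installing any files."
maintainer: "nobody@example.org"
www: "https://example.org/"
prefix: "/usr/local"
EOF
}

# Assumed workflow (not tested against the helloSystem build):
#   write_dummy_manifest spidermonkey78 \
#       "$(pkg query %v spidermonkey78)" \
#       "$(pkg query %o spidermonkey78)" ./+MANIFEST
#   pkg create -M ./+MANIFEST -o ./dummy-packages
```

Reusing the real package's version and origin keeps the placeholder as close as possible to what dependents were built against.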

probonopd commented 3 years ago

It is too bad that we are shipping Gtk for legacy reasons; but do we really also need to ship Mozilla's JavaScript engine for it? sudo pkg remove spidermonkey78 would remove everything that depends on Gtk and would remove gvfs. So, how harmful would for FILE in $(pkg list spidermonkey78); do sudo rm "$FILE"; done actually be?

probonopd commented 3 years ago

Got a hint over mail from a person who wishes to remain uncredited:

  1. Remove from /boot on the cd9660 filesystem every file that is not required before the root filesystem is mounted. From the files residing directly in /boot only 4 need to be preserved: 'loader', 'loader.conf', 'loader.rc' and 'device.hints'. The whole directory /boot/modules is unnecessary. From /boot/kernel you can delete all files except 'kernel', 'geom_uzip.ko', 'tmpfs.ko', 'xz.ko', 'opensolaris.ko', 'zfs.ko', 'firewire.ko'. You remove these files from the cd9660 filesystem only, not from the ZFS root filesystem.
  2. You compress the files left in /boot/kernel with gzip. The FreeBSD loader has built-in support for loading gzip-compressed files.

These tips are for the non-unionfs 12.2-based ISOs. I don't know if they can be applied to the unionfs ISOs.
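
The /boot/kernel part of the hint could be sketched as follows. `prune_boot_kernel` is a hypothetical helper name, the keep list is copied from the hint, and the function is meant to be pointed only at the staging copy that becomes the cd9660 filesystem, never at the ZFS root.

```shell
#!/bin/sh
# Sketch: in the ISO staging copy of /boot/kernel, keep only the files
# named in the hint and gzip them; delete everything else.
# prune_boot_kernel is a hypothetical helper name.

KEEP="kernel geom_uzip.ko tmpfs.ko xz.ko opensolaris.ko zfs.ko firewire.ko"

prune_boot_kernel() {
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        case " $KEEP " in
            *" ${name%.gz} "*)
                # Kept file: compress it unless it is already compressed.
                case "$name" in
                    *.gz) ;;
                    *)    gzip -9 "$f" ;;
                esac ;;
            *)
                # Not needed before the root filesystem is mounted.
                rm -f "$f" ;;
        esac
    done
}
```

Matching on the name with any `.gz` suffix stripped makes the function safe to run a second time on the same directory.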

crees commented 3 years ago

You could save 23MB by just deleting pkg-static.

[crees@pegasus]~ % du -hc `pkg query %Fp pkg` | tail -n 1
30M    total
[crees@pegasus]~ % du -hc `pkg query %Fp pkg | grep -v pkg-static` | tail -n 1
7.1M    total

probonopd commented 3 years ago

Isn't that dangerous, especially for potential future upgrades done by the user?

crees commented 3 years ago

Pkg is developed so quickly that in the event of any upgrade, it is most likely that pkg would be reinstalled anyway, and that would restore pkg-static. On the other hand, it is not a huge saving.

probonopd commented 3 years ago

Got further information over mail from a person who wishes to remain uncredited:

First, I've noticed that you don't compress the file /boot/kernel/kernel. It is much better to compress this file than the small .ko files.
Approximately 23MB would be saved.

Second, I did some additional research on which files in /boot are strictly necessary.

The following files from /boot are all the files that are necessary for booting successfully:
  1. loader
  2. loader.conf
  3. device.hints

Of course, the subdirectories of /boot need to be preserved.
The file 'loader.rc' which I was previously listing as necessary is not actually needed as it is a Forth script and the Lua loader ignores it.
The file 'loader.efi' is included on a FAT filesystem which is embedded at the beginning of the .iso and so the file doesn't need to also be present on the cd9660 filesystem.

The following files are necessary for the script 'mkisoimages-amd64.sh' to successfully assemble a bootable ISO:
  1. cdboot
  2. isoboot
  3. loader.efi
  4. pmbr

They are not needed on the cd9660 filesystem, but removing them would break the script that assembles the ISO. You can modify the script to get them from a copy of /boot that contains all files while the ISO is made from a copy of /boot that contains the minimum set of files, but you will be complicating your life to save only about half a megabyte. It is not worth doing it.

probonopd commented 3 years ago

In the same spirit: Why have the kernel inside the compressed filesystem when we need it on the ISO outside of the compressed filesystem anyway?

Just need to make sure that the installer copies it from the ISO location at installation time!

probonopd commented 3 years ago

Also, should we compress the modules inside the compressed filesystem? Probably not much gain space-wise but maybe faster startup?

probonopd commented 3 years ago

Have not yet found the correct way to delete the kernel from inside the compressed filesystem. (Complication: build.sh wants to copy it to the ISO from there.)

jbeich commented 3 years ago

LLVM [...] So I guess we are looking for a way to use softpipe rather than llvmpipe so that we can get rid of the LLVM dependency.

LLVM is required by AMD drivers: radeonsi (OpenGL) and radv (Vulkan). The static linking proposal was rejected, so this is waiting for subpackaging; then mesa-dri can switch to whatever provides libLLVM.so. However, static linking would use less space because it would include only the AMDGPU LLVM backend.

probonopd commented 3 years ago

Thanks @jbeich for the explanation.

For 12.1 amd64 example, mesa-dri 19.0.8 built against llvm90 grows from 31 MiB to 109 MiB but drops 799 MiB large llvm90 dependency.

Wow. I hope static linking can be reconsidered.

It's sad that a few years back a complete desktop OS would fit onto a CD-ROM and nowadays one of the GPU drivers needs a 799 MiB dependency.

kettle-7 commented 3 years ago

What draws in perl5?

hwprobe.

By all means scrap gcc, we already have Clang.

probonopd commented 3 years ago

hwprobe

This is used for the Hardware Probe utility that can send information about the hardware to https://bsd-hardware.info.

@linuxhw do you have any plans for a version that would not require Perl? Or possibly we could bundle just a minimal subset of Perl that is really required for hwprobe?

kettle-7 commented 3 years ago

Worst offenders which I would like to remove:

GTK+ 2 and 3: GTK 2 is deprecated, and GTK 3 has been replaced by GTK 4. What apps depend on them? If they are only needed in the installed environment, would it be acceptable to remove them from the ISO and have the installer install them?

probonopd commented 3 years ago

The Screen Settings and Print Settings preferences applications require Gtk currently. Are there suitable Qt replacements?

grahamperrin commented 3 years ago

From https://github.com/helloSystem/Utilities/issues/44#issuecomment-803230003

the /usr/local/bin/lxqt-config-brightness part of sysutils/lxqt-config.

There's more:

image

https://www.freshports.org/sysutils/lxqt-config/#requiredrun

bsdhw commented 3 years ago

hwprobe

This is used for the Hardware Probe utility that can send information about the hardware to https://bsd-hardware.info.

@linuxhw do you have any plans for a version that would not require Perl? Or possibly we could bundle just a minimal subset of Perl that is really required for hwprobe?

It requires only Perl 5 base and perl-Digest-SHA to operate.

probonopd commented 3 years ago

From helloSystem/Utilities#44 (comment) (...)

Not sure how to read this.

kettle-7 commented 3 years ago

We are not using lxqt-config-brightness so far

Check the Shortcut Keys app. I think we are.

grahamperrin commented 3 years ago

https://github.com/helloSystem/ISO/issues/191#issuecomment-849841908

Screen Settings … Qt replacements?

https://github.com/helloSystem/ISO/issues/191#issuecomment-850607385

… Not sure how to read this. …

The arrow points at Monitor Settings.

lxqt-config-monitor

probonopd commented 3 years ago

Why the heck do we have 5 versions of Python now...

122873896   python39    3.9.5       Interpreted object-oriented programming language
121181684   python38    3.8.10      Interpreted object-oriented programming language
115503431   python37    3.7.10_1    Interpreted object-oriented programming language
109421942   python36    3.6.13      Interpreted object-oriented programming language
72241337    python27    2.7.18_1    Interpreted object-oriented programming language

kettle-7 commented 3 years ago

Many programs depend on Python 2 despite its deprecation, but we don't need Python 3.6, 3.7, 3.8 AND 3.9.
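
To see which interpreter versions carry the most dependents, the direct dependencies of all installed packages can be tallied. A minimal sketch, assuming pkg(8) is available; `python_dependents` is a hypothetical helper name, and `%n` and `%dn` are the pkg-query(8) fields for the package name and the name of each direct dependency.

```shell
#!/bin/sh
# Sketch: count how many installed packages depend on each pythonXY.
# python_dependents is a hypothetical helper name.

# Read "<package> <dependency>" pairs on stdin and print, for every
# dependency named python<digits>, the number of packages requiring it.
python_dependents() {
    awk '$2 ~ /^python[0-9]+$/ { n[$2]++ }
         END { for (p in n) print n[p], p }' | sort -k2
}

if command -v pkg >/dev/null 2>&1; then
    pkg query '%n %dn' | python_dependents
fi
```

A version with only one or two dependents is a candidate for rebuilding those packages against a single Python flavor.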

probonopd commented 3 years ago

KWin is a challenge... see https://github.com/helloSystem/hello/issues/164

probonopd commented 2 years ago

Further idea: Factor out all developer related files and non-localized documentation to a separate download for developers.

Just for experimentation:

# ${uzip} must be set beforehand; it is assumed to be a directory name
# relative to /home/liveuser
mkdir "${uzip}"
cpdup -i0 -s0 /media/uzip/ "${uzip}"/
# Preview what would match before deleting anything
find "${uzip}"/ -name doc
find "${uzip}"/ -name doc -type d
# Documentation, headers, static libraries and other developer files
find "${uzip}"/ -name doc -type d -exec rm -rf {} \;
find "${uzip}"/ -name docs -type d -exec rm -rf {} \;
find "${uzip}"/ -name '*.la' -type f -exec rm -rf {} \;
find "${uzip}"/ -name man -type d -exec rm -rf {} \;
find "${uzip}"/ -name include -type d -exec rm -rf {} \;
find "${uzip}"/ -name '*.h' -type f -exec rm -rf {} \;
find "${uzip}"/ -name .cache -type d -exec rm -rf {} \;
rm -rf "${uzip}"/dev/*
rm -rf "${uzip}"/rescue/*
find "${uzip}"/ -name debug -type d -exec rm -rf {} \;
find "${uzip}"/ -name '*.a' -type f -exec rm -rf {} \;
find "${uzip}"/ -name '*.o' -type f -exec rm -rf {} \;
find "${uzip}"/ -name src -type d -exec rm -rf {} \;
find "${uzip}"/ -name git-core -type d -exec rm -rf {} \;
find "${uzip}"/ -name git -type d -exec rm -rf {} \;
find "${uzip}"/ -name git -type f -exec rm -rf {} \;
find "${uzip}"/ -name devhelp -type d -exec rm -rf {} \;
find "${uzip}"/ -name '*-doc' -type d -exec rm -rf {} \;
find "${uzip}"/ -name examples -type d -exec rm -rf {} \;

# Build a UFS image from the slimmed tree and compress it with zstd
makefs /media/somewhere/slimmedrootfs.ufs "/home/liveuser/${uzip}"
mkuzip -A zstd -C 15 -o /media/somewhere/slimmedrootfs.uzip /media/somewhere/slimmedrootfs.ufs

Result: from 1576574976 B (1.5 GiB) to 1151860736 B (1.1 GiB), meaning ~1/4 savings. Worth it?
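
The arithmetic behind the "~1/4" figure can be checked with integer shell arithmetic. `savings_percent` is a hypothetical helper name; the result is truncated, not rounded, to one decimal place.

```shell
#!/bin/sh
# Sketch: relative size reduction, truncated to one decimal place.
# savings_percent is a hypothetical helper name.
savings_percent() {
    before=$1
    after=$2
    # Work in tenths of a percent to avoid floating point.
    permille=$(( (before - after) * 1000 / before ))
    printf '%d.%d%%\n' $(( permille / 10 )) $(( permille % 10 ))
}

savings_percent 1576574976 1151860736   # prints 26.9%
```

So the slimmed image is a little more than a quarter smaller than the original.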

There is still a huge compiler in there but I understand that FreeBSD cannot run a graphical desktop without one, which is a pity because non-developers will usually never see a compiler in their life.

probonopd commented 2 years ago

Deleting /usr/local/llvm12 still allows us to boot into a graphical desktop, but on Intel GPUs drop shadows of menus etc. are funky, and there are error messages coming from swrast_dri.so about libLLVM-12.so being missing. So https://github.com/helloSystem/ISO/commit/e26134cfef0f416cdba456b756cf53edfdd8b109 reverts the deletion of that (huge!) library.

It's a pity.

probonopd commented 2 years ago

TODO: Search large directories for more large files with ls -lhS <some directory> | head and see what can be safely removed. Any hints welcome!
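
Since ls -lhS only looks at one directory level, a recursive variant may save some manual descending. A minimal sketch using only find, wc, sort and head; `largest_files` is a hypothetical helper name, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: list the N largest regular files below a directory as
# "<bytes><TAB><path>", largest first.
# largest_files is a hypothetical helper name.
largest_files() {
    # $1 = directory, $2 = number of entries (default 10)
    find "$1" -xdev -type f -exec sh -c '
        for f do
            printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} + |
        sort -rn | head -n "${2:-10}"
}
```

For example, `largest_files /usr/local/lib 20` would list the twenty largest files under /usr/local/lib without crossing into other mounted filesystems.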

Ubuntu used to come on a CD-ROM. Can we un-bloat the system to the point that it fits within an 800 MB ISO?

probonopd commented 1 year ago

Wild idea: Use filemon(4) to track which files are actually used and remove the rest... We'd need a boot-time option that would enable it and write all accessed files to a logfile. Doable?

This technique is frequently used to minimize containers...

Maybe @rosorio has an idea how to use filemon to log all files that are accessed while running an operating system, from boot to shutdown?

I was thinking along the lines of: Use filemon(1) wrapped around init. Since init starts everything, we can possibly catch everything.
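
However the access log ends up being recorded, the second half of the idea is a set difference between all files and the files that were touched. A minimal sketch, assuming a log with one absolute path per line; that log format is an assumption and not the output format of any particular filemon tool, and `never_accessed` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list files under a root that never appear in an access log.
# never_accessed is a hypothetical helper name; the log is assumed to
# contain one accessed path per line.
never_accessed() {
    # $1 = root directory, $2 = access log
    all=$(mktemp)
    used=$(mktemp)
    find "$1" -type f | LC_ALL=C sort -u > "$all"
    LC_ALL=C sort -u "$2" > "$used"
    LC_ALL=C comm -23 "$all" "$used"   # lines only in the first list
    rm -f "$all" "$used"
}
```

The output is only a list of removal candidates: files that are used rarely (for example by hardware that was not present during the logged session) would show up as unused too.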

louies0623 commented 1 year ago

@probonopd I recently saw KNOPPIX's Cloop and its compression is very effective, a 1.8GB file can be reduced to 700MB.

probonopd commented 1 year ago

That's very old. I think Zstandard is at least as good, if not better (although I have not run scientific tests).