projg2 / eclean-kernel

Installed kernel cleanup tool
GNU General Public License v2.0
31 stars 10 forks source link

Make read_version_from_raw() more robust #44

Open Flowdalic opened 9 months ago

Flowdalic commented 9 months ago

Ignore the "parsed" Linux kernel version information if there are non-ASCII at the beginning of the selected buffer.

Fixes #27.

mgorny commented 9 months ago

I suppose this would work most of the time, though I would prefer to know why some files don't have the version there. I suspect you may still accidentally grab some random string as version.

Flowdalic commented 8 months ago

though I would prefer to know why some files don't have the version there.

Likely because they are not a Linux kernel. As I wrote in https://github.com/projg2/eclean-kernel/issues/27#issuecomment-1759625817 that happens when eclean-kernel looks at an initramfs.

For example, gunzip initramfs-6.1.38-gentoo-dist.img -c | strings |less shows

3CIFS: VFS: cifs ioctl 0x%x
7CIFS: cifs ioctl 0x%x
7CIFS: ioctl notify rc %d
7CIFS: unsupported ioctl
cifs
Linux version 
CIFS VFS Client for Linux
fs/smb/client/sess.c
OS/2
CIFS: %s: OS/2 server
mgorny commented 8 months ago

Hmm, the logic got a bit messy with that "raw" image support. Previously the version reading logic did also serve the purpose of recognizing valid image formats (bzImage, EFI). The "raw" reader just does the version string search without actually attempting to verify the file format.

I suppose the correct approach would be to add some kind of minimal header verification there.

CC @Jannik2099, you've added "support for non-bzImage kernels". Do you think we could narrow it down to specific image formats, with explicit magic checks?

Jannik2099 commented 8 months ago

hmm, do the raw, decompressed kernel images share some magic? Are they guaranteed to be ELF files on all arches?

mgorny commented 8 months ago

I hoped you'd have some idea.

mgorny commented 8 months ago

CC @AndrewAmmerlaan too.

AndrewAmmerlaan commented 8 months ago

Are they guaranteed to be ELF files on all arches?

No, the vmlinuz.efi on arm64 and riscv (with CONFIG_EFI_ZBOOT) are PE32+ not ELF. Maybe some other formats are also possible with different config options.

AndrewAmmerlaan commented 8 months ago

CC @AndrewAmmerlaan too.

I'm not sure I understand the original issue this is fixing, what non-ASCII characters are there and how did they get there?

Flowdalic commented 8 months ago

I'm not sure I understand the original issue this is fixing, what non-ASCII characters are there and how did they get there?

tl;dr: The match logic employed by read_version_from_raw() is fragile, as it assumes that there is always a version String right after the String Linux version. This fails if the binary just has the null-byte terminated string Linux version embedded. Which is not unlikely.

The read_version_from_raw() function performs a naive search for the String Linux version on a given binary. If it matches, then it assumes that what follows the match is another string containing the version number. However, any binary may contain the String Linux version \0, after which arbitrary bytes follow. This is the case for the initramfs binary that triggers this. Here, read_version_from_raw() returns a byte array starting with the null byte and some more bytes. This byte array is later passed to str(), which also seems to work fine. However, once the resulting string is passed to subprocess.Popen(), because it wants to invoke 'kernel-install', 'remove', <kernel-version>, <kernel-filename>, it throws, because is a String that starts with the null byte: '\x00CIFS'.

mgorny commented 8 months ago

@Flowdalic, btw doesn't it incorrectly recognize your initramfs as kernel then?

Flowdalic commented 8 months ago

@Flowdalic, btw doesn't it incorrectly recognize your initramfs as kernel then?

That is correct.

mgorny commented 8 months ago

@AndrewAmmerlaan, the problem is that the original logic combined getting internal version with checking file magic for kernel. However, as we added support for "raw" kernels, we've ended up looking for strings without any extra magic verification, so we end up treating any file containing Linux version as a kernel.

My idea is that we ought to require some kind of magic for "raw" kernels.

Flowdalic commented 8 months ago

My idea is that we ought to require some kind of magic for "raw" kernels.

That would be ideal, if feasible. In addition (or in the meantime), can we get this PR merged to make things more robust?

ajakk commented 6 months ago

Then how about tacking on something like this, to make files with invalid magic be percieved as invalid kernels in the caller?

diff --git a/ecleankernel/file.py b/ecleankernel/file.py
index 1a79c9f..39506b4 100644
--- a/ecleankernel/file.py
+++ b/ecleankernel/file.py
@@ -205,6 +205,10 @@ class KernelImage(GenericFile):

         # check if it's compressed first
         b = self.decompress_raw(f)
+
+        if not b.startswith(b"MZ"):
+            return None
+
         # unlike with bzImage, the raw kernel binary has no header
         # that includes the version, so we parse the version message
         # that appears on boot

Seems to work for me at least, but I've not tested it everywhere available to me. Will need some tests to have magic added to their test kernel files, too, I think.

Flowdalic commented 4 months ago

I am still running into this.

Could we please get a fix into eclean-kernel?

If ajakk's suggestion of using MZ as magic byte is robust, then we can go obviously with that. However, if nobody knows if it is robust, then please consider merging this PR, even if it is not the perfect solution. I am fairly certain that it is pretty robust, resolves the issue and it certainly has no false negatives.