includeos / IncludeOS

A minimal, resource efficient unikernel for cloud services
https://includeos.github.io/
Apache License 2.0

Revive Fieldmedic plugin, add early kernel diagnostic hooks #2283

Closed by alfreb 1 month ago

alfreb commented 2 months ago

The intention is to create a reusable pattern for kernel self tests and diagnostics that costs nothing by default. We have quite a few defensive validation steps in the kernel, added organically as issues have been found. Ideally, a fast-booting kernel should always do the correct thing without having to validate. In practice, validation is crucial.

With this proposed setup we can optionally add a validation plugin (fieldmedic) that performs more extensive self tests after key checkpoints in the unikernel lifetime have been reached, with negligible cost in the default case (the cost of an empty function call per hook) and with zero cost when toggled off at compile time.

See commit messages for details.
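
As a rough illustration of the hook pattern described above (not the code in this PR), the sketch below assumes GCC/Clang weak symbols for the "empty function call" default and a macro for the compile-time toggle. `KERNEL_DIAG_HOOKS`, `DIAG_HOOK`, the `diag::` hook names, and `kernel_start` are all hypothetical names chosen for the example.

```cpp
// Hypothetical compile-time switch; not an actual IncludeOS build option.
#ifndef KERNEL_DIAG_HOOKS
#define KERNEL_DIAG_HOOKS 1
#endif

#if KERNEL_DIAG_HOOKS
  // Hooks enabled: each checkpoint costs one function call.
  #define DIAG_HOOK(hook) hook()
#else
  // Hooks disabled: the call site compiles to nothing.
  #define DIAG_HOOK(hook) ((void)0)
#endif

namespace diag {
// Weak, empty defaults (GCC/Clang attribute). A diagnostic plugin linked into
// the image could provide strong definitions that replace these at link time.
__attribute__((weak)) void post_bss() {}
__attribute__((weak)) void post_machine_init() {}
__attribute__((weak)) void post_init_libc() {}
}  // namespace diag

// Counts checkpoints passed, only so the sketch has something observable.
int checkpoints_reached = 0;

// Hypothetical boot sequence with a hook after each key checkpoint.
void kernel_start() {
  // ... zero .bss ...
  ++checkpoints_reached;
  DIAG_HOOK(diag::post_bss);

  // ... initialize machine and heap ...
  ++checkpoints_reached;
  DIAG_HOOK(diag::post_machine_init);

  // ... initialize libc ...
  ++checkpoints_reached;
  DIAG_HOOK(diag::post_init_libc);
}
```

With no plugin linked in, each hook resolves to the empty weak default; building with `-DKERNEL_DIAG_HOOKS=0` removes the calls entirely.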

To try it out:

$ nix-shell ~/IncludeOS/shell.nix --pure  --argstr unikernel ~/IncludeOS/test/kernel/integration/osinit --run ./test.py
...
<vm> <Multiboot>OS loaded with 1 modules
<vm>    * osinit.elf.bin booted with vmrunner @ 0x29e000 - 0x43f6c0, size: 1709760b
<vm> * Multiboot begin: 0x9500
<vm> * Multiboot end: 0x3e5490
<vm> [ Field Medic ] ⛑️  BSS Diagnostic passed
<vm> [x86_64 PC] constructor
<vm> [ Machine ] Initializing heap
<vm> [ Machine ] Main memory detected as 129998656 b
<vm> [ Machine ] Reserving 1048576 b for machine use
<vm> [ Field Medic ] ⛑️  Machine allocator for x86 PC functional
<vm> [ Field Medic ] ⛑️  Elf header intact, global constructors functional
<vm> [ Field Medic ] ⛑️  Malloc and brk are functional
<vm> ================================================================================
<vm> 
<vm>                            #include<os> // Literally
<vm> 
<vm> ================================================================================
<vm>      [ Kernel ] Stack: 0x1ffbb8
<vm>      [ Kernel ] Boot magic: 0x2badb002, addr: 0x9500
...
...
<vm>      [ Kernel ] Running service constructors
<vm> --------------------------------------------------------------------------------
<vm> ================================================================================
<vm>  IncludeOS VERY_DIRTY (x86_64 / 64-bit)
<vm>  +--> Running [ OS initialization test ]
<vm> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<vm>  +--> WARNING: No good random source found: RDRAND/RDSEED instructions not available.
<vm>  +-->        To make this warning fatal, re-compile with FOR_PRODUCTION=ON.
<vm> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<vm> Service::start entered
<vm> [ Field Medic ] ⛑️  Service finished. Diagnosing.
<vm>                 [+] Field medic plugin active
<vm>                 [+] Post .bss invariant still holds
<vm>                 [+] Post machine init invariant still holds
<vm>                 [+] Post init libc invariant still holds
<vm> [ Field Medic ] ⛑️  Diagnose complete. Healthy ✅

[ SUCCESS ] OS initialization test succeeded

MagnusS commented 2 months ago

It seems useful to have this for verifying that everything works correctly after changes to early boot stages of the kernel, in particular when we update our dependencies to make sure that everything works as intended (e.g. for nixpkgs / musl / compiler / c++ version upgrades). Great that you added a test as well!

The kernel itself builds and integration tests seem to work fine, but the unittests are failing to build for me. Here's the error from nix-build ./unittests:

/build/test/../api/kprint:43:13: error: conflicting types for 'kprint'
   43 | extern void kprint(const char*);
      |             ^
/build/test/lest_util/os_mock.cpp:80:6: note: previous definition is here
   80 | void kprint(char* str)

alfreb commented 2 months ago

Oops, good catch, thank you! Fixed.
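
The thread does not show the actual fix. One plausible form, sketched here under the assumption that `kprint` has C linkage, is to give the mock the same `const char*` parameter as the declaration quoted in the error. The `captured` buffer is a hypothetical stand-in for whatever the real mock does with its input.

```cpp
#include <string>

// Declaration as quoted in the compiler error (C linkage is an assumption).
extern "C" void kprint(const char*);

// Hypothetical capture buffer so the mock's output can be inspected.
static std::string captured;

// The definition uses the same parameter type as the declaration.
// Defining it as kprint(char*) is what triggers "conflicting types".
extern "C" void kprint(const char* str) {
  captured += str;
}
```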

MagnusS commented 2 months ago

Everything compiles and tests are green. I noticed a call to std::string that I missed on first review - I'm not sure how early FILLLINE is called, but you may want to replace that with std::pmr::string to avoid any calls to malloc before libc is initialised.
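
A minimal sketch of that suggestion follows, assuming C++17 and a hypothetical helper `fill_line` standing in for whatever the macro builds. Note that a `std::pmr::string` on the default memory resource still allocates through `operator new`, so the sketch backs it with a stack buffer and uses `std::pmr::null_memory_resource()` as the upstream, which makes an oversized request throw `std::bad_alloc` instead of silently reaching the heap.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>

// Hypothetical helper: builds a line of `width` copies of `fill` without heap
// allocation and copies it, NUL-terminated, into `out`.
// Returns the number of characters written (excluding the terminator).
std::size_t fill_line(char fill, std::size_t width,
                      char* out, std::size_t out_size) {
  if (out_size == 0) return 0;

  // Stack storage that backs every allocation made by the string.
  std::array<std::byte, 256> buffer;

  // No upstream fallback: exhausting `buffer` throws std::bad_alloc.
  std::pmr::monotonic_buffer_resource pool{
      buffer.data(), buffer.size(), std::pmr::null_memory_resource()};

  std::pmr::string line{&pool};
  line.assign(width, fill);

  // Copy at most out_size - 1 characters, then terminate.
  std::size_t written = line.copy(out, out_size - 1);
  out[written] = '\0';
  return written;
}
```

The 256-byte buffer size is arbitrary; a real kernel would size it to the longest line it expects to print before libc is up.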