Use (de)serialization to shift nano_core symbol parsing to build time

zetanumbers commented 3 years ago

As mentioned in #425 readelf should be replaced with a proper "handmade" tool. One aspect of the proposed solution is serialization/deserialization, which can be addressed by the serde crate. Other than that, vendoring serde might help in the future with similar problems.

As you may know, serde does not support any serialization format. There are different paths to consider:

Vendor a textual format serializer/deserializer like serde_json;
Vendor a binary format serializer/deserializer like postcard;
Implement and have our own.

Every crate mentioned supports no_std.

kevinaboos commented 3 years ago

@ZetaNumbers thanks for opening this issue! I have used/supported Serde on Theseus in the past and it worked well. The only reason I hesitated to merge it into the main branch is that it introduces a very large dependency set, which I originally wanted to avoid especially in the nano_core, a component intended to be as small as possible.

That being said, at this point I think it's best to just use that existing support in Serde for two things:

obtaining the set of symbols in the nano_core, loaded by the bootloader (currently done by parse_nano_core)
semi-persistent crate loading across reboots -- saving & restoring the loaded crate metadata to disk upon shutdown, and then restoring it from disk upon boot instead of rebuilding it manually.

We can start with (1) above and then extend it to support (2) once other things are in place. I don't have the bandwidth to tackle this on my own right now, but I am happy to help/guide someone to implement it. A good implementation strategy would be to create a new Rust tool (in tools/) that takes the types from kernel/crate_metadata as a dependency in order to generate + serialize a nano_core LoadedCrate and its constituent LoadedSection objects into a binary file, which we then load as a multiboot2 module just like we currently do with the k#nano_core.sym file generated in the Makefile. That would effectively move the symbol parsing and metadata construction stage to build time, leaving only the deserialization at runtime. For the actual (de)serializer, there's no need to get fancy -- we can just select one based primarily on deserialization speed, as that's the current boot-time bottleneck; something like bincode or mincode would probably be suitable.

One more thing: I agree that using serde is a good idea, but there's no need to vendor it or any of its (de)serialization support crates into Theseus yet. We have discussed vendoring in the past (#349), but it's currently unnecessary with our usage of lockfiles and version pinning. I've changed the title to reflect this.

zetanumbers commented 3 years ago

Bincode does not support no_std https://github.com/bincode-org/bincode/issues/265

kevinaboos commented 3 years ago

No problem, that was just an example. If needed, we can always fork any necessary crates and port them to no_std, that's been a recurring theme in the past.

zetanumbers commented 3 years ago

Actually there's another option. We could generate static libraries for nano_core.

zetanumbers commented 3 years ago

Some crate_metadata types contain Arcs. serde (or any serialization framework really) could only serialize Arc's owned data per Arc, essentially cloning data.

zetanumbers commented 3 years ago

Actually there's another option. We could generate static libraries for nano_core.

Well, i don't know any good way to do that actually.

kevinaboos commented 3 years ago

Some crate_metadata types contain Arcs. serde (or any serialization framework really) could only serialize Arc's owned data per Arc, essentially cloning data.

I see, yea that might actually be a problem. I'd have to look into it to determine how we could handle shared smart references like Arc.

Actually there's another option. We could generate static libraries for nano_core.

Well, i don't know any good way to do that actually.

I'm not sure what you mean by "generate static libraries"? The nano_core is already built as a static library.

zetanumbers commented 3 years ago

I'm not sure what you mean by "generate static libraries"?

I meant with embedded data. Like with a const.

kevinaboos commented 3 years ago

Oh, I see, something like encoding the actual serialized crate metadata data structures as binary data into the static library itself. Sure, we could try that, that's a pretty cool idea, although I'm not sure what the advantage of doing that is over simply having a separate file with the same binary data. Either way it still needs to be parsed or deserialized.

kevinaboos commented 3 years ago

the binread crate might be useful for this, too

SamuraiCrow commented 2 years ago

I see you've still got the "help wanted" sign up. Is anybody still working on this? I can look into this over the holidays if interested.

kevinaboos commented 2 years ago

It's relatively low priority, so nobody is working on it as of yet. However I'm certainly happy to have anyone help out!

SamuraiCrow commented 2 years ago

esl01-mincode doesn't appear to be the one you were referring to in comment https://github.com/theseus-os/Theseus/issues/431#issuecomment-900491743 because it's GPL2. I'm not a lawyer but I think the license of Theseus is MIT or Apache?

Binread is a general purpose crate for reading any binary file format. That appears to be the only one left, unless somebody forks bincode.

The thread linked above in comment https://github.com/theseus-os/Theseus/issues/431#issuecomment-900568756 about bincode has a more recent post citing 2 dependencies from Serdes as being NoStd but having only std.io as a remaining dependency. I'm not sure that qualifies as having few enough dependencies for this project though.

Now that I've looked that far, I'm not sure I'm going to be able to wade through this in the amount of time I've got either. It looks like trying to implement a kind of "hibernate" mode for just the crates used by the startup code, am I right?

edit Wouldn't recompiling as PIC make it quicker to parse by removing most of the relocs? That's one thing I learned from my old Commodore Amiga, if relative addressing is available in the CPU, use it! It's quicker than processing a bunch of relocations.

kevinaboos commented 2 years ago

Now that I've looked that far, I'm not sure I'm going to be able to wade through this in the amount of time I've got either. It looks like trying to implement a kind of "hibernate" mode for just the crates used by the startup code, am I right?

Yes, effectively. It'd also help to accelerate cold startup too.

Wouldn't recompiling as PIC make it quicker to parse by removing most of the relocs? That's one thing I learned from my old Commodore Amiga, if relative addressing is available in the CPU, use it! It's quicker than processing a bunch of relocations.

Definitely, this is another thing on the to-do list. Literally any other linking model than static with a large code model would be better.... :smile:

SamuraiCrow commented 2 years ago

Re:linkage Before WebAssembly came out, I used to try to use LLVM bitcode directly as a somewhat portable intermediate representation. There were two linker utilities for bytecode: LLVM-LD and LLVM-LINK. The former would just export symbols for linkage, I think, but LLVM-LINK would actually merge bitcode files and resolve symbols completely, if possible. I wonder if rust has a way of invoking that mechanism. It would allow us to merge crates into one big master crate.

kevinaboos commented 2 years ago

If I understand correctly, would partial linking via ld -r accomplish that? I've used that to partially link (resolve some relocations for) a series of object files in a libc project I was working on for Theseus.

SamuraiCrow commented 2 years ago

Yes. After having checked the man page for GNU LD that looks like the option.

If you can get relocation code sections to reduce down to a minimum, it might make a bigger difference than writing a custom parser would have done in the first place. Binary parsers are generally faster than text parsers because enumerations and so on can generally be read in one bus cycle while characters take multiple cycles due to varied word lengths.

SamuraiCrow commented 2 years ago

My apologies. Over the winter holidays I started devising a way to make GUI elements more responsive instead of working on this issue. My suppositions about binary parsers are still untested.

kevinaboos commented 2 years ago

Sorry for the delay, totally missed this.

Yes. After having checked the man page for GNU LD that looks like the option.

If you can get relocation code sections to reduce down to a minimum, it might make a bigger difference than writing a custom parser would have done in the first place. Binary parsers are generally faster than text parsers because enumerations and so on can generally be read in one bus cycle while characters take multiple cycles due to varied word lengths.

Relocation sections aren't present in the base kernel image because it's fully statically linked, so that shouldn't be relevant consideration. I do agree that binary parsers are faster, which is I think the core of this issue. However, parsing the actual ELF file itself with ELF parsers (e.g., xmas-elf, goblin) was much slower, hence my approach to try something else.

In any case, no worries, this isn't an urgent priority since things still work properly.

tsoutsman commented 2 years ago

Could the generation of the LoadedCrate be done using a procedural macro? This would avoid the need for serializing and deserializing.

tsoutsman commented 2 years ago

On second thought, neither macros nor build scripts will work because they would run prior to nano_core being built.

kevinaboos commented 2 years ago

Yeah, unfortunately this isn't information known at the lexical or AST level, so we can't use proc macros. Some kind of post-build hook that integrates into cargo might work, but there's no need for that when our build system already invokes several other custom scripts and Rust tools after cargo is finished.

This issue is actually quite easy to solve, at least conceptually, we just haven't had the time to devote to it. All that needs to be done is to write a Rust app/tool that:

Generates crate metadata for the base kernel image (the nano_core), i.e., effectively just doing what parse_nano_core does.
Serializes that crate metadata to a file. Here, bincode would probably be a good choice; it does now support no_std and I have used it in Theseus already for wasmtime.
Adds that serialized metadata file to build/grub-isofiles/modules, along with all the other object files.
Uses bincode or whatever crate to deserialize that file back into an in-memory representation of the crate metadata.

To be clear, this is a minor performance improvement that would only serve to speed up the very earliest bootstrap procedure. It's not high priority, but as I mentioned in a prior comment, I'm happy to help mentor someone through it.

tsoutsman commented 2 years ago

According to the Serde book, serializing and deserializing arc doesn't preserve identities. This means we would have to go through each of the loaded sections setting the parent crate (and some other stuff) at runtime, but as far as I understand, we would have to be setting the mapped pages for each section manually anyway.

kevinaboos commented 2 years ago

That's ok, I figured there would have to be some additional cleanup work to be done after the initial deserialization. I'm sure it'd still be faster than our current approach.

theseus-os / Theseus

Use (de)serialization to shift nano_core symbol parsing to build time #431