rustwasm / team

A point of coordination for all things Rust and WebAssembly
MIT License

It is way too hard to avoid including panicking and formatting infrastructure code #19

Closed fitzgen closed 5 years ago

fitzgen commented 6 years ago

Even though panicking just translates into a trap without any diagnostic messages, we still include tons of begin_panic_fmt-style code. This has a huge code size footprint: ~75% of my code size after wasm-gc!

I had to write wasm-snip pretty much just for removing panicking and formatting infrastructure. But that is just a stop-gap, not a solution: it is fragile and manual.

est31 commented 6 years ago

You could theoretically install a panic hook. Which is what I have done in one of my wasm projects in order to be able to find out what the panic was about.

pepyakin commented 6 years ago

My experience (although it is more -emscripten like) is that changing panic_fmt to something that doesn't touch its args and building with LTO will remove most of the Display/Debug machinery and data from segments.

However, there were still problems: sometimes these strings weren't removed despite the fact that panic_fmt was almost empty and the "Arguments" parameter was dead. As far as I can remember, I managed to solve this issue by running additional LLVM passes (deadargelim and some globals cleaning pass) over the final .ll file.

As far as I can tell, that wasn't emscripten's fault.

fitzgen commented 6 years ago

You could theoretically install a panic hook. Which is what I have done in one of my wasm projects in order to be able to find out what the panic was about.

Can one change the panic hook without resorting to unstable features?

est31 commented 6 years ago

@fitzgen yes, the panic hook API is stable: https://doc.rust-lang.org/stable/std/panic/fn.set_hook.html
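For reference, a minimal sketch of what installing a hook with the stable std::panic::set_hook API might look like. The names install_recording_hook, last_panic_location and LAST_PANIC are made up for illustration, and storing the location in a static is only a stand-in for forwarding it to whatever logging function the host provides. This is about getting diagnostics out, not about reducing code size.

```rust
use std::panic;
use std::sync::Mutex;

// Hypothetical storage for the last panic location (file, line).
// A real wasm project would more likely forward this to a host logging import.
static LAST_PANIC: Mutex<Option<(String, u32)>> = Mutex::new(None);

/// Installs a panic hook (stable API) that records where the panic happened.
pub fn install_recording_hook() {
    panic::set_hook(Box::new(|info| {
        // `location()` returns the source location of the panic, if known.
        let place = info
            .location()
            .map(|loc| (loc.file().to_string(), loc.line()));
        if let Ok(mut slot) = LAST_PANIC.lock() {
            *slot = place;
        }
    }));
}

/// Returns the location recorded by the hook, if any panic has been seen.
pub fn last_panic_location() -> Option<(String, u32)> {
    let slot = LAST_PANIC.lock().ok()?;
    let copy = slot.clone();
    copy
}
```

The hook runs before unwinding or aborting starts, so it still fires when panics end up as traps.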

Generally I don't object to an option that allows you to turn off all panic related things but there should be an option to keep the panic machinery, especially as we can't emit any source maps on the unknown target yet. I know this is a bit hard given that cargo can't recompile std :/.

fitzgen commented 6 years ago

Generally I don't object to an option that allows you to turn off all panic related things but there should be an option to keep the panic machinery, especially as we can't emit any source maps on the unknown target yet. I know this is a bit hard given that cargo can't recompile std :/.

Crate authors could optionally set the panic hook based on a cargo feature.

If setting a custom panic hook really does get rid of all this bloat, then we should document how to do this somewhere and then close this issue.
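A rough sketch of the cargo-feature idea, assuming a feature named panic-messages (a hypothetical name that the crate would have to declare in its own Cargo.toml). Whether the silent variant actually removes the formatting code from the final binary is exactly the open question in this thread; the sketch only shows the gating.

```rust
use std::panic;

/// With the hypothetical `panic-messages` feature enabled, keep a
/// diagnostic hook that prints the panic message and location.
#[cfg(feature = "panic-messages")]
pub fn init_panic_hook() {
    panic::set_hook(Box::new(|info| {
        eprintln!("panic: {}", info);
    }));
}

/// Without the feature, install a hook that ignores the panic info
/// entirely, so this crate's own hook does no formatting.
#[cfg(not(feature = "panic-messages"))]
pub fn init_panic_hook() {
    panic::set_hook(Box::new(|_info| {}));
}
```

A crate would call init_panic_hook() once at startup, and users would opt into the messages by enabling the feature.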

fitzgen commented 6 years ago

Oh, and devtools show the disassembly, and if you have the "name" section (-g or debug = true), also show function names, which should help for debugging.

est31 commented 6 years ago

@fitzgen I'm aware. They don't show the file name and line though.

est31 commented 6 years ago

If setting a custom panic hook really does get rid of all this bloat, then we should document how to do this somewhere and then close this issue.

There are two mechanisms:

To make the first option work on stable, @japaric has written RFC 2070, but it still has to be implemented (tracking issue).

We can't really use this as we are not a #[no_std] application. Maybe we could make the compiler allow later #[panic_implementation] invocations to override earlier ones somehow with a smart trick^TM. In order for this to work, LTO still needs to be able to optimize out the now dead code. Maybe this is not really possible. No idea. However, if xargo gets integrated into cargo, we can finally compile the entirety of libstd with panic=abort which would be the best resolution to this entire question.

koute commented 6 years ago

However, if xargo gets integrated into cargo, we can finally compile the entirety of libstd with panic=abort which would be the best resolution to this entire question.

I feel like having xargo integrated into cargo would solve a few of our other problems and points of contention too.

In the meantime I guess just going #[no_std] is going to be the best bet for getting the tiniest code footprint? Well, that or manually snipping parts of the .wasm file.

mgattozzi commented 6 years ago

@fitzgen I'm still working on #12 but once it lands if that does work we should add it there as a resource. Even a whole section on reducing binary size would be good.

pepyakin commented 6 years ago

However, there were still problems: sometimes these strings weren't removed despite the fact that panic_fmt was almost empty and the "Arguments" parameter was dead.

It seems that this is also an issue for the wasm32-unknown-unknown target. I've created an issue: https://github.com/rust-lang/rust/issues/47526.

gamozolabs commented 6 years ago

I'm so glad so many people are having these issues. I've been working on an RFC for a while (since December) on making a fmtless panic, as it is crazy that the core of the language (panic) requires one of the most bloated parts of core. I didn't really work on the RFC seriously as I thought I was the only one with this issue and it wouldn't go anywhere.

I'll start to prioritize it.

Currently the model is to have an opt-in fmt-less panic routine, perhaps one that just provides the source file and line number (which for embedded targets is glorious debugging as is).

alexcrichton commented 6 years ago

To me this is a pretty broad and general issue that's not going to have one solution but rather a lot of different pieces. There's a whole slew of reasons that panicking infrastructure / formatting infrastructure are difficult to remove today. I don't think it's worthwhile to blanket shoot for "remove everything at all costs" because panics are actually quite useful when debugging and such. I feel like it's always going to be true that if you're optimizing for code size (like you are on wasm) you'll be writing Rust differently than you would if you weren't optimizing for code size. Most of the idioms of Rust (not just panics) are not optimized for code size, and that's a debt you need to pay down when optimizing for that.

For me I think a clear first step towards making this issue less painful is to start looking at and reorganizing the standard library where possible. There are plenty of locations in the standard library that contain panics when they shouldn't. In some cases LLVM just couldn't optimize it away, and in others the code just needs to be restructured to remove the panic entirely. For example the formatting infrastructure should contain zero panics, but I doubt that's the case today. Additionally I haven't figured out how to push on a Vec in wasm without having a branch with a panic, that'd be a great thing to fix!

Overall I feel like we shouldn't address the symptom here (panic bloat in a binary) but rather the cause (branches towards having a panic). That'll be a long-lasting solution and benefit basically all targets that Rust has.
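As a small illustration of "removing the branch" rather than the symptom, here is a sketch (function names are made up) contrasting slice indexing, which carries a bounds check with a panicking failure path, against APIs that have no panic path at all. Whether a given panic branch survives in the final binary still depends on what the optimizer can prove.

```rust
/// Indexing inserts a bounds check; its failure path calls into the
/// panicking machinery unless the optimizer can prove `i` is in range.
pub fn nth_or_panic(xs: &[u32], i: usize) -> u32 {
    xs[i]
}

/// `get` has no panic path: an out-of-range index is reported to the
/// caller as `None` instead.
pub fn nth_checked(xs: &[u32], i: usize) -> Option<u32> {
    xs.get(i).copied()
}

/// Iterating avoids per-element indexing, and `wrapping_add` avoids the
/// overflow check that `+` would carry in builds with overflow checks on.
pub fn sum(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, x| acc.wrapping_add(*x))
}
```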

gamozolabs commented 6 years ago

I slightly disagree with @alexcrichton here, although I agree that we should always minimize panics where they are not needed (there's no reason not to do this).

Further I agree that there should really be no way of removing panics. They are fundamental to the language and should be kept. However there needs to be a way of implementing a panic that just does something like core::intrinsics::abort() and there should be almost no panic cost (this is up to the optimizer to delete arguments that are unused, which it does fine with fat LTO).

However fundamentally the internals of Rust have panics which are formatted (that are required and cannot be removed). This is great, for normal use it's nice to see that you indexed X bytes out of bounds or something. However a single use of core::fmt is extremely expensive, even in the simplest case (see examples below, 1485 bytes increase in code size for a single panic with fat LTO and optimizations). Unless you can remove all panics, for each uniquely formatted type (str, padding, number, float, hex, etc) used in a panic there is going to be a tremendous amount of code introduced. For a moderate sized codebase that I have, this core::fmt .code section size increase is near 20 KiB, leading to me not being able to use the msg format at all in panic_fmt. I simply cannot touch it without going over code size requirements.

There has to be a solution beyond just reducing panic use, and I think it has to be generic. Not a case-by-case remove some panics here and some panics there.

I think there needs to be an optional panic routine that only takes static strings. This would be a nightmare for backwards compatibility as every panic would need to have a non-formatted alternative. However I think the solution is simpler. Just repr the actual format string. This is ugly, but if you are really that critical of code size then you have to make some sacrifices.

For example with a model like this for a bounds check panic you would get something like:

fn panic_static(msg: &'static str, file: &'static str, line: u32, column: u32) -> ! { loop {} }

For something like a panic_bounds_check this msg string would be literally: index out of bounds: the len is {} but the index is {}. It's not as useful as having numbers, however it tells you what the issue is (out of bounds), requires no changes of internal Rust code or existing panics, and can easily be deduped by the compiler to only exist once in .data. That makes multiple uses of this panic negligible in .data size, with no cost in .code size, as they won't bring in format routines.

Might not be the ideal panic message, but keep in mind without something like this I cannot get any message at all. I have to go entirely off of file and line (which honestly is pretty great for embedded work).
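To make the model concrete, a sketch of how a call site could look under it. Everything here is hypothetical: panic_static is not an existing Rust API, checked_index is a made-up helper, and std::process::abort stands in for whatever the target would actually do (on embedded or wasm this would more likely be a trap or core::intrinsics::abort()).

```rust
/// Hypothetical fmt-less panic entry point: it only ever sees static
/// strings, so nothing from core::fmt is needed to call it.
fn panic_static(msg: &'static str, file: &'static str, line: u32, column: u32) -> ! {
    // A real implementation might hand these to a logging facility;
    // this sketch just discards them and aborts.
    let _ = (msg, file, line, column);
    std::process::abort()
}

/// Made-up helper showing a bounds check that reports the raw,
/// unsubstituted format string instead of formatting the numbers.
fn checked_index(xs: &[u32], index: usize) -> u32 {
    match xs.get(index) {
        Some(value) => *value,
        None => panic_static(
            "index out of bounds: the len is {} but the index is {}",
            file!(),
            line!(),
            column!(),
        ),
    }
}
```

Every failing bounds check would share the same message string, so only the file, line and column differ per call site.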

TL;DR: Unless you can avoid all non-&'static str formats (that's never going to happen), you're going to pay at least about 2 KiB in core::fmt code cost, and up to 30-40 KiB if you hit some more exotic format types. The cost is not due to the number of panics, but the number of unique panics. Once you've panicked with a numeric format once, it doesn't really cost anything to panic like that again (maybe 20-50 bytes for more .data and 10-20 bytes for the actual branch for the panic).


For example this code (built with rustc -C lto -O -Z thinlto=off -g main.rs):

#![no_std]
#![feature(lang_items, start, core_intrinsics)]

#[lang = "panic_fmt"]
#[no_mangle]
pub extern fn rust_begin_panic(msg: core::fmt::Arguments,
                               _file: &'static str,
                               _line: u32,
                               _column: u32) -> ! {
    unsafe {
        core::ptr::read_volatile(&msg);
        core::intrinsics::abort();
    }
}

#[lang = "eh_personality"]
#[no_mangle]
pub extern fn rust_eh_personality() {}

#[start]
#[no_mangle]
#[allow(non_snake_case)]
pub fn mainCRTStartup(_argc: isize, _argv: *const *const u8) -> isize
{
    panic!("WOO");
}

Has the code distribution:

[image: code distribution]

However, by changing this code to have a potential panic due to a bounds check, the code distribution changes massively:

#![no_std]
#![feature(lang_items, start, core_intrinsics)]

#[lang = "panic_fmt"]
#[no_mangle]
pub extern fn rust_begin_panic(msg: core::fmt::Arguments,
                               _file: &'static str,
                               _line: u32,
                               _column: u32) -> ! {
    unsafe {
        core::ptr::read_volatile(&msg);
        core::intrinsics::abort();
    }
}

#[lang = "eh_personality"]
#[no_mangle]
pub extern fn rust_eh_personality() {}

#[start]
#[no_mangle]
#[allow(non_snake_case)]
pub fn mainCRTStartup(_argc: isize, _argv: *const *const u8) -> isize
{
    let x = [1, 2, 3, 4];
    x[_argc as usize]
}

[image: code distribution]

Just panicking with a bounds check added 1485 bytes of code (not data).

-B

fitzgen commented 6 years ago

See also https://github.com/rust-lang-nursery/rfcs/issues/41

fitzgen commented 6 years ago

I don't think it's worthwhile to blanket shoot for "remove everything at all costs" because panics are actually quite useful when debugging and such.

I think that one should be able to create .wasm binaries completely stripped of panic and formatting infra, but also have a cargo feature that enables it for debugging and development.

RReverser commented 5 years ago

Looks like this should help quite a bit too: https://github.com/rust-lang/rust/issues/54981#issuecomment-443369450 (silent aborts, unfortunately also requires Xargo but it's a good step)

ashleygwilliams commented 5 years ago

@fitzgen i think we want to track this issue differently- i'm going to close (if i'm wrong tho please reopen and apologies!)

JeanMertz commented 5 years ago

@ashleygwilliams, could you let us know if/when this is being tracked in a different issue?

I am subscribed to this thread because I’m interested in whatever progress is being made on this, but maybe there is another way to stay in the loop?

sffc commented 4 years ago

It makes perfect sense that in a debug build, you should get the full panicking infrastructure, but in a release build, you should be able to enable a feature that replaces all panics with a free call to abort().

Has such a feature been made available since the last update on this issue thread?

MendyBerger commented 2 years ago

Has such a feature been made available since the last update on this issue thread?

Think this is what you were looking for https://github.com/rust-lang/rust/issues/54981#issuecomment-899917784

sffc commented 2 years ago

Yes, building std with panic_immediate_abort is a solution that now solves this issue.