Veetaha commented 1 year ago

Panicking in std::fmt::Display implementations is a glaring footgun. Here is an example, where you are guaranteed to spend several hours debugging it, and potentially you will never find a cause because the panic happens in production deeply inside the logging system, which is your eyes, but they are now blind. Also, you won't notice this during development, because this log statement occurs in some obscure error-handling code which is hit 0.00001% of the time:

use std::collections::HashMap;
use tracing_subscriber::prelude::*;
use itertools::Itertools;

fn main() {
    let (loki, _join_handle) = tracing_loki::layer(
        "http://loki:3100".parse().unwrap(),
        HashMap::new(),
        HashMap::new(),
    )
    .unwrap();

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        // Commenting out the following line fixes the panic, but it's not obvious.
        // Basically any other layer that runs the `fmt::Display` implementation
        // of the `Itertools::format` second time after `fmt` subscriber may suffer from this panic
        .with(loki)
        .init();

    std::panic::set_hook(Box::new(move |_| {
        tracing::error!("panic happened");
    }));

    // panics here deep inside tracing when several layers display the value
    tracing::info!(field = %[1, 2, 3].iter().format(","), "Log event");

    eprintln!("This is never hit, and the panic `error!()` is never printed");
}

With this you'll see only this log output without any indication of panic:

Solution

Don't panic in std::fmt::Display implementations. Instead, the best approach would be to format a message with a lot of CAPS letters that say what the problem there is in the rendered message. Or panicking behavior could be made conditional upon a feature flag or maybe some other way of configuration, like a global variable or setup function

Philippe-Cholet commented 5 months ago

This definitely is a footgun. We panic in two Display implementations (only): Format (#307) and FormatWith.

307 mentions the possibility of adding a `Clone` bound but that would be a nasty breaking change.

@phimuemue I would be in favor of formatting an error message rather than panicking.

phimuemue commented 5 months ago

I agree it is concerning that our Display implementation can panic. But is an error message upon the second try really better? I would find an error message for Debug okish (its documentation hints at the fact that Debugs output is not stable), not so much for Display (which is for user-facing output):

I think that people rely on the output of Display (I certainly do) and are better off if they know when there's something strange instead of having an arbitrary error message (that might even change over time) that causes problems down the line. In my book, returning the error message is simply incorrect for Display.

So, we could rename it to lazy_but_possibly_panicking_format, or deprecate it and only keep something that converts to String once.

Possibly off-topic:

I am unsure, but may the general problem the following: Display::fmt takes &self, but there apparently are things that probably need self to be displayed. If Display was actually this:

trait HypotheticalDisplay {
 fn fmt(self, f: &mut Formatter) -> Result<(), Error>;
}

Then, you could - theoretically, and possibly using specialization - impl<'a> HyoptheticalDisplay for &'a DisplayableByRef vs impl HypotheticalDisplay for OurIteratorFormatType. Unsure if this would be a good idea, and I don't remember having seen it anywhere.

Philippe-Cholet commented 5 months ago

I see your point on incorrectness for Display.

Would it help enough to panic with a big CAPS error message then? For the user to easily find it in a possibly large log?

RUNTIME ERROR: `itertools::format` MUST BE FORMATTED ONLY ONCE, THE DEVELOPPER BROKE THE SPACE-TIME CONTINUUM

lazy_but_possibly_panicking_format would be quite long and alarming. Rename format_once would make the intent clearer but would not necessary help people as it might not be directly clear that the logger format the damn thing more than once.

We are not using const-generics yet so use specialization seems like an eternity to me at the moment. But it would be nicer!

Veetaha commented 5 months ago

We have to make a judgement call here. We must admit there isn't a perfect solution for this problem. We just have to pick our poison. I already personally consider format discouraged. Instead, I'm using join(sep) -> String instead to be safe.

I don't think format should be marked as #[deprecated], because it indeed has its niche for code that wants to avoid allocating a string. However, I'd rather put big warning emojis (⚠️ ⚠️ ⚠️) at the top of its documentation saying that the method is panic-unsafe and hard to debug. The docs should describe the potential footgun of its usage in tracing macros, which may format the same log field twice. In the example that I provided a mere addition of new tracing layer makes the object passed to a log macro formatted twice. I think when people see this in the docs they'll be warned enough and use join() instead to be safe if their code isn't performance-sensitive.

Philippe-Cholet commented 5 months ago

So merely update the docs, easy enough. No CAPS error message?

Veetaha commented 5 months ago

No CAPS error message?

If you mean formatting a CAPS message as output of display instead of panicking, that would also be cool, but I also see the point in producing a hard panic that would be much more visible than that (unless you use tracing in panic handler), and I'm sure there will be issues from people complaining that they missed the message. If I were able to re-do format from day zero, I'd avoid a panic, but today people probably expect it to panic already, so let's keep it this way

Philippe-Cholet commented 5 months ago

Oops, I meant panicking with an error message in CAPS like "`itertools::Format` WAS ALREADY FORMATTED ONCE". Would it help debugging if it comes to it?

Veetaha commented 5 months ago

Caps letters would be more visible indeed, but it wouldn't help in the case that I reported in my issue where the panic message was totally not visible because tracing was used in the panic handler. Anyway, I think panic itself is visible enough when its message is printed at all. Making it CAPS would probably be excessive as for me

rust-itertools / itertools

Revisit panicking in `std::fmt::Display` implementations as a footgun #667

Solution

307 mentions the possibility of adding a `Clone` bound but that would be a nasty breaking change.

rust-itertools / itertools

Revisit panicking in `std::fmt::Display` implementations as a footgun #667

Solution

307 mentions the possibility of adding a Clone bound but that would be a nasty breaking change.

307 mentions the possibility of adding a `Clone` bound but that would be a nasty breaking change.