Open davidpdrsn opened 3 years ago
After further testing it doesn't seem to be related to tracing-appender. I still happens if you add a default layer:
use tracing_subscriber::fmt::{self, Subscriber};
use tracing_subscriber::layer::SubscriberExt;
fn main() {
let subscriber = Subscriber::builder().finish();
let subscriber = subscriber.with(fmt::layer());
tracing::dispatcher::set_global_default(tracing::dispatcher::Dispatch::new(subscriber))
.unwrap();
let span = tracing::info_span!(
"request",
// this field will appear once, as expected
not_duplicate = 123,
// this field will appear twice
duplicate = tracing::field::Empty
);
// in my code this happens conditionally based on an env var
// but done unconditionally here for simplicity
span.record("duplicate", &123);
let _guard = span.enter();
tracing::info!("some event!");
}
Note that if the duplicate
field is not set to empty, it will appear three times.
let span = tracing::info_span!(
"request",
not_duplicate = 123,
duplicate = 1
);
span.record("duplicate", &123);
let _guard = span.enter();
tracing::info!("some event!");
This prints:
INFO request{not_duplicate=123 duplicate=1 duplicate=123 duplicate=123}: tracing_1372: some event!
I’m able to reproduce on current master. Here is the minimal reproducing example I have (based on the one from @davidpdrsn):
use tracing_subscriber::fmt;
use tracing_subscriber::subscribe::CollectExt;
use tracing_subscriber::util::SubscriberInitExt;
fn main() {
tracing_subscriber::registry()
.with(fmt::Subscriber::new())
.with(fmt::Subscriber::new())
.init();
let span = tracing::info_span!(
"request",
// this field will appear once, as expected
not_duplicate = 123,
// this field will appear twice
duplicate = tracing::field::Empty
);
span.record("duplicate", &123);
let _guard = span.enter();
tracing::info!("some event!");
}
Indeed:
2023-09-06T22:36:48.581319Z INFO request{not_duplicate=123 duplicate=123 duplicate=123}: tracing_test: some event!
2023-09-06T22:36:48.581374Z INFO request{not_duplicate=123 duplicate=123 duplicate=123}: tracing_test: some event!
Interestingly, when using the JSON formatter, the fields are NOT duplicated:
{"timestamp":"2023-09-06T22:34:12.286412Z","level":"INFO","fields":{"message":"some event!"},"target":"tracing_test","span":{"duplicate":123,"not_duplicate":123,"name":"request"},"spans":[{"duplicate":123,"not_duplicate":123,"name":"request"}]}
{"timestamp":"2023-09-06T22:34:12.286513Z","level":"INFO","fields":{"message":"some event!"},"target":"tracing_test","span":{"duplicate":123,"not_duplicate":123,"name":"request"},"spans":[{"duplicate":123,"not_duplicate":123,"name":"request"}]}
I was able to track down the issue.
When Span::record
is called, tracing_subscriber::fmt::Subscriber::on_record is called for each subscriber:
fn on_record(&self, id: &Id, values: &Record<'_>, ctx: Context<'_, C>) {
let span = ctx.span(id).expect("Span not found, this is a bug");
let mut extensions = span.extensions_mut();
if let Some(fields) = extensions.get_mut::<FormattedFields<N>>() {
let _ = self.fmt_fields.add_fields(fields, values);
return;
}
let mut fields = FormattedFields::<N>::new(String::new());
if self
.fmt_fields
.format_fields(fields.as_writer().with_ansi(self.is_ansi), values)
.is_ok()
{
fields.was_ansi = self.is_ansi;
extensions.insert(fields);
}
}
Note that because there are two fmt::Subscriber
s in the example, on_record
is called twice, as expected.
In the above excerpt, the FormattedFields
is suspicious. As suggested by the name, not only it is holding fields, it holds them in a formatted fashion. And indeed, FormattedFields
is defined like this:
pub struct FormattedFields<E: ?Sized> {
_format_fields: PhantomData<fn(E)>,
was_ansi: bool,
/// The formatted fields of a span.
pub fields: String, // <-- this is where the fields are stringified and concatenated together
}
The method add_fields
of the FormattedFields
is provided by the FormatFields
trait that is implemented on Pretty
and JsonFields
among others. The default implementation of add_fields
is simply inserting a white space if current.fields
is not empty, and delegates the actual formatting to the format_fields
method:
fn add_fields(
&self,
current: &'writer mut FormattedFields<Self>,
fields: &span::Record<'_>,
) -> fmt::Result {
if !current.fields.is_empty() {
current.fields.push(' ');
}
self.format_fields(current.as_writer(), fields)
}
By default, this ends up being actually formatted by the fmt::format::DefaultVisitor, that is simply writing / pushing into the underlying String
.
Note that if the
duplicate
field is not set to empty, it will appear three times.
As noted by @aym-v , there is no upper bound to the number of duplicates. For each call to add_fields
, the field gets concatenated again. When duplicate
is not set to empty initially, add_fields
is called once when the span is created, and twice when Span::record
is called, because there are two collectors, and the underlying buffer, FormattedFields::fields
is shared among all the collectors because FormattedFields
is an extension.
As mentioned above, behavior is different when the JSON formatter is used. The reason is that the JSON formatter, in order to keep the fields in a structured valid JSON format, parses the stringified fields
back into a BTreeMap<&str, serde_json::Value>
, adds the new field, and then serializes it back to JSON:
// If fields were previously recorded on this span, we need to parse
// the current set of fields as JSON, add the new fields, and
// re-serialize them. Otherwise, if we just appended the new fields
// to a previously serialized JSON object, we would end up with
// malformed JSON.
//
// XXX(eliza): this is far from efficient, but unfortunately, it is
// necessary as long as the JSON formatter is implemented on top of
// an interface that stores all formatted fields as strings.
//
// We should consider reimplementing the JSON formatter as a
// separate layer, rather than a formatter for the `fmt` layer —
// then, we could store fields as JSON values, and add to them
// without having to parse and re-serialize.
let mut new = String::new();
let map: BTreeMap<&'_ str, serde_json::Value> =
serde_json::from_str(current).map_err(|_| fmt::Error)?;
let mut v = JsonVisitor::new(&mut new);
v.values = map;
fields.record(&mut v);
v.finish()?;
current.fields = new;
(There is a note from @hawkw explaining this should be changed in the future for performance reasons.)
That’s why the JSON formatter does not exhibit the problem.
One avenue to fix the problem would be to clear the FormattedFields::fields
buffer and reformat all the fields again each time add_fields
is called. Or we could keep some sort of BTreeMap<Field, String>
that we could update as fields are added or updated. There are many variations possible to this.
I’m willing to fix the issue if a maintainer can clarify what is the desired solution to this problem.
cc @hawkw @davidbarsky
@CBenoit
One avenue to fix the problem would be to clear the
FormattedFields::fields
buffer and reformat all the fields again each timeadd_fields
is called.
I don't think this approach will work, as it would require access to all of the fields when add_fields
is called. Fields recorded when the span is created will no longer be retained by the subscriber, so they cannot be re-formatted. So, I don't think this is possible.
Or we could keep some sort of
BTreeMap<Field, String>
that we could update as fields are added or updated. There are many variations possible to this.
I'd prefer to avoid the performance cost of storing fields in a map, although this does have the advantage of allowing fields to be replaced when re-recorded.
A potential approach could be to add some kind of counter of fmt::Subscriber
s in a static AtomicUsize
that's incremented whenever a new fmt::Subscriber
is constructed, and stored by the layer. The value of this counter can be added to the FormattedFields
type and used to identify which Subscriber
"owns" the formatted fields. If a Subscriber
's ID doesn't match that of the FormattedFields
when they are looked up, that means they were inserted by a different Subscriber
of the same type. If that's the case, the subscriber can skip calling add_fields
because the "owning" subscriber has already recorded those fields.
I'm not sure if this is the best solution, but it's a possible one, and it avoids allocating a bunch of small strings for formatted field values that can't easily be pooled (which is an issue with the BTreeMap
or HashMap
solutions).
@hawkw
One avenue to fix the problem would be to clear the
FormattedFields::fields
buffer and reformat all the fields again each timeadd_fields
is called.I don't think this approach will work, as it would require access to all of the fields when
add_fields
is called. Fields recorded when the span is created will no longer be retained by the subscriber, so they cannot be re-formatted. So, I don't think this is possible.
Indeed, I was thinking about cloning and holding the Field
s in FormattedFields
to circumvent this and defer the actual formatting (caching the result for performance). I didn’t investigate the cost of this though… and I might have not realized yet it’s impossible :laughing:
Or we could keep some sort of
BTreeMap<Field, String>
that we could update as fields are added or updated. There are many variations possible to this.I'd prefer to avoid the performance cost of storing fields in a map, although this does have the advantage of allowing fields to be replaced when re-recorded.
Yes, and my understanding is that we should support re-recording of fields. I don’t like this solution either, but it works for all intents and purposes.
A potential approach could be to add some kind of counter of
fmt::Subscriber
s in astatic AtomicUsize
that's incremented whenever a newfmt::Subscriber
is constructed, and stored by the layer. The value of this counter can be added to theFormattedFields
type and used to identify whichSubscriber
"owns" the formatted fields. If aSubscriber
's ID doesn't match that of theFormattedFields
when they are looked up, that means they were inserted by a differentSubscriber
of the same type. If that's the case, the subscriber can skip callingadd_fields
because the "owning" subscriber has already recorded those fields.I'm not sure if this is the best solution, but it's a possible one, and it avoids allocating a bunch of small strings for formatted field values that can't easily be pooled (which is an issue with the
BTreeMap
orHashMap
solutions).
This approach partially fixes the problem but does not support modifying an already formatted field.
Because Span::record
can be used to record a new value for an already existing field and may be called more than once, we’ll still get duplicates when updating the value for a field already inserted.
I have a program where I update a specific field to reflect a state changing with time, and this approach would be a problem.
That being said, it’s still better than no mitigation at all and would unlock a few use cases.
it avoids allocating a bunch of small strings for formatted field values that can't easily be pooled
For what it’s worth, smol_str
’s SmolStr
could be used to mitigate this specific problem by inlining strings up to 23 bytes long.
I have a program where I update a specific field to reflect a state changing with time, and this approach would be a problem.
In general, I do not recommend using record
in this manner: I would instead emit an event with the new value. A large number of subscribers—including some pretty popular ones—would not be able to emit a new span with the updated value.
I have a program where I update a specific field to reflect a state changing with time, and this approach would be a problem.
In general, I do not recommend using
record
in this manner: I would instead emit an event with the new value. A large number of subscribers—including some pretty popular ones—would not be able to emit a new span with the updated value.
Fair enough. Should Span::record
documentation be updated to warn about this?
Given that updating a field’s values is discouraged, @hawkw’s suggestion is a good enough approach. I’ll look into it.
I have a program where I update a specific field to reflect a state changing with time, and this approach would be a problem.
In general, I do not recommend using
record
in this manner: I would instead emit an event with the new value. A large number of subscribers—including some pretty popular ones—would not be able to emit a new span with the updated value.
So I'm running into this too, and my issue isn't that the value I'm recording changes over time - merely that I don't have the value for some field until later, and it's difficult for me to place the record
call in a place that I can be sure will only be called once. This is I suspect a semi-common situation that would be nice to have a solution to.
We ran into this as well. A workaround is to use a different type of FormatFields
for each layer to force them to record the fields into different span extensions. A newtype like this works well for us:
#[derive(Default)]
struct TypedFields(DefaultFields);
impl<'writer> FormatFields<'writer> for TypedFields {
fn format_fields<R: RecordFields>(
&self,
writer: Writer<'writer>,
fields: R,
) -> std::fmt::Result {
self.0.format_fields(writer, fields)
}
}
Note it's not ideal because each fields is still recorded twice even though it's only needed once. But at least it appears only once in each layer's output.
Bug Report
Version
Then:
cargo tree | grep tracing
:Platform
uname -a
:Description
It seems when using
tracing-append
together with span fields that are initiallyEmpty
and then get added viaSpan::record
the span field appears twice in the output.Example:
This prints:
I suppose the event is printed twice because there are two stdout writers so that part is expected. However the field
duplicate
also appears twice which is probably a bug.If you remove the stdout writer everything works as normal.