Closed object closed 2 years ago
I'll write up a longer response to this, but there are definitely ways to accomplish this in Phobos 1.x through:
And in Phobos 2.x you can do both of those plus use OpenTelemetry processors to filter out traces before they're exported to a service like HoneyComb. I'll get to work writing up a more detailed tuning guide on this subject.
Thank you. I wasn't aware of either trace filtering or toggle for specific actors. Will appreciated some details on how to achieve this.
Here's the official literature we have so far: https://phobos.petabridge.com/articles/performance.html#performance-optimization-best-practices
But I can probably go deeper on that
Great, that's the good start for me.
Trying to get filtering to work, but still get unwanted messages. This is what I did:
builder.WithTracing (fun b ->
b.SetTracer(tracer) |> ignore
b.AddIncludeMessageFilter<Nrk.Oddjob.Core.ShardMessages.MediaSetShardMessage>() |> ignore
b.AddIncludeMessageFilter<Nrk.Oddjob.Ps.PsShardMessages.PsShardMessage>() |> ignore
b.AddIncludeMessageFilter<Nrk.Oddjob.WebApi.HealthCheck.Actors.UpdateSubcomponentState>() |> ignore)
This is supposed to filter out all message except for the three explicitly specified types above, right? But I still see other messages, both from out domain and Akka system messages (e.g. Akka.Cluster.ClusterHeartbeatSender+Heartbeat) logged.
Hmm, tried AddIncludeMessageFilter, AddExcludeMessageFilter and AddMessageFilter (with ExcludeTypeFilterRule and IncludeTypeFilterRule), but I am still getting various message types logged.
Is the following understanding correct?
What happens if I try the combination of two? Anyway, looks like I can't get any filtering at all.
So the filtering APIs have been, IMHO, intuitive at best and I'd like to simplify what we're doing there. Are you working against Phobos 2.0? I have more leeway to make major changes there and I think I can follow some of the instrumentation patterns that other OTel libraries are doing to accomplish the same goal. I'd be happy to prototype something today and show you what it looks like here.
I am still on Phobos 1.x. I am working on an OTel proof of concept to demo something tomorrow so I guess I won't be able to switch to Phobos now. So I appreciate a hint about what can be wrong with my filtering. But I can switch to Phobos 2.0 next week, if you think it will work better in that respect.
Right now I don't seem to get any filtering to work.
I'll take a look at the 1.x bits right now then.
So the way the filters work:
.IncludeMessagesAlreadyInTrace(false)
on the builder. This drops a lot of trace traffic when this happens.Can you try .IncludeMessagesAlreadyInTrace(false)
and see if that helps?
Thanks, will do.
In the meantime, I'm still going to look at improving this as none of this is intuitive.
@Aaronontheweb I see that the number of irrelevant messages has drastically reduced. With a few exceptions, I only see messages from Include filter. Example of a message that is still logged is Akka.Cluster.Tools.PublishSubscribe.Internal.Status that is published by /system/distributedPubSubMediator (Akka.Cluster.Tools.PublishSubscribe.DistributedPubSubMediator).
@object that's great to hear - so I just found a bug in Phobos based on your feedback here:
/// <summary>
/// INTERNAL API
/// Used to implement filter rules.
/// </summary>
internal static class ClusterCustomFilteringRules
{
public static readonly ImmutableHashSet<IFilterRule> DistributedPubSubFilter =
ImmutableHashSet<IFilterRule>.Empty.Add(new IncludeFunctionFilterRule(o =>
{
switch (o)
{
// covers the bulk of useful messages
case IDistributedPubSubMessage i when i is IWrappedMessage:
case Subscribe _:
case SubscribeAck _:
case Unsubscribe _:
case UnsubscribeAck _:
case Put _:
case Remove _:
return true;
default:
return false;
}
}));
}
We implemented this but..... It is not currently being used inside any of the filters 🤦
We'll push an update which includes this
In theory your filter should already cover this without this filter - so there's likely another issue with the DistributedPubSub
filtering specifically that I need to look into.
Ok, looks like @Arkatufus has patched this - we will push a new Phobos release in the next day or two.
Update on this - @Arkatufus did indeed patch the DistributedPubSub
issue but I decided that our filtering system, generally, isn't robust enough because it doesn't do the following things well:
true
for the message to be traced today.I have a draft PR open right now that makes some breaking changes to the filtering system in order to rectify this.
This code is all for the 1.x branch of Phobos but it will look similar for 2.x. We will release this update for both versions concurrently.
var phobosConfig = new PhobosConfigBuilder()
.WithTracing(m =>
{
m.SetTracer(tracer);
// don't include messages in trace that don't satisfy other filters
m.VerboseTracing(false); // this setting is new
m.IncludeMessagesAlreadyInTrace(false);
// only accept FilteredMessage types
m.AddIncludeMessageFilter<FilteredMessage>();
});
Using the current filtering API, this code means "don't trace any messages under any circumstances unless they implement the FilteredMessage
base class."
var phobosConfig = new PhobosConfigBuilder()
.WithTracing(m =>
{
m.SetTracer(tracer);
// don't include messages in trace that don't satisfy other filters
m.VerboseTracing(false);
m.IncludeMessagesAlreadyInTrace(true); // set to true this time
// only accept FilteredMessage types
m.AddIncludeMessageFilter<FilteredMessage>();
});
Using the current filtering API, this code means "don't trace any messages under any circumstances unless they implement the FilteredMessage
base class OR they are already part of a trace."
Finally, if the builder pattern isn't easy enough to use we're going to offer the option to replace it altogether with a single ITraceFilter
implementation:
/// <summary>
/// Each <see cref="ActorSystem"/> can implement exactly one of these to control
/// the decision-making around which traces to include and which ones to exclude.
///
/// When this type is configured, it will replace all of the <see cref="IFilterRule"/>s
/// that have been previously defined.
/// </summary>
/// <seealso cref="FilterDecision"/>
public interface ITraceFilter
{
/// <summary>
/// Evaluates a message processed by an actor and determines whether or not to include it in the current
/// or a new trace.
/// </summary>
/// <param name="message">The message currently being processed by an actor.</param>
/// <param name="alreadyInTrace">Whether or not this message is already part of an active trace.</param>
/// <returns><c>true</c> if we are going to trace the message, <c>false</c> otherwise.</returns>
/// <remarks>
/// Should return <c>true</c> unless you are explicitly trying to filter a given message out of your trace.
/// </remarks>
bool ShouldTraceMessage(object message, bool alreadyInTrace);
}
All of the filtering context is passed in via the method and all of the user-defined filtering rules exist in a single location for this ActorSystem
, structured in exactly the order implemented by the user. None of the settings from the builder pattern apply when a custom ITraceFilter
is used.
I should have a small demo video of this done soon that shows it in action.
This looks good and reasonable @Aaronontheweb. I am already on Phobos 2, so once you have a beta that supports it, I can give it a try.
Hi @object - we have pushed new Phobos 2.0 binaries that have an API change that should nip this problem in the bud.
Please see the release notes here and follow the configuration instructions - I think this should help quite a bit: https://phobos.petabridge.com/articles/releases/RELEASE_NOTES.html#200-beta4httpssdkbincompublisherpetabridgeproductphobospackagesphobosactorversions200-beta4-and-150-beta1httpssdkbincompublisherpetabridgeproductphobospackagesphobosactorversions150-beta1-march-17th-2022
Hi @object , have you been able to give this a try yet?
@Aaronontheweb Sitting with a test project now, will give you an update today or tomorrow.
@Aaronontheweb What happens with the intermediate spans that don't satisfy filter conditions?
Scenario 1. "don't trace any messages under any circumstances unless they implement the FilteredMessage base class."
We have a transaciton that consists of spans [FilteredMessage1; NotFilteredMessage; FilteredMessage2] The transaction contains NotFilteredMessage that should be filtered out. Will NotFilteredMessage in the middle terminate the transaciton so only the first FilteredMessage will be seen? If not, then how the transaction will be shown in an APM dashboard? Or there will be two separate transactions [FilteredMessage1] and [FilteredMessage2]?
Scenario 2. "don't trace any messages under any circumstances unless they implement the FilteredMessage base class OR they are already part of a trace."
A: [FilteredMessage1; NotFilteredMessage:; FilteredMessage2] For 2.A all messages should be included. B: [NotFilteredMessage; FilteredMessage1; FilteredMessage2] What happens here? Will the transaction start from FilteredMessage1 and include [FilteredMessage1; FilteredMessage2]? C: [NotFilteredMessage1; FilteredMessage1; NotFilteredMessage2; FilteredMessage2] Will the transaction include [FilteredMessage1; NotFilteredMessage2; FilteredMessage2]?
I tried to use Phobos-instrumented system with Honeycomb APM and quickly got into a problem with rate limit. Honeycomb sets maximum rate of 2000 events per second for a dataset (perhaps higher rate for Enterprise account, I didn't check) and our application exceeded that. While this is understandable because each incoming request results in a complex processing workflow across the Akka cluster nodes, the traffic in fact is relatively low.
I wonder whether you think Phobos tracing can be extended with activity/actor filtering so an Akka application can be configured to to sample just subsets of actors/activities. I haven't thoroghly thought about filtering principles, just trying to figure out how to deal with challenges like described above, when Phobos traces hit certain rate limit set by the APM provider.