tmds / Tmds.LinuxAsync

MIT License
25 stars 3 forks source link

Add tracing #53

Open antonfirsov opened 4 years ago

antonfirsov commented 4 years ago

Shouldn't we add tracing to the code?

For me it would definitely help to get a better overview of which code paths gets executed. Eg. I'm super curious to see the ratio of synchronous completions.

We could make the tracing calls [Conditional(...)] to avoid unwanted perf. impacts.

tmds commented 4 years ago

@adamsitnik and I have talked about that too. Yes, it would be good to get more insights and it can help reveal bugs too.

adamsitnik commented 4 years ago

I've recently tried to add some performant tracing using EventSource, however I was unable to tell PerfCollect to listen for given events.

Sample code:

[EventSource(Name = "SocketsEventSource")]
internal class SocketsEventSource : EventSource
{
    public class Keywords
    {
        public const EventKeywords Epoll = (EventKeywords)1;
        public const EventKeywords Aio = (EventKeywords)2;
    }

    [Event(1, Level = EventLevel.LogAlways, Keywords = Keywords.Epoll)]
    public void EpollWaitReturned(int eventsCount) => WriteEvent(1, eventsCount);

    [Event(2, Level = EventLevel.LogAlways, Keywords = Keywords.Aio)]
    public void AioBatch(int batchCount) => WriteEvent(2, batchCount);

    public static SocketsEventSource Log = new SocketsEventSource();
}

@brianrob how can I enable certain events using PerfCollect and lttng?

antonfirsov commented 4 years ago

@adamsitnik I'm considering writing to stream in the first implementation, to avoid dealing with ETW issues.

With a good API we can switch the backend to EventSource easily.

@tmds any specific expectations towards the implementation you'd like to share?

adamsitnik commented 4 years ago

But how are you going to view the events?

EventSource logging adds some extra complexity, but once we figure out how to properly enable it via PerfCollect, it's super fast and we can view all the events in PerfView (and take advantage of all its features)

antonfirsov commented 4 years ago

Ok, I will dig deeper and focus on implementing a full-scale EventSource based solution, hopefully by Tuesday evening.

Hope it will help us to figure out what went wrong with io_uring.

antonfirsov commented 4 years ago

I managed to capture events with dotnet-trace, but I don't have any nice output, beacuse EventID-s do not translate to method names unlike when laumching with PerfView on windows: image

When coming to perfcollect, I'm completely lost. I can't get it work on my VM since I don't have perf matching my kernel version. I have no idea how and why does it work when running within the benchmarks-server container. Even when it works, it seems that the only option to get SocketsEventSource logged is to alter the script.

brianrob commented 4 years ago

@adamsitnik perfcollect is harder to use with EventSource because LTTng doesn't allow for dynamic registration of tracepoints. Instead, all EventSources emit via a single tracepoint called EventSource. Then you can control exactly which events are emitted via an environment variable. It's really not ideal.

dotnet-trace on the other hand should let you do exactly what you want pretty easily. @antonfirsov, the fact that you weren't able to get event names to show up is odd - this should just work. Can you share a reference to the EventSource implementation, and if possible, the trace file?

antonfirsov commented 4 years ago

@brianrob opened dotnet/diagnostics#884 and shared details.

Then you can control exactly which events are emitted via an environment variable. It's really not ideal.

What environment variables should we set up given an event source TestEventSource?

tmds commented 4 years ago

@tmds any specific expectations towards the implementation you'd like to share?

Add what makes sense in your opinion. @adamsitnik and I will give feedback on the PR.

Hope it will help us to figure out what went wrong with io_uring.

If it is a functional issue, it should show in the event source tracing. If it a performance bottleneck, we are more likely to identify it in the perfview traces Adam has collected.

What environment variables should we set up given an event source TestEventSource?

I think it may be COMPlus_EventPipeConfig. @antonfirsov fyi, I wrote a blog post about tracing on Linux: https://developers.redhat.com/blog/2019/12/23/tracing-net-core-applications/.

@adamsitnik can you set an envvar while you run the TE benchmarks?

antonfirsov commented 4 years ago

I think it may be COMPlus_EventPipeConfig

perfcollect is based on LTTng and does not use EventPipe AFAIK, so I think this won't help collecting traces on aspnet-pef-lin

If it is a functional issue, it should show in the event source tracing. If it a performance bottleneck, we are more likely to identify it in the perfview traces Adam has collected.

If the internal behavior deviates from what we assume it may affect performance. Maybe it's just me, but I was unable to track down the actual execution path for 100% from the traces, because too many calls are missing from the call tree (what is the reason? it's unlikely think they all get inlined.).

tmds commented 4 years ago

@antonfirsov focussing on the what:

It would be nice to know how many events we get per epoll_wait:

https://github.com/tmds/Tmds.LinuxAsync/blob/f7394541eb689080974820c5f456f3aabfd1b3e8/src/Tmds.LinuxAsync/EPollAsyncEngine.EPollThread.cs#L72

And how many times we call epoll_wait in a benchmark.