llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Profiling support in OrcV2 / JITLink #30869

Open lhames opened 7 years ago

lhames commented 7 years ago
Bugzilla Link 31521
Version trunk
OS All
CC @anarazel,@pitrou,@weliveindetail

Extended Description

We need to be able to profile the performance of JIT'd code. This bug should serve as an umbrella for ORC clients and developers to discuss ORC profiling support.

lhames commented 5 years ago

Yep. The aim is to be able to write profiling support as an ObjectLinkingLayer::Plugin subclass. Ditto for debugging support.

Does this mean that you currently expect JITEventListener based profiling support to be broken by your changes? Or just that it'd be more efficient to do so via ObjectLinkingLayer::Plugin?

As Stefan mentioned: RuntimeDyld and RTDyldObjectLinkingLayer continue to be supported, and they will continue to support JITEventListeners. RuntimeDyld/RTDyldObjectLinkingLayer will not be replaced until JITLink and ObjectLinkingLayer can provide a truly viable alternative.

As for the new listener/plugin interfaces: JITLink allows rich interaction with the linker's data structures, which RuntimeDyld did not. I don't know that it will make plugins more efficient, but I think it will allow a wider range of plugins, and allow JITLink to generate more efficient code, while keeping the plugin mechanism efficient.

My hope is that it will be possible to write a JITEventListener wrapper for backwards compatibility. The big caveat will be dead-stripping/atom layout: JITLink deletes unreachable code and reorganizes section contents. That will break any JITEventListener that expects the relocated section content to line up exactly with the section content in the object file. We might be able to address this by making dead-stripping and layout optional (or pluggable, and provide variants that mimic the object layout).
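To make the layout caveat concrete, here is a toy model in C++. It is not JITLink code: `Block`, `objectLayout`, and `strippedLayout` are hypothetical names invented for illustration. It only shows how offsets recorded against the object file stop matching once unreachable blocks are removed and the rest are packed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one function-sized block in an object file section.
struct Block {
  std::string Name;
  std::uint64_t Size; // size in bytes
  bool Live;          // false = unreachable, i.e. a dead-stripping candidate
};

using OffsetMap = std::map<std::string, std::uint64_t>;

// Offsets as a listener reading the object file would see them:
// every block is present, in its original order.
OffsetMap objectLayout(const std::vector<Block> &Blocks) {
  OffsetMap Offsets;
  std::uint64_t Next = 0;
  for (const Block &B : Blocks) {
    assert(Offsets.count(B.Name) == 0 && "duplicate block name");
    Offsets[B.Name] = Next;
    Next += B.Size;
  }
  return Offsets;
}

// Offsets after a dead-stripping, compacting layout:
// dead blocks are dropped and live blocks are packed together.
OffsetMap strippedLayout(const std::vector<Block> &Blocks) {
  OffsetMap Offsets;
  std::uint64_t Next = 0;
  for (const Block &B : Blocks) {
    if (!B.Live)
      continue; // removed entirely, so it has no offset at all
    assert(Offsets.count(B.Name) == 0 && "duplicate block name");
    Offsets[B.Name] = Next;
    Next += B.Size;
  }
  return Offsets;
}
```

With blocks `a` (16 bytes, live), `unused` (32 bytes, dead) and `b` (8 bytes, live), `b` sits at offset 48 in the object layout but at offset 16 after stripping. A listener that assumes the two layouts agree would attribute samples to the wrong symbol.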

lhames commented 5 years ago

Ok nice, is LocalEHFrameRegistrationPlugin a good role model here for subclassing ObjectLinkingLayer::Plugin? https://github.com/llvm/llvm-project/blob/a2fbe2bc/llvm/include/llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h#L142

Yes it is. The plugin API is new though, so it might not provide everything you need to know (e.g. you only get access to the AtomGraph so far, not the underlying buffer/object). There's plenty of scope for us to tweak the API at the moment, since there are few clients.

weliveindetail commented 5 years ago

Does this mean that you currently expect JITEventListener based profiling support to be broken by your changes?

It still works with RuntimeDyld, using Orc's RTDyldObjectLinkingLayer. It does NOT YET work with the new JITLink, using Orc's new ObjectLinkingLayer.

anarazel commented 5 years ago

Yep. The aim is to be able to write profiling support as an ObjectLinkingLayer::Plugin subclass. Ditto for debugging support.

Does this mean that you currently expect JITEventListener based profiling support to be broken by your changes? Or just that it'd be more efficient to do so via ObjectLinkingLayer::Plugin?

weliveindetail commented 5 years ago

Ok nice, is LocalEHFrameRegistrationPlugin a good role model here for subclassing ObjectLinkingLayer::Plugin? https://github.com/llvm/llvm-project/blob/a2fbe2bc/llvm/include/llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h#L142

lhames commented 5 years ago

Yep. The aim is to be able to write profiling support as an ObjectLinkingLayer::Plugin subclass. Ditto for debugging support.

weliveindetail commented 5 years ago

This comes up again for JITLink. I had a quick fix, but it no longer applies after intervening changes: https://reviews.llvm.org/D61065

Correct me if I am wrong, but I think the plan is to:

Maybe the functionality can be encapsulated in a utility class.

anarazel commented 6 years ago

See https://reviews.llvm.org/D44890 and also https://reviews.llvm.org/D44892

anarazel commented 6 years ago

I've used out-of-tree patches for this for a while. I'm not quite sure exactly what changes you had in mind, but I'm going to open a review with what I have, and then we can go from there?

(waiting for a recompile to open a phab review)

lhames commented 7 years ago

ExecutionEngine and MCJIT currently support OProfile and Intel profiling via the JITEventListener interface - it should be easy to adapt that code to work with ObjectLinkingLayer's callbacks (NotifyLoaded and NotifyFinalized), allowing easy migration for existing clients. If anyone wants to dive in on this please feel free (either file a new bug blocking this, or just make notes inline here). Otherwise I'll get to it when I can.

lhames commented 1 year ago

Related issue: @lucasreis1 has been seeing some issues with the existing perf support: https://github.com/llvm/llvm-project/issues/58174.

lhames commented 1 year ago

@pchintalapudi is looking at making ELF debug sections available in the LinkGraph (they were previously skipped during ELF LinkGraph construction). That's step 1 here, since the perf event listeners all need to read debug info.

The next question is what metadata do we need, and in what form? E.g. Should we just dump the original object file to disk? We could make the original object file available as a section in the graph to facilitate that, but what would we do for LinkGraphs created directly via the LinkGraph APIs? Or should the profiling support plugin synthesize a new object file from the sections in the Graph? That's my initial preference, but I wonder how much work the object synthesizer will be.

Finally there's the question of where the metadata massaging should happen (controller or executor). The registration (and deregistration) itself should definitely happen on the executor side, so that will need to be implemented in the ORC runtime.

llvmbot commented 1 year ago

@llvm/issue-subscribers-jitlink

vchuravy commented 1 year ago

x-ref: #60883

Once we have settled the Perf side of things we will have to do the same for VTune/ITTAPI (cc: @ekovanova & @abrown)

vchuravy commented 1 year ago

Finally there's the question of where the metadata massaging should happen (controller or executor). The registration (and deregistration) itself should definitely happen on the executor side, so that will need to be implemented in the ORC runtime.

At least for Perf, the tools all look at the mapped files of the process being profiled (and the VMA must be valid in that process), so my understanding is that the writes to the mmap'd file need to happen on the executor side.
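As a rough illustration of executor-side registration, here is a C++ sketch of the simplest perf interface, the per-process map file. perf reads `/tmp/perf-<pid>.map`, where each line is `<start-hex> <size-hex> <symbol name>`. This is a simpler mechanism than the mmap'd file discussed above, and not necessarily what any patch on this issue uses. `JITSymbol` and the helper names are hypothetical. The addresses must be those of the profiled (executor) process, which is why the write belongs there.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record for one JIT'd function, as seen in the executor process.
struct JITSymbol {
  std::uint64_t Address; // start address in the executor's address space
  std::uint64_t Size;    // size in bytes
  std::string Name;      // symbol name; must fit on a single line
};

// Format one perf map line: "<start-hex> <size-hex> <name>\n".
std::string formatPerfMapLine(const JITSymbol &Sym) {
  assert(Sym.Name.find('\n') == std::string::npos && "name must be one line");
  char Buf[64]; // two 64-bit hex values plus separators fit comfortably
  std::snprintf(Buf, sizeof(Buf), "%llx %llx ",
                static_cast<unsigned long long>(Sym.Address),
                static_cast<unsigned long long>(Sym.Size));
  return std::string(Buf) + Sym.Name + "\n";
}

// Path that perf checks for the process with the given pid.
std::string perfMapPath(long Pid) {
  return "/tmp/perf-" + std::to_string(Pid) + ".map";
}

// Append entries to the map file. Returns false if the file cannot be
// opened or closed cleanly.
bool appendPerfMap(const std::string &Path,
                   const std::vector<JITSymbol> &Syms) {
  std::FILE *F = std::fopen(Path.c_str(), "a");
  if (!F)
    return false;
  for (const JITSymbol &S : Syms) {
    const std::string Line = formatPerfMapLine(S);
    std::fwrite(Line.data(), 1, Line.size(), F);
  }
  return std::fclose(F) == 0;
}
```

In an out-of-process setup, the controller would have to send the symbol records to the executor, which would then call something like `appendPerfMap(perfMapPath(getpid()), Syms)`. The map file carries no debug info or code bytes, so it is only a starting point.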

llvmbot commented 1 year ago

@llvm/issue-subscribers-julialang

vchuravy commented 1 year ago

The PR for initial perf integration is https://reviews.llvm.org/D146169

mgood7123 commented 11 months ago

The big caveat will be dead-stripping/atom layout: JITLink deletes unreachable code and reorganizes section contents. That will break any JITEventListener that expects the relocated section content to line up exactly with the section content in the object file. We might be able to address this by making dead-stripping and layout optional (or pluggable, and provide variants that mimic the object layout).

For debugging or for profiling?

Compilers, including clang, can emit optimized code that profiles reasonably well, although ??? symbols will always appear in various places because of optimization and missing debug info.

vchuravy commented 6 months ago

VTune support landed in https://github.com/llvm/llvm-project/pull/83957