microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.87k stars 2.94k forks source link

[Feature Request] Implement provider destruction notification mechanism (e.g: IExecutionProvider::OnSessionEnd() ) #22970

Open Hopobcn opened 4 days ago

Hopobcn commented 4 days ago

Describe the feature request

I am implementing profiling for a new Execution Provider, this new provider traces contain information about the memory allocations/deallocaitons performed in the accelerator (among other events). Problem is, due current destruction ordering in InferenceSession::~InferenceSession() the tracing session may be ended before 1) sessionstate is wiped and 2) execution providers are destroyed. This proposal would address (2).

Currently, Execution Providers have no opportunity to get notified that the session will finish to perform cleanup before profiling session finishes.

Would it be acceptable to add

  /**
     Called when session is destroyed
     This provides an opportunity for execution providers to optionally synchronize and
     clean up its temporary resources to reduce memory before the session is destroyed.
   */
  virtual common::Status OnSessionEnd() { return Status::OK(); }

to IExecutionProvider

and refactor ~InferenceSession() destructor accordingly?

InferenceSession::~InferenceSession() {
  ORT_TRY {
    for (auto& xp : execution_providers_) {
      auto end_status = xp->OnSessionEnd();
      if (!end_status.IsOK()) {
       LOGS(*session_logger_, ERROR) << end_status.ErrorMessage();
      }
    }
  }
  ORT_CATCH(const std::exception& e) {
    // TODO: Currently we have no way to transport this error to the API user
    ORT_HANDLE_EXCEPTION([&]() {
      LOGS(*session_logger_, ERROR) << "Error during ~InferenceSession(): " << e.what();
    });
  }
  ORT_CATCH(...) {
    LOGS(*session_logger_, ERROR) << "Unknown error during ~InferenceSession()";
  }

I can contribute with the change if necessary.

Describe scenario use case

Improve information contained in traces by informing ExecutionProviders that the session will end, letting them perform optional cleanups before the tracing session ends.

skottmckay commented 3 days ago

Is this something that needs to be a long-term part of the Execution Provider API, or something that you can temporarily add locally for this sort of testing?

Hopobcn commented 3 days ago

For the use case i'm presenting I agree that this could be implemented as an optional behavior tied to whether profiling/tracing is enabled. However, I believe that providing such a notification (e.g., OnSessionEnd) would serve a broader purpose beyond just my immediate use case. It would give Execution Providers a chance to perform optional cleanup or synchronization tasks when the session is being destroyed, which can be valuable in other scenarios as well.

In my case, this mechanism would allow the Execution Provider to finalize its profiling traces properly, ensuring they include all relevant events (such as memory deallocation) that occur during the destruction of InferenceSession. Without this notification, some important cleanup or finalization steps might be performed outside of the tracing scope giving a sensation of leaking resources.

By making OnSessionEnd part of the API but keeping its implementation optional, Execution Providers that do not need this feature could simply rely on the default no-op implementation. This would ensure backward compatibility while allowing providers that need it to implement this functionality.