zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Create an API for Single Event Upset (SEU) Driver Devices #67493

Open nbalabak opened 9 months ago

nbalabak commented 9 months ago

Introduction

Add a kernel device driver API for interacting with Single Event Upset (SEU) driver devices.

Draft PR: 67097

Problem description

SEUs can occur when radiation particles strike memory, leading to data corruption or system errors. Agilex FPGA processors consist of three subsystems: the Secure Device Manager (SDM), the Hard Processor System (HPS), and the FPGA. The SDM runs firmware, while the HPS runs Zephyr. The SDM is responsible for detecting SEU errors within the system and raising an interrupt to the HPS. This driver detects SEUs via the interrupt from the SDM and reports the errors to the user using mailbox commands sent from the HPS to the SDM. In addition, the driver provides APIs to inject errors into the system.

There is currently no purpose-built API in Zephyr for interacting with Single Event Upset (SEU) driver devices.

Proposed change

The proposed change adds a new SEU driver API to Zephyr, through which applications interface with the SEU driver.

The SEU driver API lets users register callback functions and inject errors.

Driver Typical Workflow

  1. Register a callback function, specifying the required error mode. Registration returns a unique client number.
  2. When an error is detected, the driver automatically invokes the registered callback function.
  3. To simulate an error, use the error injection API provided by the driver.
  4. When notifications are no longer needed, deregister the callback function.

Callback Function Implementation Requirement:

  1. The user must provide a callback function. When an error occurs, the driver invokes this callback and passes it the error information.
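The workflow and callback requirement above can be sketched off-target with a small in-memory mock. This is only an illustration under stated assumptions: every name here (`seu_callback_register`, `seu_callback_deregister`, `seu_inject_error`, `struct seu_err_info`, `enum seu_mode`) and every signature is hypothetical and is not taken from the draft PR. The mock also delivers an injected error straight to the callbacks, whereas on real hardware the error would be detected by the SDM and reported through its interrupt.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names throughout; not the API from the draft PR. */
#define SEU_MAX_CLIENTS 4

enum seu_mode { SEU_MODE_SEU, SEU_MODE_ECC };

/* Error information handed to the callback (fields are assumptions). */
struct seu_err_info {
    enum seu_mode mode;
    uint32_t sector_addr;
};

typedef void (*seu_callback_t)(const struct seu_err_info *info, void *user_data);

struct seu_client {
    seu_callback_t cb;
    enum seu_mode mode;
    void *user_data;
};

static struct seu_client seu_clients[SEU_MAX_CLIENTS];

/* Step 1: register a callback for one error mode; yields a client number. */
int seu_callback_register(seu_callback_t cb, enum seu_mode mode,
                          void *user_data, uint32_t *client)
{
    if (cb == NULL || client == NULL) {
        return -EINVAL;
    }
    for (uint32_t i = 0; i < SEU_MAX_CLIENTS; i++) {
        if (seu_clients[i].cb == NULL) {
            seu_clients[i] = (struct seu_client){ cb, mode, user_data };
            *client = i;
            return 0;
        }
    }
    return -ENOMEM; /* no free client slot */
}

/* Step 4: release the client slot so the callback is no longer invoked. */
int seu_callback_deregister(uint32_t client)
{
    if (client >= SEU_MAX_CLIENTS || seu_clients[client].cb == NULL) {
        return -EINVAL;
    }
    seu_clients[client].cb = NULL;
    return 0;
}

/* Steps 2 and 3: the mock delivers an injected error directly to every
 * client registered for the matching mode. */
int seu_inject_error(const struct seu_err_info *info)
{
    if (info == NULL) {
        return -EINVAL;
    }
    for (uint32_t i = 0; i < SEU_MAX_CLIENTS; i++) {
        if (seu_clients[i].cb != NULL && seu_clients[i].mode == info->mode) {
            seu_clients[i].cb(info, seu_clients[i].user_data);
        }
    }
    return 0;
}

/* User-provided callback: counts the errors it is told about. */
static void on_seu(const struct seu_err_info *info, void *user_data)
{
    unsigned int *hits = user_data;

    (*hits)++;
    printf("SEU reported at sector 0x%lx\n", (unsigned long)info->sector_addr);
}

/* Walks the four workflow steps; returns how often the callback fired. */
int seu_workflow_demo(void)
{
    unsigned int hits = 0;
    uint32_t client;
    struct seu_err_info seu = { .mode = SEU_MODE_SEU, .sector_addr = 0x10 };
    struct seu_err_info ecc = { .mode = SEU_MODE_ECC, .sector_addr = 0x20 };

    if (seu_callback_register(on_seu, SEU_MODE_SEU, &hits, &client) != 0) {
        return -1;
    }
    seu_inject_error(&seu);          /* delivered: mode matches */
    seu_inject_error(&ecc);          /* skipped: registered for SEU mode only */
    seu_callback_deregister(client);
    seu_inject_error(&seu);          /* skipped: client was deregistered */
    return (int)hits;
}
```

Calling `seu_workflow_demo()` from a `main()` exercises all four steps. Passing a `user_data` pointer through registration is a design assumption of this sketch; it lets the callback update caller-owned state without globals.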

Driver Examples

The code snippet below shows how a driver, here the Intel SoC FPGA SEU driver, would plug its functions into the API.

<snip>
static const struct seu_api api = {
    .seu_callback_function_register = intel_socfpga_seu_callback_function_register,
    .seu_callback_function_deregister = intel_socfpga_seu_callback_function_deregister,
    .insert_safe_seu_error = intel_socfpga_insert_safe_seu_error,
    .insert_seu_error = intel_socfpga_insert_seu_error,
    .insert_ecc_error = intel_socfpga_insert_ecc_error,
    .read_seu_statistics = intel_socfpga_read_seu_statistics,
};
<snip>
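For readers unfamiliar with the pattern, the sketch below shows how an API structure like the one above is typically consumed in Zephyr: a thin wrapper fetches the function table from the device's `api` pointer and dispatches through it. It is a reduced, off-target illustration. `struct device` here is a stand-in that keeps only the fields used, `struct seu_api_sketch` carries just two of the six members, and the argument lists, `struct seu_statistics_sketch`, and the fake backend are assumptions, not the signatures from the draft PR.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Zephyr's struct device, reduced to the fields used here. */
struct device {
    const char *name;
    const void *api;
    void *data;
};

/* Hypothetical statistics record. */
struct seu_statistics_sketch {
    uint32_t seu_count;
};

/* Two of the members from the snippet above, with assumed signatures. */
struct seu_api_sketch {
    int (*insert_seu_error)(const struct device *dev, uint32_t sector_addr);
    int (*read_seu_statistics)(const struct device *dev,
                               struct seu_statistics_sketch *stats);
};

/* Public wrappers: look up the driver's table and dispatch through it. */
static inline int seu_insert_seu_error(const struct device *dev,
                                       uint32_t sector_addr)
{
    const struct seu_api_sketch *api = dev->api;

    if (api->insert_seu_error == NULL) {
        return -ENOSYS; /* operation not implemented by this driver */
    }
    return api->insert_seu_error(dev, sector_addr);
}

static inline int seu_read_seu_statistics(const struct device *dev,
                                          struct seu_statistics_sketch *stats)
{
    const struct seu_api_sketch *api = dev->api;

    if (api->read_seu_statistics == NULL) {
        return -ENOSYS;
    }
    return api->read_seu_statistics(dev, stats);
}

/* Fake backend standing in for a vendor driver. */
static int fake_insert(const struct device *dev, uint32_t sector_addr)
{
    struct seu_statistics_sketch *s = dev->data;

    (void)sector_addr;
    s->seu_count++; /* pretend the injected error was detected */
    return 0;
}

static int fake_read(const struct device *dev,
                     struct seu_statistics_sketch *stats)
{
    *stats = *(const struct seu_statistics_sketch *)dev->data;
    return 0;
}

static const struct seu_api_sketch fake_api = {
    .insert_seu_error = fake_insert,
    .read_seu_statistics = fake_read,
};

static struct seu_statistics_sketch fake_data;

static const struct device fake_seu_dev = {
    .name = "fake_seu",
    .api = &fake_api,
    .data = &fake_data,
};
```

Because callers only see the wrappers, a second backend (for a different vendor's SEU hardware, for instance) could supply its own table without changing application code.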
henrikbrixandersen commented 8 months ago

Apart from the Intel SEU IP, do we have other examples of similar devices for which support could be implemented using the same API?

nbalabak commented 8 months ago

> Apart from the Intel SEU IP, do we have other examples of similar devices for which support could be implemented using the same API?

Hello @henrikbrixandersen, this driver API is common: anyone who wants to handle single event upset monitoring in a system can use it.

henrikbrixandersen commented 8 months ago

> This driver API is common: anyone who wants to handle single event upset monitoring in a system can use it.

Yes, but do you have an example of a similar device from a different vendor?

nbalabak commented 8 months ago

> > This driver API is common: anyone who wants to handle single event upset monitoring in a system can use it.
>
> Yes, but do you have an example of a similar device from a different vendor?

@henrikbrixandersen No, I don't have any other example.

yashi commented 8 months ago

Xilinx (now AMD) has Soft Error Mitigation (SEM) Core.

nbalabak commented 8 months ago

> Xilinx (now AMD) has Soft Error Mitigation (SEM) Core.

@yashi Thank you for sharing the information.