Open avolkov-1221 opened 1 month ago
I understand where you're coming from with this, but the ideas put forward above contradicts the whole idea of a vendor using a HAL with Zephyr; Vendors use HALs with Zephyr in order to facilitate code reuse between Zephyr and other supported ecosystems (baremetal, other RTOS) which reduces both development and testing time.
Forcing HALs to use Zephyr primitives would cause a dependency loop and void the idea of vendor code reuse between Zephyr and other ecosystems.
Forcing HALs to use Zephyr primitives would cause a dependency loop and void the idea of vendor code reuse between Zephyr and other ecosystems.
I don't really agree with this: most vendors already have some compatibility layers in one form or another (in fact, Renesas has too), but in the case of Zephyr, not all of them are actually trying to use these layers. Certainly, hardware-specific things are outside the scope of this RFC. However, irq_lock()
/irq_unlock()
for ARM-based SOCs, on the contrary, are directly relevant.
Vendor HALs are often by design OS agnostic as mentioned by @henrikbrixandersen, Vendors have a lot to gain in the lanes of better performance, code size optimization, code reuse, simpler maintenance, CI infrastructure etc. when seen from the perspective of zephyr by mutating their HALs to better match zephyr.
I don't think we should attempt to enforce changes in vendors' downstream HALs to better match zephyr, vendors who wish to support zephyr better will typically opt to adapt their HALs and uses of them to better match zephyr for reasons of the gains mentioned above. No enforcement needed :)
Problem description
Hello,
I would like to escalate and discuss a problem that arose during the recent ill-fated discussion of the new HAL from Renesas. Specifically, the lack of any requirements for HALs provided by hardware vendors in the form of modules.
Despite the declared goal of avoiding, among other things, the reinvention of the wheel when introducing the modules, we now have a whole set of wheels in different incompatible shapes from various vendors. Or, to put it more seriously, there are some issues that could impact both the current system's stability and performance, and significantly complicate future tree-wide or subsystem-wide changes:
Almost every HAL developer reinvents synchronization primitives. Particularly, interrupts enable/disable functions are being reinvented instead of using the Zephyr's provided irq_lock()/irq_unlock() or spinlocks.
Work queues, semaphores/mutexes and fine grained interrupts locks are rarely used instead of simple global interrupts enable/disable (where possible, of course). However, excessive use of interrupt locks can, in turn, lead to "interesting" consequences, such as hard-to-detect priority inversions. Btw how many vendors actually perform Real Time-related checks?
Each HAL developer uses their own formatting rules, usually undocumented and almost always entirely incompatible with Zephyr's guidelines. As a result, HAL's bug fixes by 3rd party developers (if they will accepted) lead to code that is difficult to read, review and unlikely to pass any automated testing.
Almost each HAL developer uses (if) their own proprietary automated test suites, which are not related to, and unlikely compatible with Twister. As a result, automated test coverage for the customer's system as a whole is likely impossible, making it difficult to evaluate the system's reliability.
Almost each HAL developers reinvent error codes that are incompatible with the error codes used by Zephyr, and the subsequent unnecessary and pointless remapping of errors can significantly impact performance, or mask the real reason of a problem.
...
As a consequence of the above, it is often easier to create a local fork of some parts of the original HAL and bring them into some compatible form with Zephyr (and with Twister) than to reuse it as is. This, in turn, increases the number of new wheel shapes ad infinitum. So we got a situation that is directly opposite to the declared target.
Proposed change
To normalize the current situation to at least a minimally acceptable level, I would like to define some basic requirements for modules supplied by vendors, based on this discussion. We could start with the following:
Vendor-specific synchronization primitives should not be used instead of Zephyr's, unless required by specific hardware characteristics. Also the vendor's code should avoid using any locks whenever possible, leaving synchronization management to the calling side.
Using homebrewed functions for reading from or writing to peripheral registers should not be allowed unless strictly necessary. Zephyr's sys_XXXX functions should be used instead.
Error codes must be identical to Zephyr's ones. In most cases, this can be achieved by simply adding a new header file with code redefinitions in the form of
#define VENDOR_ERR_CODE_NAME ZEPHYR_ERR_CODE
.Code style rules should be documented if they differ from Zephyr's guidelines. Or, better, Zephyr's ones should be used instead.