zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

core os: migrate away from posix types (ssize_t, off_t) below the posix layer #77856

Open cfriedt opened 2 months ago

cfriedt commented 2 months ago

Introduction

This RFC addresses usage of POSIX types within Zephyr. We outline when POSIX types were introduced originally, problems that have arisen as a result, as well as possible courses of action.

This RFC was created to provide some rationale for PR #75348, which received 7 approvals within just a few days of being posted.

Small Historical Digression

The ssize_t type limits the size of individual I/O transactions while also being able to signal an error condition (via -1). It must at least be able to represent values in the range [-1, 2^15 - 1], i.e. [-1, 32767] (see limits.h). In Zephyr, ssize_t is also used to signal errors via negative errno return values. A 32-bit value here is fine, since individual I/O transaction sizes rarely exceed 2^31 bytes. There is typically no added cost to using 64 bits for this type on 64-bit systems, so ssize_t typically matches the word size of the machine.
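As a minimal sketch of the convention described above, the following shows one return value carrying either a byte count or a negative errno code. The names `io_ssize_t` and `demo_read` are hypothetical and are not Zephyr APIs; `intptr_t` is used only as a stand-in for a signed, word-sized type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ssize_t: a signed, word-sized integer. */
typedef intptr_t io_ssize_t;

/* The type must cover at least [-1, 32767], the minimum range noted above. */
static_assert(INTPTR_MAX >= 32767, "return type too narrow for an I/O size");

/*
 * Copy up to len bytes from src (which holds avail bytes) into dst.
 * Returns the number of bytes copied, or a negative errno value on error.
 */
static io_ssize_t demo_read(void *dst, const void *src, size_t avail, size_t len)
{
    if (dst == NULL || src == NULL) {
        return -EINVAL;
    }

    /* One transaction must fit in the positive half of the return type. */
    if (len > (size_t)INTPTR_MAX) {
        return -EOVERFLOW;
    }

    size_t n = (len < avail) ? len : avail;

    memcpy(dst, src, n);
    return (io_ssize_t)n;
}
```

A caller would check for a negative result first and only then treat the value as a length.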

Meanwhile, the off_t type is used to describe extent sizes but also absolute and relative positions within extents. Extent sizes as well as absolute or relative positions within extents should normally be at least as large as that of an I/O transaction. However, if extent sizes are only as large as an I/O transaction on 32-bit systems, the extent sizes can be considered severely limited (we would not be able to describe SD cards greater than 2 GiB). One might naively assume that only storage devices of a few hundred kiB or MiB might be connected to a particular system, but when one considers network resources, and in particular, networked filesystems, or for example a real-time device that processes video, then it begins to make significantly more sense to use e.g. a 64-bit type for off_t, even on 32-bit systems.
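The 2 GiB limit mentioned above follows directly from the range of a signed 32-bit integer. The sketch below makes the arithmetic explicit; the helper names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* True if an extent of the given size is representable in a signed 32-bit offset. */
static bool fits_signed_32(uint64_t bytes)
{
    return bytes <= (uint64_t)INT32_MAX; /* INT32_MAX is 2 GiB - 1 */
}

/* True if an extent of the given size is representable in a signed 64-bit offset. */
static bool fits_signed_64(uint64_t bytes)
{
    return bytes <= (uint64_t)INT64_MAX;
}
```

With these helpers, an extent of exactly 2 GiB already fails the 32-bit check, while a 32 GiB SD card fits comfortably in the 64-bit case.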

POSIX formalized these particular scenarios in unistd.h with

32-bit systems:
_POSIX_V7_ILP32_OFF32 (32-bit int, long, off_t, pointer) is conservative.
_POSIX_V7_ILP32_OFFBIG (32-bit int, long, pointer; off_t >= 64-bit) is future-proof.

64-bit systems:
_POSIX_V7_LP64_OFF64 (32-bit int; 64-bit long, off_t, pointer).
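To make the three environments above concrete, here is a small sketch that maps a set of type widths to the matching option name. The struct and function are hypothetical helpers written for this illustration; only the `_POSIX_V7_*` names and the widths come from the list above.

```c
#include <stddef.h>

/* Widths, in bits, of the types that define a programming environment. */
struct data_model {
    unsigned int int_bits;
    unsigned int long_bits;
    unsigned int ptr_bits;
    unsigned int off_bits;
};

/* Return the matching option name, or NULL if none of the three applies. */
static const char *posix_env_name(struct data_model m)
{
    if (m.int_bits == 32 && m.long_bits == 32 && m.ptr_bits == 32) {
        if (m.off_bits == 32) {
            return "_POSIX_V7_ILP32_OFF32";
        }
        if (m.off_bits >= 64) {
            return "_POSIX_V7_ILP32_OFFBIG";
        }
    }

    if (m.int_bits == 32 && m.long_bits == 64 && m.ptr_bits == 64 &&
        m.off_bits == 64) {
        return "_POSIX_V7_LP64_OFF64";
    }

    return NULL;
}
```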

Why do these options exist? Why do we not always simply use OFFBIG or OFF64? Because 64-bit return values, 64-bit arithmetic, etc, can be costly on 32-bit systems that simply do not require it. This is a concern that we need to be cognizant of for the Zephyr community.

Bug History

#10436: 6 years old, closed as stale (off_t and ssize_t used as far back as 8 years ago)

eb0aaca (bits copied from d2258b0 despite DNM label)

Problem description

POSIX sits above the operating system and calls-in to kernel APIs, such as multi-threading and semaphores, and OS Service APIs, such as file system and networking, as shown in the diagram below.

[Diagram: screenshot dated 2024-09-01; image not included]

However, there have historically been dependency cycles in the File System and Networking APIs (potentially others). A dependency cycle can be seen as a lower-layer depending on an upper layer, which of course depends again on the lower layer (a cycle).

A significant amount of work has been done already to eliminate the dependency cycles from Zephyr’s Networking subsystem, and in fact, the few remaining dependency cycles are now deprecated, slated for removal in the 4.1 release (#77069).

On the other hand, Zephyr’s base OS (including lib/os/fdtable.c) pulls in two types that are specific to POSIX: ssize_t and off_t. Neither of these types is part of ISO C or native to Zephyr. This results in a dependency cycle in the Base OS and File System areas.
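The notion of a dependency cycle can be checked mechanically. The sketch below models a handful of layers as a directed graph and uses a depth-first search to detect a cycle. The layer names and the edges are simplified assumptions for illustration and do not reflect Zephyr's actual build graph.

```c
#include <stdbool.h>

enum { LAYER_KERNEL, LAYER_BASE_OS, LAYER_FS, LAYER_POSIX, LAYER_COUNT };

enum mark { UNVISITED, IN_PROGRESS, DONE };

/* deps[a][b] is true when layer a depends on layer b. */
static bool dfs_finds_cycle(bool deps[LAYER_COUNT][LAYER_COUNT], int node,
                            enum mark *marks)
{
    marks[node] = IN_PROGRESS;

    for (int next = 0; next < LAYER_COUNT; next++) {
        if (!deps[node][next]) {
            continue;
        }
        /* Reaching a node that is still on the current path closes a cycle. */
        if (marks[next] == IN_PROGRESS) {
            return true;
        }
        if (marks[next] == UNVISITED && dfs_finds_cycle(deps, next, marks)) {
            return true;
        }
    }

    marks[node] = DONE;
    return false;
}

static bool has_dependency_cycle(bool deps[LAYER_COUNT][LAYER_COUNT])
{
    enum mark marks[LAYER_COUNT] = { UNVISITED };

    for (int n = 0; n < LAYER_COUNT; n++) {
        if (marks[n] == UNVISITED && dfs_finds_cycle(deps, n, marks)) {
            return true;
        }
    }
    return false;
}
```

In this model, a strictly layered graph (POSIX on file system, file system on base OS, base OS on kernel) has no cycle; adding a single edge from the base OS back up to POSIX, as happens when a lower layer uses a POSIX-only type, introduces one.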

The problems that arise are:

Proposed change

The change being proposed is to switch to standard ISO C types (and possibly Zephyr types) below the POSIX line, since there is no need to depend on POSIX at all.

Specifically, this RFC proposes Option 4 outlined below.

> [!NOTE]
> Zephyr’s native architecture is considered a special case.

Detailed RFC

Proposed change (Detailed)

The preferred solution is detailed in Option 4 (below). In general, changes must be bisectable. So, start with the least invasive change (e.g. adding new types and Kconfigs), then use unions to provide alternative options for e.g. function pointers, then convert function pointers one area at a time, and finally remove the unnecessary POSIX types and inclusions.
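One possible reading of the union step is sketched below: a table entry holds either the legacy or the new function-pointer signature, so that callers can be converted one area at a time while every intermediate commit still builds. All names here (`demo_off_t`, `demo_vtable`, `demo_seek`, and the two implementations) are hypothetical, and `long` merely stands in for the legacy POSIX type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical replacement offset type; not an actual Zephyr definition. */
typedef int64_t demo_off_t;

/* Legacy signature (long stands in for off_t) and its replacement. */
typedef long (*legacy_seek_fn)(void *obj, long offset);
typedef demo_off_t (*native_seek_fn)(void *obj, demo_off_t offset);

struct demo_vtable {
    bool uses_native; /* set once an implementation has been converted */
    union {
        legacy_seek_fn legacy;
        native_seek_fn native;
    } seek;
};

/* Dispatch through whichever signature the implementation provides. */
static demo_off_t demo_seek(const struct demo_vtable *vt, void *obj,
                            demo_off_t offset)
{
    if (vt->uses_native) {
        return vt->seek.native(obj, offset);
    }
    return (demo_off_t)vt->seek.legacy(obj, (long)offset);
}

/* Example implementations: obj points at the stored position. */
static long legacy_seek_impl(void *obj, long offset)
{
    *(long *)obj = offset;
    return offset;
}

static demo_off_t native_seek_impl(void *obj, demo_off_t offset)
{
    *(demo_off_t *)obj = offset;
    return offset;
}
```

Once every implementation sets `uses_native`, the union and the flag could be dropped in a final cleanup commit, leaving only the new signature.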

Option 1: Do nothing / cherry-pick or override POSIX types

Pros:

Cons:

Option 2: Use ISO C types only (int64_t)

Pros:

Cons:

Option 3: Use ISO C types only (long)

Pros:

Cons:

Option 4: Use ISO C + Zephyr types

Pros:

Cons: A compromise on 32-bit-only systems resulting in one of the tradeoffs below. No negative consequences on 64-bit systems.

The <sys/types.h> header shall be removed from Zephyr and Core Services.
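As a rough sketch of how ISO C plus Zephyr-defined types might replace the POSIX ones, the following selects an offset type at build time using only `<stdint.h>` and `<limits.h>`. The macro `DEMO_OFFSET_64BIT` is a hypothetical stand-in for a Kconfig option, and `cfg_off_t` and `cfg_ssize_t` are illustrative names, not the types proposed in the PR.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical build-time switch standing in for a Kconfig option. */
#ifndef DEMO_OFFSET_64BIT
#define DEMO_OFFSET_64BIT 1
#endif

#if DEMO_OFFSET_64BIT
/* Large extents at the cost of 64-bit arithmetic on 32-bit targets. */
typedef int64_t cfg_off_t;
#define CFG_OFF_MAX INT64_MAX
#else
/* Word-sized offsets: cheaper, but limited to 2 GiB - 1 on 32-bit targets. */
typedef long cfg_off_t;
#define CFG_OFF_MAX LONG_MAX
#endif

/* Signed, word-sized transaction size; no <sys/types.h> required. */
typedef intptr_t cfg_ssize_t;

static_assert(CFG_OFF_MAX >= INT32_MAX, "offset type narrower than 32 bits");
```

On a 64-bit target both branches yield a 64-bit offset, so the choice only represents a tradeoff on 32-bit systems.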

Dependencies

Along with entries under include/, samples/, and tests/, changes are required in each of the following areas (see e.g. #75348):

Concerns and Unresolved Questions

It would be very interesting to calculate the monthly compound interest rate (or even annualized interest rate) of technical debt between 2016 (or 2018) and today, measured in terms of lines of code.

Will add as needed.

Alternatives

Option 1 is the main alternative at this time, although it is non-constructive.

Open to suggestions.

henrikbrixandersen commented 2 months ago

> POSIX sits above the operating system and calls-in to kernel APIs, such as multi-threading and semaphores, and OS Service APIs, such as file system and networking, as shown in the diagram below.

Where does that diagram originate from? I am not sure I agree that "POSIX" can be considered as just one layer in this regard.

cfriedt commented 2 months ago

> Where does that diagram originate from?

@henrikbrixandersen - this particular diagram was added in 72f52c9e44ef4a56c1f0558b985bc567de9c60fa.

> I am not sure I agree that "POSIX" can be considered as just one layer in this regard.

Would you care to elaborate?

cfriedt commented 2 months ago

Two weeks have expired without any kind of elaboration. I'm not sure if there needs to be a debate on the viewpoint above, since it's somewhat tangential.

The main purpose of this RFC is to resolve the dependency cycles introduced by the ssize_t and off_t types, choosing a suitable option out of the proposed options, and implementing the necessary changes in a controlled way.

henrikbrixandersen commented 2 months ago

> Two weeks have expired without any kind of elaboration. I'm not sure if there needs to be a debate on the viewpoint above, since it's somewhat tangential.

Well, this is not the first time, we are discussing this. My comments given in https://github.com/zephyrproject-rtos/zephyr/issues/75513 still stand.

nashif commented 2 months ago

> Where does that diagram originate from? I am not sure I agree that "POSIX" can be considered as just one layer in this regard.

That is probably originally from me. POSIX in this diagram is the POSIX portability layer implementation, i.e. the type definitions used prior to introducing the POSIX compatibility layer are not part of this box.

cfriedt commented 2 months ago

> Well, this is not the first time, we are discussing this.

It certainly isn't.

> My comments given in https://github.com/zephyrproject-rtos/zephyr/issues/75513 still stand.

Well, they are standing on a slippery slope.

So far, what is propping them up is a defense of two anti-patterns, plus the observation that one implementation blurred the lines as a workaround, so we should do the same workaround. Unfortunately, that does not seem like a convincing argument for Option 1.