zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

drivers: Supporting driver-specific extensions to existing generic APIs #11993

Closed pabigot closed 5 years ago

pabigot commented 5 years ago

I frequently run into cases where I want a specific driver to support capabilities not accessible through the standard driver API. Some of these might be added to the generic API, but some are too specific. Examples include:

Candidate approaches include:

This issue is open to solicit perspectives on how this should be supported.

pabigot commented 5 years ago

See some discussion in Slack.

pabigot commented 5 years ago

Here's an example of the approach I've taken for a complete rework of CCS811:

A new header ccs811.h is added to include/drivers/sensor. It includes documented data types for things like firmware version description, baseline calibration values, current environmental state. It also prototypes functions to manipulate those, e.g.:

int ccs811_configver_fetch(struct device *dev,
                           struct ccs811_configver_type *ptr);
int ccs811_baseline_fetch(struct device *dev);
int ccs811_baseline_update(struct device *dev, u16_t baseline);
int ccs811_envdata_update(struct device *dev,
                          const struct sensor_value *temperature,
                          const struct sensor_value *humidity);

User code would look like:

#include <drivers/sensor/ccs811.h>
// ... 
rc = ccs811_configver_fetch(dev, &cfgver);
if (0 == rc) {
  printf("CCS811 HW %02x; FW Boot %04x App %04x ; mode %02x\n",
         cfgver.hw_version, cfgver.fw_boot_version,
         cfgver.fw_app_version, cfgver.mode);
}

Because these functions are provided by the implementation of the ccs811 driver, they have no impact on any other driver.

Because they're prototyped, the arguments can at least be type-checked at build time, and the functions can be clearly documented so their behavior is unambiguous.

pabigot commented 5 years ago

And here's a rough outline of what it might take to use an IOCTL approach:

We'd need changes like this in include/device.h:
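A hypothetical sketch of such a hook (none of these names — `device_ioctl`, the `ioctl` member — are real or proposed Zephyr API; this only makes the shape of the idea concrete, with stand-in types so it compiles on its own):

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical sketch of a generic ioctl hook hung off struct device.
 * None of these names are real or proposed Zephyr API. */

struct device;

typedef int (*device_ioctl_t)(struct device *dev, unsigned long request,
                              void *arg);

struct device {
    const void *driver_api;   /* per-subsystem API struct, as in Zephyr */
    device_ioctl_t ioctl;     /* assumed extra hook for driver-specific requests */
    void *driver_data;
};

/* Generic dispatcher: forwards to the driver's handler if it provides one. */
static inline int device_ioctl(struct device *dev, unsigned long request,
                               void *arg)
{
    if (dev->ioctl == NULL) {
        return -ENOTSUP;
    }
    return dev->ioctl(dev, request, arg);
}

/* Example handler a driver might register. */
static int demo_ioctl(struct device *dev, unsigned long request, void *arg)
{
    (void)dev;
    if (request == 1UL) {        /* made-up request number */
        *(int *)arg = 42;
        return 0;
    }
    return -EINVAL;
}
```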

We'd still need the driver-specific headers to define the data structures, build up the IOCTL request constants, and document everything:

#define IOCTL_CCS811_GET_BASELINE _IOR('S', 0xCC01, u16_t)
#define IOCTL_CCS811_SET_ENVDATA _IOW('S', 0xCC02, struct ccs811_envdata)

The user code would end up being:

rc = device_ioctl(ccs811, IOCTL_CCS811_GET_BASELINE, &baseline);
rc = device_ioctl(ccs811, IOCTL_CCS811_SET_ENVDATA, &envdata);

In short, it could probably be done, but it'd be pretty hard to get right if we don't replicate the POSIX/Linux ioctl architecture, a whole lot of effort if we do, and pretty ugly to anybody who isn't an experienced POSIX developer.

andyross commented 5 years ago

This seems like significantly wasted work to me. The Linux model is "syscalls are rare and carefully curated and drivers need to follow these rules to poke holes in the abstraction for their own APIs".

The Zephyr model, so far, has been "drivers are responsible for exposing their own APIs via simple function calls, and for the special case of userspace we work to make it easy to expose those calls as automatically-generated system call traps".

Given the latter, all the ioctl() rigor doesn't seem to provide much value. Maybe the latter isn't the right model, I can see an argument there. But if we want to go there let's open up the "Zephyr's driver interface abstractions are a leaky mess" flame war and not start with ioctl. :)

pfalcon commented 5 years ago

@pabigot: Thanks for submitting this as a ticket. Let me admit that when I saw the post on Slack, I mostly saw keywords like "ioctl" vs "multiple functions". My immediate comment was that, choosing between adding multiple "virtual methods" to drivers vs a single "ioctl-like" vmethod, I'd recommend the latter.

If the feature you care about is clearly specific to a particular hardware device, and thus its driver, then your <drivers/sensor/ccs811.h> interface is absolutely the right way to go. Its target audience is: people who work with a particular sensor, and who want to get the most out of it (at the expense of making the application specific to a particular sensor, requiring substantial effort to port to another sensor).

pfalcon commented 5 years ago

Now let me elaborate on what I had in mind when responding on Slack, for completeness.

We start not with a particular driver, but with a subsystem; let it be the sensor subsystem here too. Suppose that for every sensor we want to expose a means to get its accuracy. (I don't know how contrived that example seems to you; for me, it's not at all.) So, how do we implement that? What we have is struct sensor_driver_api: https://github.com/zephyrproject-rtos/zephyr/blob/master/include/sensor.h#L260 , which is a struct of function pointers, aka vmethods. The whole "sensor API" is nothing but a set of dispatcher functions which call a vmethod for a particular device.

So, should we add something like sensor_get_accuracy_t get_accuracy;? That's the whole point of my argument - no, let's not do that/stop doing that, because it doesn't scale. Instead, let's add:

int (*sensor_ioctl)(int cmd, void *arg1, void *arg2);

there. And let's define

enum sensor_op {
    sensor_get_accuracy,
    ...
};

I hope I clarified myself and it was somewhat related, and that I didn't cause too much confusion (again).

Compared to your example, @pabigot, the key point is that we define a subsystem-specific "ioctl-like" call. By all means, let's avoid defining ioctls at the level of struct device; that's far too polymorphic.

EDIT: And of course, subsystem-level ioctls make sense only if some feature really exists/is reusable across drivers, and if a common API can be reasonably devised for it. In all other cases, it's better to go with driver-ad-hoc functions like you show.
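To make the proposal above concrete, here is a self-contained sketch of the subsystem-level dispatcher. The types are illustrative (not the real struct sensor_driver_api), and a dev parameter is added to the vmethod so a driver can reach its own state:

```c
#include <stddef.h>
#include <errno.h>

/* Sketch of a subsystem-level "ioctl-like" vmethod; illustrative types only. */

struct device;

enum sensor_op {
    sensor_get_accuracy,
    /* ... further subsystem-wide extension ops ... */
};

struct sensor_driver_api {
    /* existing vmethods elided */
    int (*sensor_ioctl)(struct device *dev, int cmd, void *arg1, void *arg2);
};

struct device {
    const struct sensor_driver_api *driver_api;
};

/* Subsystem dispatcher: a single entry point multiplexes all extension ops,
 * instead of one vmethod slot per op in every driver's API struct. */
static inline int sensor_ioctl(struct device *dev, int cmd, void *arg1,
                               void *arg2)
{
    const struct sensor_driver_api *api = dev->driver_api;

    if (api->sensor_ioctl == NULL) {
        return -ENOTSUP;
    }
    return api->sensor_ioctl(dev, cmd, arg1, arg2);
}

/* Demo driver implementation supporting only sensor_get_accuracy. */
static int demo_sensor_ioctl(struct device *dev, int cmd, void *arg1,
                             void *arg2)
{
    (void)dev;
    (void)arg2;
    if (cmd == sensor_get_accuracy) {
        *(int *)arg1 = 3;   /* made-up accuracy figure */
        return 0;
    }
    return -ENOTSUP;
}
```

A driver that offers no extension ops simply leaves the slot NULL and pays for one unused pointer, rather than one per possible extension.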

pabigot commented 5 years ago

@pfalcon yes, that's exactly it. While I appreciate in principle the goal of Zephyr to make applications as generic as possible by providing a common API, in the domain I'm working I know exactly what hardware and sensors I'm going to be using. It's...sub-optimal...for me to be dealing with temperatures and humidities expressed as 64-bit fixed-point values when the driver collects them together at a resolution of no more than 16 bits each. (Regarding sensor accuracy: I appreciate your point, but don't think that's going to scale--the current sensor_channel enumeration doesn't. I also don't think this is the place to discuss it in depth.)

@andyross I take from your comment you're not in favor of ioctl, which is my position as well. It's not clear to me whether you support (in principle) the driver-specific header and function approach, or believe driver-specific capabilities should be exposed through some other extension of a generic sensor, GPIO, or ADC API.

This arises because, outside of clock control, I can't see any API in include that isn't genericized to the point where the features that distinguish a specific product become inaccessible. There are no driver ad-hoc functions for any sensor device, which astonishes me. I'd like to expose them in a way that's acceptable to the Zephyr community and am looking for architectural guidance.

andyross commented 5 years ago

I think my position summarizes as "I'd like to see some attention paid to firming up exactly how drivers expose APIs, but I don't think it should look like ioctl"

andrewboie commented 5 years ago

One thing about system calls. Right now all the driver system calls take a generic struct device pointer. The syscall code is able to distinguish what subsystem any given struct device belongs to; we have some build system logic which notes what API struct was assigned to the device at build time (i2c_driver_api, uart_driver_api, etc) and sets this in the kernel object metadata (K_OBJ_DRIVER_I2C, K_OBJ_DRIVER_UART, etc)

This allows us to enforce that appropriate calls are being made on a device at the subsystem level; the system will reject a system call made on a UART API if the device object isn't associated with the uart subsystem. This works because of the common interface between all devices of a particular subsystem type.

However, we don't have any facilities for distinguishing between individual driver instances at the moment. So if we were to do system calls that were truly driver-specific, outside of the subsystem layer, and the only pointer passed into the API is a generic struct device, we can say the driver belongs to the appropriate subsystem, but currently no more than that. That would be problematic: for an API that only works on specific driver X of subsystem Y, we can only (currently) say that the device pointer belongs to subsystem Y, not that it is also an instance of driver X; it could also be some driver Z which also implements subsystem Y.

Now if we did the abstraction at the subsystem level, I think we can handle this fairly easily. So like for the sensor API, add an additional ioctl method (or whatever you like to call it). If a particular driver doesn't have any capabilities beyond what is offered in the base API, this API pointer can be set to NULL. Each driver that did offer extra capabilities would have to do checks to ensure that the requested ioctl operation is supported, and such ioctl operation types would be shared across all drivers of that subsystem.

Note that I have no particular attachment to any of the ideas being proposed in this thread, just thought I'd mention this caveat. Not saying it's impossible to support out-of-subsystem calls either, we'd just have to solve the problem of correctly enforcing that appropriate device pointers are being passed in (but it may be a lot of hassle to do so.)

pabigot commented 5 years ago

@andrewboie Thanks, this exposes more detail I was missing related to verification and userspace, the design criteria that led to the current architecture, and the breadth of Zephyr's "here, let me do that for you" code generation framework. I think the user-level API I've suggested is still a good solution, but underneath it might have to be accomplished using a generic extension function that supports this checking. I have some ideas and will try them when it comes time to submit some of these enhancements as PRs.

andrewboie commented 5 years ago

we'd just have to solve the problem of correctly enforcing that appropriate device pointers are being passed in (but it may be a lot of hassle to do so.)

Thinking about this some more, it might not be terrible to also uniquely identify drivers. It may not be much hassle, given a generic device pointer, to associate it with a specific driver. Right now the build system logic stores the type of the device->driver_api object in the kernel object metadata, but we could additionally register the device->driver_data, which is specific to the driver implementation. So we don't necessarily need to rule this out. It might actually be better this way.

it might have to be accomplished using a generic extension function that supports this checking

The problem I see with a subsystem-level extension function, that takes as a parameter some enumerated type of capabilities declared subsystem-wide in some switch statement, is that I'm not seeing much difference in that, and just adding new API pointers for these capabilities to the subsystem, which are set to NULL for devices that do not support them.

I have some ideas and will try them when it comes time to submit some of these enhancements as PRs.

I'd suggest sending some high-level proposals here before you write a lot of code and send a PR. The discussion may need to iterate a few times before everyone is happy.

andrewboie commented 5 years ago

This commit here is a quick-and-dirty proof-of-concept on doing a syscall specific to a driver:

https://github.com/andrewboie/zephyr/commit/3a6fbf9a545704c9caeea06bdd9d26f6a6fe576a

Output of samples/hello_world on qemu_x86:

Hello World! qemu_x86
Hello from user mode!
Hello from device 0x0040c1c0, _impl_uart_ns16550_specific_syscall(42)

This is very hacked together and a real implementation would not register the dev_data structs as kernel objects, but you get the idea.

If the feature you care about is clearly specific to a particular hardware device, and thus its driver, then your <drivers/sensor/ccs811.h> interface is absolutely the right way to go. Its target audience is: people who work with a particular sensor, and who want to get the most out of it (at the expense of making the application specific to a particular sensor, requiring substantial effort to port to another sensor).

I concur; if it's really specific to the HW, then just declare a syscall in the driver-specific header, and the plumbing would look similar to the POC above.

If we have a capability which is supported by many, but not all devices, then I think we should just add it to the API and set the API pointer to NULL for devices that don't support it.

I think my position summarizes as "I'd like to see some attention paid to firming up exactly how drivers expose APIs, but I don't think it should look like ioctl"

@andyross fair enough, but what should it look like?

pfalcon commented 5 years ago

@andrewboie

The problem I see with a subsystem-level extension function, that takes as a parameter some enumerated type of capabilities declared subsystem-wide in some switch statement, is that I'm not seeing much difference in that, and just adding new API pointers for these capabilities to the subsystem, which are set to NULL for devices that do not support them.

Well, I do see. In the general case, there can be a hundred such "additional functions", which of course are unlikely to be implemented by all drivers, wasting memory (even if RO) on all these sparse pointer arrays. Next, it's highly dependent on the actual functions of course, but usually it's possible to amortize some operations of similar functions by combining them into one. That would apply to syscall validation functions: even if there are only a dozen of them, each validating a device pointer, merging them into one that validates the pointer once, even while dispatching on a "command code", should lead to overall code savings. And of course, I assume each new syscall is not free, and we should target having fewer syscalls overall, rather than more.

So, as was pretty clearly noted in https://github.com/zephyrproject-rtos/zephyr/issues/11917, usage of syscalls is not mandatory. But even folks who aren't interested in userspace/syscalls should be aware that they're an integral part of Zephyr, and should try to "design for syscalls", even if the actual implementation won't be done by them but by someone else later.

pabigot commented 5 years ago

The problem I see with a subsystem-level extension function, that takes as a parameter some enumerated type of capabilities declared subsystem-wide in some switch statement, is that I'm not seeing much difference in that, and just adding new API pointers for these capabilities to the subsystem, which are set to NULL for devices that do not support them.

I've italicized the phrase that I think is key. The functions I'm interested in exposing are absolutely driver-specific. They do not belong at the subsystem, and having to manage their individual identifiers globally rather than within the driver is not ideal. My original thought was that they could be handled with an ioctl-like function, exposed through the subsystem but multiplexing the various required functions through identifiers managed solely within the driver header and implementation.

I don't know whether this is possible, but highlighting the role of syscalls enlightened me and it does affect the design of any solution that will provide driver-specific extensions. I also think they must be designed into any initial solution. Again I have ideas and will play with them as I get time; with travel and holidays that won't be for a week or two.

andyross commented 5 years ago

(Typed a long reply into the wrong spot. Here it is again) I just don't see the value in all the ioctl "rigor". All ioctl does is give you a way to pass a number and a (user) pointer into a driver-supplied callback in the kernel. The numbers (in theory, but NOT PRACTICE) all go into a registry somewhere so they're known to be unique and validatable by the kernel in an automated way. And the pointer allows you to pass all sorts of crazy data structures for the kernel to unpack using a special cross-address-space API (THIS IS NOT A GOOD THING).

Right now, our existing syscall hashing layer gets you that unique number registry for free, without trouble. And our automatic passing of function arguments does 60-80% of what most calls actually need in practice without forcing the handler to read user memory. And we don't need to have a special namespace/registry of these things because their names are just C symbols and the linker does that checking for us. Basically: our stuff is better than ioctl already in most ways.

Really, the only feature I see from above that seems to be requested that we don't have is that it would be good to have an automatic checking layer so that function calls can authenticate driver support for a particular call. So if you have two UART devices and only one supports uart_set_stop_bits(), it would be nice for the kernel to fail it automatically, and (maybe) answer the question "does this API work on this device?".

andrewboie commented 5 years ago

Right now, our existing syscall hashing layer gets you that unique number registry for free, without trouble. And our automatic passing of function arguments does 60-80% of what most calls actually need in practice without forcing the handler to read user memory. And we don't need to have a special namespace/registry of these things because their names are just C symbols and the linker does that checking for us. Basically: our stuff is better than ioctl already in most ways.

I agree. I don't currently see what we gain by assembling a set of ioctls at the subsystem level instead of just C functions which are selectively implemented. User mode completely aside, you either have to maintain a set of C functions, or cases in some large switch construction. Once user mode gets into the picture, you still need to do proper validation (so the work involved for writing handler functions should be about the same) plus there's an extra copy involved since the args will be passed in via struct and not directly over registers.

Really, the only feature I see from above that seems to be requested that we don't have is that it would be good to have an automatic checking layer so that function calls can authenticate driver support for a particular call. So if you have two UART devices and only one supports uart_set_stop_bits(), it would be nice for the kernel to fail it automatically, and (maybe) answer the question "does this API work on this device?".

I think the current model for this is to have NULL pointers in the API struct if something is not implemented. Some subsystems check in the inline implementation function whether the API is implemented and return an error; with others you will simply crash. In the handlers we build on top of the implementations, we always check whether the API pointer is populated (this is what macros like Z_SYSCALL_DRIVER_UART(<dev>, <api name>) do), but the penalty for failure is severe in that the calling thread is killed. If I understand you correctly, we should overhaul this to always return errors? This is agreeable to me. We may want to have a policy for "required" vs "optional" APIs.

andyross commented 5 years ago

I think the current model for this is to have NULL pointers in the API struct

Right, but that's for well-defined APIs with tight sets of entry points. I think we do that just fine already (architecturally, obviously we lack actual drivers and frameworks for lots of things). But ioctl has historically been used as a way for the driver, absent any kernel or framework blessing, to implement a way for userspace to talk directly to it.

And the way it works in a linux driver is that you have a big switch statement on the ioctl command word, with a default case that does "return -EINVAL;", and everything just works. Userspace that tickles the wrong command gets an obvious error back. In our model we rely on the drivers to individually validate their arguments, which has been made robust for the obvious cases (e.g. this argument must be a k_fifo) but doesn't extend in an automated way to more subtle checking (this must be a uart device handle created by this particular driver).
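The Linux-style shape being described can be sketched in a few lines (command names and values here are made up for illustration):

```c
#include <errno.h>

/* Sketch of the classic Linux-driver ioctl shape: one switch on the command
 * word, with a default case returning -EINVAL so userspace gets an obvious
 * error back for unknown commands. Command values are made up. */

#define DEMO_IOCTL_GET_BASELINE 0xCC01u
#define DEMO_IOCTL_SET_ENVDATA  0xCC02u

static int demo_drv_ioctl(unsigned int cmd, void *arg)
{
    switch (cmd) {
    case DEMO_IOCTL_GET_BASELINE:
        *(unsigned int *)arg = 0x1234;  /* placeholder baseline value */
        return 0;
    case DEMO_IOCTL_SET_ENVDATA:
        (void)arg;                      /* would consume environmental data */
        return 0;
    default:
        return -EINVAL;                 /* unknown command "just works" */
    }
}
```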

pabigot commented 5 years ago

@andrewboie @andyross I'm not yet up to speed to entirely follow what you're discussing, so to step back:

To use the CCS811 Indoor Air Quality sensor correctly the application must be able to perform these functions:

// Retrieve the current sensor-generated calibration value
int ccs811_baseline_fetch(struct device *dev);
// Restore a sensor-generated calibration value
int ccs811_baseline_update(struct device *dev, u16_t baseline);
// Inform the sensor of current environmental parameters
int ccs811_envdata_update(struct device *dev,
                          const struct sensor_value *temperature,
                          const struct sensor_value *humidity);

Baseline calibration is collected at various points in the sensor lifecycle and retained to be restored on reset so it doesn't take up to 24 hours to start getting accurate results again. Environmental data has to be updated synchronously with each sample so the calculations performed by the sensor are adjusted for temperature and humidity.
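That lifecycle might be sketched like this in application code. The ccs811_* bodies below are stand-in stubs (the real ones talk to the sensor), struct sensor_value mirrors Zephyr's val1/val2 convention, and the assumption that baseline_fetch returns the baseline as a non-negative value is mine:

```c
#include <stddef.h>
#include <stdint.h>

/* Application-flow sketch; ccs811_* functions are stand-in stubs. */

struct device;
struct sensor_value { int32_t val1; int32_t val2; };

static uint16_t saved_baseline;   /* would live in retained memory or NVS */

static int ccs811_baseline_fetch(struct device *dev)
{
    (void)dev;
    return 0x8040;                /* stub: assumed to return baseline or -errno */
}

static int ccs811_baseline_update(struct device *dev, uint16_t baseline)
{
    (void)dev;
    saved_baseline = baseline;    /* stub: real driver writes it to the sensor */
    return 0;
}

static int ccs811_envdata_update(struct device *dev,
                                 const struct sensor_value *temperature,
                                 const struct sensor_value *humidity)
{
    (void)dev; (void)temperature; (void)humidity;
    return 0;                     /* stub: real driver updates compensation */
}

/* On boot: restore the baseline captured before the last reset, so the
 * sensor doesn't need up to 24 hours to converge again. */
static int app_restore_calibration(struct device *dev, uint16_t baseline)
{
    return ccs811_baseline_update(dev, baseline);
}

/* Before each sample: feed current environmental parameters so the sensor's
 * compensation is correct, and capture the evolving baseline for later save. */
static int app_sample_prologue(struct device *dev,
                               const struct sensor_value *t,
                               const struct sensor_value *h,
                               uint16_t *baseline_out)
{
    int rc = ccs811_envdata_update(dev, t, h);

    if (rc == 0) {
        rc = ccs811_baseline_fetch(dev);
        if (rc >= 0) {
            *baseline_out = (uint16_t)rc;
            rc = 0;
        }
    }
    return rc;
}
```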

There is no API in the sensor subsystem for these operations.

How should they be implemented in a userspace situation?

andrewboie commented 5 years ago

How should they be implemented in a userspace situation?

If we agree that these are truly APIs specific to the CCS811, I think you should define an include/drivers/sensor/ccs811.h (if it doesn't already exist) and define these APIs there. And put the function implementations in the ccs811 driver itself.

You can just define them as regular C functions. For the system calls, file an enhancement ticket on GitHub and, when I have some time, I can re-work the POC I posted earlier, update the documentation on how to add out-of-subsystem syscalls, and add the system calls for these APIs; they look like they will be pretty simple to add.

pabigot commented 5 years ago

Referencing #12056 which was partly inspired by this.

pabigot commented 5 years ago

@andrewboie Thanks for #12056 which I believe was intended to support this need. I've finally gotten back to this, and I'm not clear how you intended the infrastructure to be used.

Here's a concrete example: I'm doing a driver for the Maxim DS3231 RTC/TCXO, a real-time clock with battery backup that maintains civil time to 1 s precision. Under Zephyr's architecture this is fundamentally a 1 Hz counter, and it's implemented using that API, so one passes the device to counter API functions.

To set the time and to configure other important features I need two additional API calls, declared in a header under include/devices and exposed using an API structure that extends the counter API structure with the device-specific functions. I've made the stub implementation available in my example/11993 branch. The relevant code is here.
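The extension pattern being described might look roughly like this. All names here are illustrative, and the counter API is reduced to a stub:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the "extended API struct" pattern: the driver-specific API
 * struct embeds the generic one as its first member, so the same driver_api
 * pointer serves both interfaces. */

struct device;

struct counter_driver_api {
    int (*start)(struct device *dev);
    int (*stop)(struct device *dev);
    /* ... other generic counter vmethods ... */
};

struct rtc_ds3231_driver_api {
    struct counter_driver_api counter;  /* must be the first member */
    int (*set_time)(struct device *dev, uint32_t unix_time);
    int (*configure)(struct device *dev, uint8_t flags);
};

struct device {
    const void *driver_api;
};

/* Driver-specific wrapper: downcasts driver_api to the extended struct.
 * As noted below, nothing here verifies that dev actually came from the
 * ds3231 driver -- that validation gap is exactly the open problem. */
static inline int rtc_ds3231_set_time(struct device *dev, uint32_t unix_time)
{
    const struct rtc_ds3231_driver_api *api = dev->driver_api;

    return api->set_time(dev, unix_time);
}

/* Demo implementation recording the last time set, for illustration. */
static uint32_t demo_last_time;
static int demo_set_time(struct device *dev, uint32_t t)
{
    (void)dev;
    demo_last_time = t;
    return 0;
}
```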

The approach of casting the device to the extended structure can't be right because nothing here is going to validate that the dev passed to rtc_ds3231_configure() came from the ds3231 driver. I understand that's what Z_SYSCALL_SPECIFIC_DRIVER was supposed to check, but have no idea where to put that check or how to make it be generated by infrastructure somewhere.

I can see that this would work if the two driver-specific API calls were added to the generic counter API structure, but that's a non-starter: doing that would waste space for every driver, and it's fundamentally unworkable for out-of-tree drivers.

Could you elaborate on how you expected this solution to be used? Or, if the problem is made more clear by this example do you have suggestions on how to make this work?

b0661 commented 5 years ago

The UART driver has provided a driver-specific system call since the very beginning of Zephyr.

It is simple. It is already available. It is guarded by a config option to prevent waste of memory. It works (tested with a specific UART implementation where mostly configuration, but also some functionality, is routed through this interface).

Can this be the model for the other drivers (maybe by exchanging u32_t p for uintptr_t p) too?

#ifdef CONFIG_UART_DRV_CMD

/**
 * @brief Send extra command to driver.
 *
 * Implementation and accepted commands are driver specific.
 * Refer to the drivers for more information.
 *
 * @param dev UART device structure.
 * @param cmd Command to driver.
 * @param p Parameter to the command.
 *
 * @retval 0 If successful.
 * @retval failed Otherwise.
 */
__syscall int uart_drv_cmd(struct device *dev, u32_t cmd, u32_t p);

static inline int z_impl_uart_drv_cmd(struct device *dev, u32_t cmd, u32_t p)
{
    const struct uart_driver_api *api =
        (const struct uart_driver_api *)dev->driver_api;

    if (api->drv_cmd) {
        return api->drv_cmd(dev, cmd, p);
    }

    return -ENOTSUP;
}

#endif /* CONFIG_UART_DRV_CMD */
pabigot commented 5 years ago

@b0661 Thanks, I hadn't seen that solution mentioned before. I suppose it could work, with wrappers to make the API a lot more user-friendly. The signatures of the different extended functions will generally be very different.

There's nothing that prevents you from passing arguments designed for a different driver variant, resulting in unpredictable behavior. I think that's why @andrewboie provided the driver-specific syscall API, but I still need to understand how that was intended to be used.
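One possible shape for such wrappers (the command number, wrapper name, and recording stub are all hypothetical) would be to hide the raw (cmd, p) pair behind per-feature functions with real signatures:

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* Sketch of the typed-wrapper idea over a generic per-driver command call. */

struct device;

#define MYUART_CMD_SET_STOP_BITS 0x01u

/* Stand-in for the generic command syscall shown above; it just records
 * what it was asked to do. */
static uint32_t last_cmd, last_p;
static int uart_drv_cmd(struct device *dev, uint32_t cmd, uint32_t p)
{
    (void)dev;
    last_cmd = cmd;
    last_p = p;
    return 0;
}

/* User-friendly, type-checked wrapper: the argument is validated and named,
 * rather than being an opaque u32 the caller must encode correctly. */
static inline int myuart_set_stop_bits(struct device *dev, uint8_t bits)
{
    if (bits != 1 && bits != 2) {
        return -EINVAL;   /* caught before it ever reaches the driver */
    }
    return uart_drv_cmd(dev, MYUART_CMD_SET_STOP_BITS, bits);
}
```

This keeps the single-vmethod plumbing underneath while giving applications documented, type-checked entry points; it does not, by itself, solve the cross-driver misuse problem described above.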

pabigot commented 5 years ago

Approach described in #17072 has been approved.