zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.93k stars 6.65k forks source link

drivers: dma: dma_nxp_edma: edma_start() is not `isr-ok` on imx8 platforms #80573

Open LaurentiuM1234 opened 3 weeks ago

LaurentiuM1234 commented 3 weeks ago

Describe the bug

https://github.com/zephyrproject-rtos/zephyr/commit/48b98a9284c3d1fe341c4bb150920b82f5b5b397 moved the irq_enable() call from edma_config() to edma_start(). Among other things, one of the issues with this is the fact that irq_enable() is not ok to use inside ISRs because it might involve communication with a FW to enable the PD of the interrupt controller. This contradicts the isr-ok mark with which dma_start() is tagged.

Expected behavior edma_start() should be safe to use inside ISRs as the DMA API promises.

Additional context On the imx8 platforms (i.e: imx8 and imx8x) irq_enable() and irq_disable() also perform PM operations (since the interrupt controller has a PD attached to it). As such, to make sure that the interrupt controller's PD is disabled when not needed, the number of irq_enable() and irq_disable() calls needs to be equal.

This leads to the following question: how can we do this with the DMA API w/o breaking the isr-ok promise of the functions?

Currently, I only see two options:

  1. Move irq_enable() to edma_config() and disallow the CONFIGURED->CONFIGURED channel state transition (i.e: when a second dma_config() is performed on the same channel, edma_config() will return 0 and do nothing). irq_disable() shall stay inside dma_stop(). I'm not sure how well this will play out because that means each sequence started with dma_config() has to be ended with dma_stop() so that the PDs are properly managed.
  2. Add some channel request()/release() functions to the DMA API that are not marked with the isr-ok tag in which we're free to perform these operations and other PM-related operations such as pm_device_runtime_get() and pm_device_runtime_put(). This is somewhat problematic because we already have dma_request_channel() and dma_release_channel(), which are tagged with isr-ok.
LaurentiuM1234 commented 3 weeks ago

@teburd can you please comment on this? Thanks!

teburd commented 3 weeks ago

dma_start() is asynchronous by its nature, if the driver cannot do the logic in an ISR then you could offload this to the work queue would be a typical method.

dbaluta commented 3 weeks ago

@teburd but we also want to make sure that the power domain is up before we actually start the DMA HW.

I would be very in favour of having an option to enable the power domain when we request a dma channel.

Cc: @lgirdwood

teburd commented 3 weeks ago

@teburd but we also want to make sure that the power domain is up before we actually start the DMA HW.

Right which could be done in a work queue task that dma_start() triggers. This doesn't break any contracts, but it does add latency to dma_start().

I would be very in favour of having an option to enable the power domain when we request a dma channel.

We have channel get/release style APIs for DMA already, power could be enabled/disabled on channel ownership and release reasonably well I suppose, but often times I've seen things done where the channel is effectively statically defined which wouldn't help here.

Is it untenable to suggest power management be done by the caller in this case? E.g. pm get/put on the dma device by the code/peripheral using DMA at the appropriate time? It knows best after all.

Cc: @lgirdwood

LaurentiuM1234 commented 3 weeks ago

dma_start() is asynchronous by its nature, if the driver cannot do the logic in an ISR then you could offload this to the work queue would be a typical method.

Hm, this might be interesting to try out. One major concern I have is how well will this play out in the context of SOF? After a dma_start() we usually do a TRIGGER_START on the DAI so what I think could end up happening is we'd get an underrun if the work item is not executed in time (not sure I remember correctly, but weren't the triggers executed by the timer domain thread which has the highest COOP priority? CC: @kv2019i @lgirdwood)

I think the deferred work approach needlessly complicates things. Ideally we should have a channel init/setup and a deinit phase in which we're allowed to perform this kind of stuff. We do have dma_request_channel() and dma_release_channel() for this but they're both tagged with isr-ok. We could sort out the issue with dma_release_channel() quite easily as we can use pm_device_runtime_put_async() instead of its sync version (also do this in irq_disable()) but this is not the case for dma_request_channel() (pm_device_runtime_get() is not suitable for usage in ISRs in the case of imx8/imx8x).

Is it untenable to suggest power management be done by the caller in this case? E.g. pm get/put on the dma device by the code/peripheral using DMA at the appropriate time? It knows best after all.

For imx8/imx8x the EDMA has channel-based PDs. If we choose to handle this as part of the DMA device's PM callback we'll lose the granularity (i.e: we'll be forced to enable all channel PDs even if some are unused). I don't think exposing this to the application will play out well especially when there's multiple vendors involved with their own DMAs (e.g: SOF).