Closed swift-tk closed 3 months ago
Arch WG:
Use the mtd subsystem for this. Also agrees with the idea of having a new API and a single mspi_nor.c in drivers/flash, removing the current mess of qspi-specific flash drivers.

I could not attend the meeting (it's really not the right time for me), but has anyone noted that this is again a proposal for dual/quad/octal modes, in addition to other things? So, first: is there anybody really interested in getting such modes exposed as a generic SPI feature? (See PR #39991.) We could avoid having a new API altogether.
@tbursztyka - I also like the idea of supporting dummy clocks before a data phase. Specifically, because some peripherals (e.g. FPGAs) use it for handshaking, and I think that is missing from the current API.
@tbursztyka I’d like to emphasize that the proposed MSPI API would drive a different hardware peripheral from the SPI one. In terms of functionality, this is very much like I3C versus I2C: the MSPI could do everything that SPI does. I would suggest not expanding the SPI, keeping its focus on serial (one-lane), low-speed peripheral support, and then introducing a new API that is multi-lane and focused on supporting advanced memory devices.
As far as I can tell, it is hard to overhaul the SPI API, and that is why everybody just implements their own HAL abstraction controller and has their own device drivers call that controller. By this I mean the qspi and ospi from ST, the qspi from Nordic, and the flexspi from NXP. I’ve looked into each of these controller implementations, and I think there is a lot in common between them; that commonality is what I captured in my new API proposal.
I would also like to point out that there are multiple other things missing from the SPI API. The SPI API lacks ways to configure the hardware when not sending a transfer; this is useful for configuring the hardware before entering XIP mode. Advanced SPI devices demand splitting a transfer into command, addr, and data phases, often with dummy cycles between the addr and data phases. This is not needed for other generic SPI devices.
There is also no multiple-device support or callback management, as generic SPI would not need such things.
There can be multiple memory devices on the same hardware instance. Some hardware supports this natively; for hardware that does not, it can be supported through software.
Since the hardware is more sophisticated than common SPI, it can raise different interrupts even when we are not explicitly requesting a transfer, e.g. XIP events.
I forgot to mention that SPI does not support DQS configuration, which is required for high-speed devices.
- I also like the idea of supporting dummy clocks before a data phase. Specifically, because some peripherals (e.g. FPGAs) use it for handshaking, and I think that is missing from the current API.
@cfriedt Isn't this just a certain amount of 0s "being sent" at certain places, or is there more to this? Sending/receiving dummy bytes in generic SPI is straightforward via the buffer sets. But I get that, as a generic feature, we may need something at a slightly higher level that would allow configuring such dummy clocks (via DTS, etc.). We'll see below how I see things.
@swift-tk Thanks for the details. When it comes to dedicated qspi devices (such as Nordic's), this was the main reason why, at the time, we did not integrate dual/quad/octal into the generic SPI API: these controllers offer hardware optimizations the generic API would not be able to take full advantage of, like XIP or JEDEC. (Btw, have a quick look at #20564; as you can see, it's not a recent issue.)
Do I understand your proposal correctly that you would like to replace all the dedicated qspi, ospi, ... device drivers currently found in flash with ones implementing the MSPI API? And then there would (ideally) be a unique flash_mspi.c driver of some sort?
My own concern has always been this one: are there use cases where a generic SPI controller would be driving both single-mode devices and dual/quad/octal-based devices (most likely a flash memory device)? Does that type of controller even exist? So far I have never been able to get my hands on such a controller. I know that DesignWare has such hardware (though besides the specs I could not get anything). Are there more out there? (I guess there are, or issue #51402 would not have popped up.)
Anyway, here is how I see things at the moment, sticking with the flash use case (I hope my crappy ASCII art shows up the same way everywhere):
```
             FLASH device driver
               (flash_mspi.c)
                     /\
                     \/
MSPI API _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
       /              \               \
      /                \               \
dedicated qspi   dedicated ospi   Generic SPI API
  controller       controller           |
                                   generic SPI
                                    controller
```
Btw, doing that, you could then remove spi_nor.c as well - and only then, as there is no other way around it (unless the MSPI API had to be exposed by ALL existing SPI drivers...). Below the MSPI API, it would use the generic SPI API when relevant (through a dedicated driver, mspi_generic_spi.c for instance).
I hope I am clear enough.
@tbursztyka
@cfriedt Isn't this just a certain amount of 0s "being sent" at certain places, or is there more to this? Sending/receiving dummy bytes in generic SPI is straightforward via the buffer sets. But I get that, as a generic feature, we may need something at a slightly higher level that would allow configuring such dummy clocks (via DTS, etc.). We'll see below how I see things.
Certainly you could construct the buffer such that the dummy cycles sit between addr and data. However, there are two problems I can see. The first is that it is ambiguous: in the controller layer, there is no way to know whether the 0s are actually data or dummies (or what the cmd and addr lengths are), unless the hardware is configured as a generic SPI, in which case it does not care. This leads to the second problem: it is not efficient enough for advanced SPI hardware (qspi, etc.), because oftentimes a transfer is initiated once per buffer. I believe this is part of what you mentioned about hardware optimization? (The advanced hardware could just initiate one transfer for everything.)
Do I understand your proposal correctly that you would like to replace all the dedicated qspi, ospi, ... device drivers currently found in flash with ones implementing the MSPI API? And then there would (ideally) be a unique flash_mspi.c driver of some sort?
Yes, your understanding is correct: the device drivers (flash_mspi_device_name.c) that use the proposed MSPI API could eventually replace all the platform-specific device drivers. But the controller layer would have to be rewritten to implement the proposed API.
My own concern has always been this one: are there use cases where a generic SPI controller would be driving both single-mode devices and dual/quad/octal-based devices (most likely a flash memory device)? Does that type of controller even exist? So far I have never been able to get my hands on such a controller. I know that DesignWare has such hardware (though besides the specs I could not get anything). Are there more out there? (I guess there are, or issue https://github.com/zephyrproject-rtos/zephyr/issues/51402 would not have popped up.)
I haven't seen exactly that in the Zephyr repo, but I do know that the SPI API is used to initialize the flash device in the espi example. However, I don't know enough about Intel eSPI to tell what goes on next. https://github.com/zephyrproject-rtos/zephyr/issues/51402 talked about a device, right? Not a controller? There are actually many flash memory devices that require a serial (one-lane) start-up initialization process; some even have to be switched back to serial mode to send commands. These memory devices are under the JESD251C profile 1.0. My example implementation here https://github.com/AmbiqMicro/ambiqzephyr/blob/RFC-MSPI/drivers/flash/flash_mspi_atxp032.c shows exactly that kind of flow. In any case, a device may still require cmd, addr, dummy, and data phases with different lengths for different commands, even in serial mode. So using the generic SPI would still be a pain in the ass.
If there were such controller hardware, the proposed API could still cover it by setting mspi_dev_cfg/mspi_xfer_packet.ui16InstrLength, mspi_dev_cfg/mspi_xfer_packet.ui16AddrLength, and the dummies to zero to indicate a generic SPI operation. If so desired, to maintain backward compatibility, the controller driver could do the translation and call the old SPI controller driver (this applies to what you drew on the far right). If not, it could just call the platform HAL (this applies to what you drew on the left).
With that being said, I think calling the SPI API from the MSPI API seems to be overkill. It would only spend more processing time and be nowhere near as efficient.
But I could definitely see the SPI and MSPI APIs co-existing for quite a while. And then Zephyr may slowly deprecate the SPI API.
One of the issues with e.g. the lattice ice40 series was the need to invert /CS polarity mid-sequence as a kind of handshake.
It's a definite quirk. Currently it's bit-banged with GPIO because the old Zephyr SPI API does not have a way to make that happen synchronously within the timing requirements. It could maybe be done as a bit of a board hack / platform driver, but doing it with the SPI API would be better.
A few comments here...
I'm pushing to move i2c/spi to use a generalized command queue + completion queue API (rtio), like io_uring or IOCP, as it...
I'm wondering if this would be best served by using a common API for all of this, as it's already being used in an updated sensors API and already has tests verifying transactional behavior, chaining, and cancellation.
Cheers!
@teburd See this paragraph.
The controller can determine which user callback to trigger, based on enum mspi_bus_event_cb_mask, upon completion of each async/sync transfer, if the callback had been registered using mspi_register_callback; or it can trigger no callback at all with MSPI_BUS_NO_CB, even if callbacks are already registered. In the case where a controller supports a hardware command queue, the user can take full advantage of it in terms of performance if scatter IO and callback management are supported.
The mspi_register_callback just registers the callback and callback context. The controller driver should save them according to the event type. I have an xfer-complete event type defined, and the controller may use that to notify async completion. A demo usage of the async call can be found here: https://github.com/AmbiqMicro/ambiqzephyr/blob/RFC-MSPI/samples/drivers/memc/src/main.c
I did not use a software queue in the async implementation, as our controller has a hardware command queue.
Do you have code I could look at? It could certainly be a supplement for those that don't have a hardware command queue. I think it may be better to take a form similar to spi_context.h, so that the interface API doesn't force people to use a software queue when not needed.
Let me reiterate that having callbacks be the primary event notifier here will not work with user space, and register_callback can't be a syscall; I don't think that's being fully appreciated or understood.
This makes me believe the register_callback is stateful and connected to the next transfer?
```c
mspi_register_callback(controller, dev_id, MSPI_BUS_XFER_COMPLETE,
		       (mspi_callback_handler_t)async_cb, &cb_ctx1);
mspi_transceive_async(controller, dev_id, &pack1);
mspi_register_callback(controller, dev_id, MSPI_BUS_XFER_COMPLETE,
		       (mspi_callback_handler_t)async_cb, &cb_ctx2);
mspi_transceive_async(controller, dev_id, &pack2);
```
Is that right? That's a bit suspicious; is it expected that the user of an mspi device maintains an external lock on it?
And you'd like those two async transfers to occur back to back presumably, without something else taking control of the bus controller?
What rtio might look like with spi:
```c
struct rtio_sqe *sqe;
struct rtio_cqe *cqe;
struct rtio_iodev *spi_device = /* MACRO here... */;

sqe = rtio_sqe_acquire(ctx);
rtio_prep_write(sqe, spi_device, buf, buf_len);
sqe->flags |= RTIO_TRANSACTION;

sqe = rtio_sqe_acquire(ctx);
rtio_prep_read(sqe, spi_device, buf, buf_len);

/* Submit the write and read to the spi controller, blocking on 2
 * completions; if 0 is given, the requests are started but the call
 * does not block. */
rtio_submit(ctx, 2);

cqe = rtio_consume_cqe(ctx); /* could block here instead if you want */

/* cqe->result has an int result code */
if (cqe->result != 0) {
	LOG_ERR("write+read to spi device failed");
}
```
Ideally, every rtio sqe would turn into a hardware command/DMA transfer. For lpspi this isn't quite true, as the mcux SDK doesn't provide direct low-level access to the hardware command queue today.
The iodev can contain configuration specifics; in this case that's the chip select, struct device *, clock configuration, and polarity settings typical of spi.
One of the issues with e.g. the lattice ice40 series was the need to invert /CS polarity mid-sequence as a kind of handshake.
It's a definite quirk. Currently it's bit-banged with GPIO because the old Zephyr SPI API does not have a way to make that happen synchronously within the timing requirements. It could maybe be done as a bit of a board hack / platform driver, but doing it with the SPI API would be better.
That would definitely be an improvement for the SPI API; go ahead and make a dedicated issue, I'll look into it.
@swift-tk
Certainly you could construct the buffer such that the dummy cycles sit between addr and data. However, there are two problems I can see. The first is that it is ambiguous: in the controller layer, there is no way to know whether the 0s are actually data or dummies (or what the cmd and addr lengths are), unless the hardware is configured as a generic SPI, in which case it does not care. This leads to the second problem: it is not efficient enough for advanced SPI hardware (qspi, etc.), because oftentimes a transfer is initiated once per buffer. I believe this is part of what you mentioned about hardware optimization? (The advanced hardware could just initiate one transfer for everything.)
I was bringing this up only for the case where a generic SPI controller would be in use (so no hardware features exposed to simplify that). Obviously this does not fit dedicated controllers, which is why I agreed on a configuration exposed via DTS (not for the SPI API, so yes, MSPI would do it). Then, below, it's just a matter of a driver using it relevantly: a dedicated controller would use the hw optimizations, while a generic SPI controller would mimic them by inserting a buffer of 0s of the right length at the proper places.
Do I understand your proposal correctly that you would like to replace all the dedicated qspi, ospi, ... device drivers currently found in flash with ones implementing the MSPI API? And then there would (ideally) be a unique flash_mspi.c driver of some sort?
Yes, your understanding is correct: the device drivers (flash_mspi_device_name.c) that use the proposed MSPI API could eventually replace all the platform-specific device drivers. But the controller layer would have to be rewritten to implement the proposed API.
OK, but then why say this:
But I could definitely see that the SPI and MSPI would co-exist for quite a while. And then Zephyr may slowly deprecate the SPI API.
So far, MSPI seems to be fully designed around a very minimal subset of use cases related to d/q/o, XIP, and JEDEC, targeting flash memories. It brings a lot of features and configuration bits that would not fit 99% of other use cases (thus a lot of overhead, useless ROM/RAM usage - seriously, the proposed structures are super heavy - etc.).
It's confusing. What's the actual focus here? Finally fixing d/q/o, XIP, and JEDEC support in SPI, or replacing the SPI API?
I did not reply to that before, but looking at the previous statement, now is the time:
@nashif suggests introducing this new driver and slowly replacing the spi one with it
That would mean implementing the proposed API in all existing SPI drivers (knowing that the proposed API brings nothing to single-line SPI mode - the 99.999% usage atm). Is that what's being wanted?
That would mean implementing the proposed API in all existing SPI drivers (knowing that the proposed API brings nothing to single-line SPI mode - the 99.999% usage atm). Is that what's being wanted?
I'm not sure this is desirable. To me, these two APIs address different use cases and controllers. We should keep them exclusive.
@teburd I get your point; the syscall will be removed from mspi_register_callback.
This makes me believe the register_callback is stateful and connected to the next transfer?
Not necessarily. The example is a bit biased by the Ambiq HAL. The user could just register the callback and the context once if there is no need to change them throughout the program, but I guess that is generally not the case. So I showed changing the context per async call.
BTW, one async call doesn't necessarily prepare the hardware to start only one transfer. There are use cases where each transfer requires a callback; that's why I have eCBMask inside struct mspi_buf. E.g. partial refreshing of an SPI display: the callback is used to release the lock on the part of memory that has been transferred (flushed), enabling it to be refilled with new data so that optimal performance can be achieved.
Is that right? That's a bit suspicious; is it expected that the user of an mspi device maintains an external lock on it?
Whether it needs an external lock is entirely up to the controller implementation. In my case (Ambiq MSPI), it is not needed, as there are two internal locks: one for the transfer context, one for controller access.
Our HAL is set up to record the callback and context per transfer and call them after the transfer completes. So, just for my case, I would only need mspi_transceive_signal_setup to call mspi_register_callback for asynchronous notification.
And you'd like those two async transfers to occur back to back presumably, without something else taking control of the bus controller?
They would occur back to back, as each writes to the hardware command queue without waiting for the previous one to complete. The controller driver would have to be written in a way that blocks others (another device / the next transfer) from accessing the controller. See an example implementation of the controller driver here: https://github.com/AmbiqMicro/ambiqzephyr/blob/RFC-MSPI/drivers/mspi/mspi_ambiq_ap3.c
I like the RTIO idea; it can certainly supplement the MSPI API the way you did for SPI. But mspi_register_callback is not used solely for the async call, as I indicated here:
```c
/**
 * mspi_register_callback
 *
 * @param controller Pointer to the device structure for the driver instance.
 * @param dev_id     Pointer to the device ID structure from a device.
 * @param evt_type   The event type associated with the callback.
 * @param cb         Pointer to the user implemented callback function.
 * @param ctx        Pointer to the user callback context.
 */
```
This routine provides a generic interface to register different types of bus events. The dev_id is provided so that the controller can identify the device and determine whether access is allowed in a multiple-device scheme. The enum mspi_bus_event is a preliminary list of bus events; XIP events can be added. I encourage the community to come up with more events that they would use.
I'd like to keep this RFC as simple as possible, as the primary focus is to provide a generic API for the advanced SPI devices that the generic SPI fails to serve.
@tbursztyka
But I could definitely see the SPI and MSPI APIs co-existing for quite a while. And then Zephyr may slowly deprecate the SPI API.
So far, MSPI seems to be fully designed around a very minimal subset of use cases related to d/q/o, XIP, and JEDEC, targeting flash memories. It brings a lot of features and configuration bits that would not fit 99% of other use cases (thus a lot of overhead, useless ROM/RAM usage - seriously, the proposed structures are super heavy - etc.).
It's confusing. What's the actual focus here? Finally fixing d/q/o, XIP, and JEDEC support in SPI, or replacing the SPI API?
I would say neither; the primary focus is to provide a generic API for the advanced SPI devices that the generic SPI fails to serve.
My opinion is that the generic SPI can no longer serve for interfacing with advanced SPI devices in terms of overall performance. By the way, the advanced SPI usage is not that small, and believe me, I did not just come up with this without actually trying to drive those advanced SPI devices with the generic SPI. In the cases of transferring a cmd or a small data size, the preparation time is even longer than the actual transfer time when using the generic SPI.
I just said the proposed MSPI API is capable of doing what SPI does; whether it can replace SPI is entirely a different matter. Like you said, it has more configuration than generic SPI devices need, so let people use the generic SPI for simple SPI devices, but leave the complex, heavy-lifting interfacing to a dedicated API. There is no reason to force people to use the generic SPI for complex devices.
Also, the industry has evolved: some common SPI controllers in SoCs have been upgraded to have cmd and data phases. This means that an RX operation no longer needs to start a TX first and then switch to RX, because that can be done through hardware. But the generic SPI treats everything as data, which is not ideal. You would also need software CE/CS control with the generic SPI, while the upgraded controllers do not.
I don't see a way around having more complex data structures as SPI controllers and devices grow more complex. Personally, one of the problems I find with the SPI API is that it is not really expandable.
@erwango I agree with you; we don't need to implement the MSPI API for all legacy SPI device drivers. @tbursztyka That's why I said MSPI and SPI could co-exist. If all your devices are generic SPI, then there is no reason to switch to MSPI. However, the state of the drivers under flash, memc, and possibly other subsystems calls for a generic API, as everybody is using not generic SPI but their own HAL, and those device drivers are all platform-specific.
@tbursztyka Sorry I had to keep editing my responses.
Certainly you could construct the buffer such that the dummy cycles sit between addr and data. However, there are two problems I can see. The first is that it is ambiguous: in the controller layer, there is no way to know whether the 0s are actually data or dummies (or what the cmd and addr lengths are), unless the hardware is configured as a generic SPI, in which case it does not care. This leads to the second problem: it is not efficient enough for advanced SPI hardware (qspi, etc.), because oftentimes a transfer is initiated once per buffer. I believe this is part of what you mentioned about hardware optimization? (The advanced hardware could just initiate one transfer for everything.)
I was bringing this up only for the case where a generic SPI controller would be in use (so no hardware features exposed to simplify that). Obviously this does not fit dedicated controllers, which is why I agreed on a configuration exposed via DTS (not for the SPI API, so yes, MSPI would do it). Then, below, it's just a matter of a driver using it relevantly: a dedicated controller would use the hw optimizations, while a generic SPI controller would mimic them by inserting a buffer of 0s of the right length at the proper places.
Just having the configurations through DTS is not enough. Those settings may change at run time depending on the command, the mode of operation (mspi_io_mode / data rate), and the frequency the device is operating at.
How is exclusive usage of the device done then between multiple calling contexts?
E.g. multiple threads running concurrently or preempting each other, or perhaps even an ISR wanting to start an async transfer that is reaped later. What happens if register_callback is called from context A while an async request started by context B, which had previously registered a callback, is ongoing?
E.g. the flow goes...
thread A: register_callback
thread A: transceive_async
thread B: register_callback
ISR: callback called for thread A's async transceive completion; thread B believes its transceive is done?
It's not very clear to me, at least, how all this is dealt with. It's even less clear how this might work out where you could start an async transfer from an ISR, which turns out to be a really useful thing, for sensors at least.
@teburd
How is exclusive usage of the device done then between multiple calling contexts?
Once again, this is entirely up to the controller driver implementation and how it tackles threading. I took an approach similar to the one in spi_context.h.
E.g. multiple threads running concurrently or preempting each other, or perhaps even an ISR wanting to start an async transfer that is reaped later. What happens if register_callback is called from context A while an async request started by context B, which had previously registered a callback, is ongoing?
In this situation, for my case, I have a local struct mspi_context that contains a copy of the async request from B. The copy includes the callback and callback context, which are brought in before the lock happens and saved to struct mspi_context after B obtains the lock.
This is very similar to spi_context_lock: the lock is not released until all transfers get queued in the hardware queue, so a register_callback from A would not affect what is in struct mspi_context. Because our HAL also records the callback and callback context along with each transfer, B's callback and callback context would not get tampered with.
In the case where the hardware command queue does not take a callback and callback context, or there is no hardware command queue at all, implementing an RTIO software queue can be helpful.
E.g. the flow goes...
thread A: register_callback
thread A: transceive_async
thread B: register_callback
ISR: callback called for thread A's async transceive completion; thread B believes its transceive is done?
No, a register_callback from B may have changed the local storage, but it is not going to affect A, because A's callback and callback context are already in mspi_context.
Now, if we are only looking at the controller driver implementation, there is a chance for the callback and callback-context local storage to get tampered with. Execution flow:
thread A: register_callback
thread B: register_callback
thread A: transceive_async
However, this would not happen if the device driver is implemented correctly, as there is always a third lock. Say, for a read_async function in the device driver, I would expect the following:
```c
acquire(device);
mspi_register_callback();
mspi_transceive_async();
release(device);
```
So thread B is not going to get control of the device and be able to call mspi_register_callback before thread A releases the device control.
@swift-tk
the primary focus is to provide a generic API for advanced SPI devices that the generic SPI fails to achieve.
What are the "advanced SPI devices" out there? Besides memory devices, I mean. Are there any other types?
This was the reason why, in 2019, it was decided not to adopt d/q/o and other features into SPI: it was much simpler to just interface with the flash API (or other memory APIs) directly. This is an important question, because if there are no other device types (sensors or else), then what would be the benefit of adding overhead via an in-between API such as the proposed one, versus what we have now in flash controllers, for instance?
I disagree with the "generic SPI fails to achieve" statement: as long as these advanced features are not supported in the API, obviously it will fail to provide them.
Then comes another question: what blocks the possibility to improve the existing SPI API to enable those features versus creating an entirely new API?
I did not reply to that before, but looking at the previous statement, now is the time:
@nashif suggests introducing this new driver and slowly replacing the spi one with it
That would mean implementing the proposed API in all existing SPI drivers (knowing that the proposed API brings nothing to single-line SPI mode - the 99.999% usage atm). Is that what's being wanted?
That was not my point at all. My point here is not to get into a discussion of how we can support this with the current SPI API and end up getting nowhere. We have had this before with many APIs, and trying to over-complicate existing APIs to support new hardware and devices should not be the first option we think about. What I am saying is: introduce mspi as a new API; if there is a path to add compatibility with existing spi, it should be done; if it is good enough to cover the existing drivers, only then will we consider folding everything under one API. But let's not spend too much time trying to make this work now, impacting the current (stable) API and the drivers implementing it.
@tbursztyka
What are the "advanced SPI devices" out there? Besides memory devices I mean. Are there any other types?
For instance, displays and sensors that are driven by multi-lane SPI. In fact, SPI devices that operate at high speed (>= 50 MHz CLK) more or less share the same characteristics, i.e. multi-lane, variable latency, DQS, I/O mode switching, etc.
This was the reason why, in 2019, it was decided not to adopt d/q/o and other features into SPI because it was much simple to just interface the flash API (or other memory APIs) directly. This is an important question because, if there are no other devices types (sensors or else), then what will be the benefit to add overhead via an in-between API such as the proposed one, versus what we have now in flash controllers for instance?
True, that may have been the case 5 years ago, but a lot has happened in 5 years in the electronics business. Categories of devices have now grown really big in wearables, smartphones, and IoT.
What Zephyr has now under the flash controller, aside from SoC-internal NV flash, is, I believe, chaos. Each SoC vendor has to create its own controller abstraction with no standard whatsoever, since SPI cannot support those controllers, which are nowadays common enough to justify a standardized API. I would argue that any API carries some overhead, and I believe it is better to have an API that adheres to the idea of easy switching between SoC platforms.
Then comes another question: what blocks the possibility to improve the existing SPI API to enable those features versus creating an entirely new API?
Well, as you said, there would be more overhead for those generic SPI devices that don't support these features. Also, the existing SPI API and MSPI drive different peripheral hardware. I think it is impossible to achieve functionality similar to the proposed MSPI without breaking the current SPI. Even if we somehow managed to maintain backward compatibility with SPI, there would probably be more overhead than bringing in a new API.
If I had to name one thing, I particularly dislike the way that SPI passes `spi_config` when initiating a transfer. In my view, most of its fields are device/hardware settings rather than transfer-related things.
I think that, in order to counter the emergence of this new API, which clearly has uses, someone must propose an alternative where the current SPI API is extended to cover the use cases that this one does. @tbursztyka is this something you could take a look at?
@tbursztyka some of your questions relate to discussions that took place in the Arch WG meeting, including having a single `mspi_nor.c` in `drivers/flash`, or the types of hardware peripherals that would be driven by this new API.
It's not a problem that you cannot attend, but could you take a look at the recording? There was input from ST and NXP as well that positively acknowledged the differences in hardware between standard SPI and "advanced SPI" peripherals.
Here is the recording: https://zoom.us/rec/share/KHaMiI1O7r7vmQdSvXp-LnnSRIed2ikNeiUE7zIVUiRRgdJ-fGjbJUAKxq8awxGZ.p8AMB9sstvctuYJB
let's not spend too much time trying to make this work now at the cost of impacting the current (stable) API and the drivers implementing it.
@nashif Got it, the report was confusing. Just went through the link @carlescufi has provided, it's much clearer.
@swift-tk
True, that may have been the case 5 years ago, but a lot has happened in the electronics business since then. These categories of devices have grown really big in wearables, smartphones and IoT.
Ok, that's the answer I was looking for. Because as soon as there are more than just memory devices out there using these features, then yes, it makes sense to make a common API for it despite the added overhead. (I note, however, that during the arch presentation, only memory devices were discussed.)
Then comes another question: what blocks the possibility to improve the existing SPI API to enable those features versus creating an entirely new API?
Well, as you said, there would be more overhead for those generic SPI devices that don't support these features. Also, the existing SPI API and MSPI drive different peripheral hardware. I think it is impossible to achieve functionality similar to the proposed MSPI without breaking the current SPI. Even if we somehow managed to maintain backward compatibility with SPI, there would probably be more overhead than bringing in a new API.
I tend to agree with @nashif as well here. Expanding the existing SPI API is possible (actually my old proposal is quite obsolete anyway), but at the cost of needing more caution not to break anything, etc. And indeed, in the end, perhaps we could someday, if relevant, reach a point of merging both into a unique API.
That said, when you say:
no support for multiple devices on the same controller instance.
That's simply not true. Of course the existing API can handle multiple devices on the same controller instance. Have you spent enough time understanding the existing API?
If I had to name one thing, I particularly dislike the way that SPI passes `spi_config` when initiating a transfer. In my view, most of its fields are device/hardware settings rather than transfer-related things.
That was done for a reason: as soon as you have 2+ peripherals connected to the same SPI controller, which by definition have no idea about each other, how does peripheral A, when initiating a transfer, know that the controller is properly configured for itself and not for peripheral B (and vice versa)?
Under the hood, the controller only has to re-configure if the given configuration pointer (these are meant to be constant) changes. It's fully transparent for the user in the end, who has no need to worry about configuration status. That is basically what enables multiple devices managed by one controller driver, btw.
It was either this solution or allocating a fixed array of `spi_config` pointers in the controller driver and setting them via dedicated configure() calls. The above solution was easier (DTS integration was very much in its infancy; there was almost no way to know how many devices were hooked up, etc.). At bottom, it ends up the same for the controller either way: it has to configure/reconfigure itself, if relevant, prior to any transfer.
@tbursztyka
Understood. I clearly missed that, as it is quite ambiguous and I didn't find anyone using SPI for multiple devices. I'd like to point out that it is still a poor solution by convention, as an API call should not be doing two different things at once, i.e. switching between slaves and starting a transfer. I guess it was just done the easy way rather than the right way.
Anyway, I'm glad that we finally reached some consensus.
@tbursztyka @carlescufi May I ask what the next steps for this RFC are? Am I approved to implement it, or does it need further discussion?
Architecture WG:
@tbursztyka @carlescufi May I ask what the next steps for this RFC are? Am I approved to implement it, or does it need further discussion?
The PR should include:
* Header file
* At least one implementation of the new bus API (i.e. the hardware peripheral)
* A device driver that uses the new bus API (e.g. a flash memory)
* A test that runs under an emulator
* Documentation (see https://docs.zephyrproject.org/latest/hardware/peripherals/index.html)
* No objections at the Arch WG, missing an explicit ACK from @tbursztyka who is the maintainer
It's a new API, so nobody's ack is necessary here afaik.
By "ACK" I meant whether you are OK with adding a new API instead of extending the current one.
Yes as described by @nashif, it's better to have something out of the existing SPI API.
Excellent, thanks @tbursztyka. @swift-tk feel free to proceed with the RFC.
The original context in which I was thinking about this issue (see #51402) was conceptually using a bt81x connected to a Nordic nRF52840.
Using QSPI on the bt81x requires instructing it to switch from x1 mode to x2 or x4 mode by writing to a "REG_SPI_WIDTH" register, which defaults to x1. So you have to write in x1 mode first and then switch to x2/x4 mode.
To me a new but functionally compatible MSPI bus is fine, as long as you can identify which bus a device is attached to and switch driver mode, like we do with dual SPI/I2C drivers.
In relation to QSPI and multiple devices: as far as I am aware, multiple devices on SPI is relatively standard, and I can see this sometimes being necessary for QSPI as well, e.g. the Nordic nRF52840 with a bt81x and Matter usage.
When the internal flash storage on the nRF52840 is insufficient, external flash is required; if you want a QSPI display in combination with this, you need to connect both to the same QSPI bus controller.
The nRF52840 QSPI module only has one CS, so you can't connect multiple devices simultaneously.
It looks like you can reconfigure the pin multiplexing to connect both (deactivate the module, change pins, reactivate the module); however, the controller driver has to know that it would have to do this and have access to the necessary information.
Secondly, the bus controller driver would have to know which mode each device is currently operating in, as talking x4 to an x1 device (and vice versa) is unlikely to work, and devices may have individual mode-switching requirements.
I'm not sure how the other SPI "modes" extend to QSPI.
@firebladed
In your case, the device switching will have to be managed by software; `struct mspi_dev_id` is for this purpose. The controller would then know which device it should talk to and conduct switching when necessary, i.e. changing pins via pinctrl. And there should be no need to deactivate and reactivate the device, as long as the controller implementation handles context switching and threading.
The controller driver does not need to know the device operating mode, as long as the device drivers record their current operating mode and simply reconfigure the controller during device switching. But for debugging convenience, the controller driver may record the device-specific settings (`struct mspi_dev_cfg`) for the current device occupying the bus.
@swift-tk I think an `mspi_is_xip` would be required, to be able to query XIP state. For instance, if XIP is set by a bootloader, then in the application image you need to know the XIP status for various reasons.
Hi @erwango, that can surely be integrated into `mspi_get_channel_status`. We could have an enum for the returned status.
If the intent is for this API to be used by Zephyr Sensors moving forward, then it likely needs to implement the rtio API. Are there sensors in mind that will use this?
I have created a draft PR here https://github.com/zephyrproject-rtos/zephyr/pull/71769 for the RFC. I invite anyone who is interested to take a first glance.
@carlescufi Does it need to be discussed again in Arch WG meetings, since I had to split these items into more than one PR?
The PR should include:
* Header file
* At least one implementation of the new bus API (i.e. the hardware peripheral)
* A device driver that uses the new bus API (e.g. a flash memory)
* A test that runs under an emulator
* Documentation (see https://docs.zephyrproject.org/latest/hardware/peripherals/index.html)
Hi @erwango That can surely be integrated into `mspi_get_channel_status`. We could have an enum for the returned status.
Tbh, I'm not clear on the concept of a channel in this API. What is a channel in the context of this API, and how would a client know which channel it should query the status of? Anyway, let me check the draft PR :)
That's just a placeholder for now; I saw some vendors use it. From what I understand, it is the controller instance number (for the actual hardware).
If the intent is for this API to be used by Zephyr Sensors moving forward, then it likely needs to implement the rtio API. Are there sensors in mind that will use this?
From me, for the moment, I don't have any sensors in mind. Maybe @RichardSWheatley has some plans.
No plans currently for me either.
On STM32 at least, the targeted controllers are not intended to be used with sensors but are focused on external memories.
Perfect. I wanted to ensure we didn't do lots of great new work only to turn around and note that it doesn't fit with the newer sensor API flows we are pushing forward.
@teburd I've been looking at RTIO and the `rtio_sqe_prep_*` function usages, and I have a few questions. The `rtio_sqe_prep_callback` entry seems to always be the last element of the entries in the software queue that make up "a transfer". The user callback is called when the top of the queue reaches that element with OP_CALLBACK, and all other entries get passed to the controller driver. This seems to imply that whatever transfer started before the user callback had already completed (verified by observing the waveform) by the time the callback is called.
If my understanding is correct, then it is not truly async, and would not be compatible with a controller implementation that uses a hardware command queue for async transfer calls but does not wait for the transfer to complete or handle the completion signaling.
To be compatible with what I'm proposing here, the `rtio_executor_op` would need to wait for a completion signal from a lower-level callback registered using `mspi_register_callback`. Luckily, `mspi_register_callback` would only need to be called once, during initialization. Even with this, the advantage of the hardware queue would not be fully utilized, as it would stop pushing to the hardware queue between two transfer calls (wait for the first call to complete and then push the second call), unless the user callback is handled in a standalone queue (maybe the completion queue).
Closing, as https://github.com/zephyrproject-rtos/zephyr/pull/71769 is merged.
Introduction
Rather than attempting to overhaul the existing SPI interface, adding a new MSPI API may be a better option for both new and existing users.
Problem description
The existing Zephyr SPI API has many limitations, including but not limited to:
Proposed change
The MSPI interface should contain a controller driver that is SoC-platform specific and implements the following APIs. Device drivers should then reference these APIs so that unified device drivers can be achieved.
Note: `mspi_register_callback` is to be removed from the list of syscalls.
Detailed RFC
Methodology
To better serve modern-day memory devices, I have divided the configurations into three categories:
API Detail
Now let's dive deeper and check out what each API should do.
This routine provides a generic interface to override MSPI controller capabilities. In the controller driver, one may implement this API to initialize or re-initialize the controller hardware. Additional SoC-platform-specific settings that are not in struct mspi_cfg may be added to one's own binding (xxx,mspi-controller.yaml) so that one may derive the settings from DTS and configure them in this API. In general, these settings should not change at run-time.
This routine provides a generic interface to override MSPI controller device-specific settings. With struct mspi_dev_id defined as the device index and the CE GPIO from the device tree, the API supports multiple devices on the same controller instance. It is up to the controller driver implementation whether to support device switching by software or by hardware. The implementation may also support individual parameter configuration as specified by enum mspi_dev_cfg_mask. The settings within struct mspi_dev_cfg typically don't change once the mode of operation is determined after device initialization.
An example of the DTS.
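A minimal, hypothetical devicetree fragment for one controller instance with two devices might look like this; the node labels, compatible strings, and property names below are illustrative placeholders only, not an actual binding:

```dts
&mspi0 {
	status = "okay";

	flash0: flash@0 {
		compatible = "vnd,mspi-flash";     /* hypothetical binding */
		reg = <0>;                          /* device index on the bus */
		mspi-max-frequency = <48000000>;
	};

	psram0: psram@1 {
		compatible = "vnd,mspi-psram";     /* hypothetical binding */
		reg = <1>;
		mspi-max-frequency = <96000000>;
	};
};
```

The point is that each child node carries its own device index and device-specific settings, from which the driver can build its struct mspi_dev_id and struct mspi_dev_cfg.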
This routine provides a generic interface to check whether the hardware is busy. This is useful in a multiple-device scheme.
This routine provides a generic interface to register different types of bus events. The dev_id is provided so that the controller can identify its device and determine whether the access is allowed in a multiple device scheme. The enum mspi_bus_event is a preliminary list of bus events. There are XIP events that can be added. I encourage the community to come up with more events that they would use.
This routine provides a generic interface to transfer a request synchronously or asynchronously. The dev_id is provided so that the controller can identify its device and determine whether the access is allowed in a multiple-device scheme. The req is of type mspi_xfer_packet, which allows dynamically changing the transfer-related settings once the mode of operation is determined and configured by mspi_dev_config. The API supports bulk transfers with different starting addresses and sizes via struct mspi_buf. However, it is up to the controller implementation whether to support scatter IO and callback management. The controller can determine which user callback to trigger based on enum mspi_bus_event_cb_mask upon completion of each async/sync transfer, if the callback has been registered using mspi_register_callback, or trigger no callback at all with MSPI_BUS_NO_CB even if callbacks are registered. In the case that a controller supports a hardware command queue, the user can take full advantage of it in terms of performance if scatter IO and callback management are supported.
This routine provides a generic interface to configure the XIP and scrambling features. Typically, the cfg parameter includes an enable flag and the address range it takes effect on. I also wouldn't expect these settings to change often.
This routine provides a generic interface to configure timing parameters that are SoC-platform specific. If it is used, one should provide one's own definitions of param_mask and the cfg type in one's own *.h file.
Dependencies
To compile, one needs to check out "apollo3p-dev-mspi" at https://github.com/AmbiqMicro/ambiqhal_ambiq.git and "RFC-MSPI" at https://github.com/AmbiqMicro/ambiqzephyr.git
The branch is based off of a PR that has yet to be merged into Zephyr main: https://github.com/zephyrproject-rtos/zephyr/pull/67815. Please look at these commits for the example implementations.
The API prototype is at https://github.com/AmbiqMicro/ambiqzephyr/blob/RFC-MSPI/include/zephyr/drivers/mspi.h
Example code
Example implementation of the MSPI API can be found at zephyr\drivers\mspi\mspi_ambiq_ap3.c. Example usage of the MSPI API can be found in the following files:

* zephyr\drivers\memc\memc_mspi_aps6404l.c: PSRAM APS6404L device driver
* zephyr\drivers\flash\flash_mspi_atxp032.c: NOR FLASH ATXP032 device driver
* zephyr\samples\drivers\memc\src\main.c: demos the usage of mspi_transceive_async