henrikbrixandersen opened this issue 5 years ago
@MaureenHelm is this issue being looked at?
Not yet, but going to try to look at it tomorrow.
This is caused by the underlying MCUX SDK LPSPI driver missing a feature to hold the chip select active after a transfer. The underlying MCUX SDK DSPI driver used on frdm_k64f has a kDSPI_MasterActiveAfterTransfer flag that we need in the LPSPI driver used on KE1xF and i.MX RT10xx. I will file an internal NXP ticket to add this feature.
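For context, a minimal sketch (not the actual Zephyr driver code) of how the MCUX SDK DSPI API exposes this: the flag is OR'ed into the transfer configFlags so that PCS stays asserted when one part of a multipart transfer completes.

```c
#include <fsl_dspi.h>

/* Sketch only: tx_data, rx_data, len, and base are assumed to come
 * from the surrounding driver code. */
dspi_transfer_t xfer = {
	.txData = tx_data,
	.rxData = rx_data,
	.dataSize = len,
	.configFlags = kDSPI_MasterCtar0 | kDSPI_MasterPcs0 |
		       kDSPI_MasterPcsContinuous |
		       /* keep PCS asserted after this part completes */
		       kDSPI_MasterActiveAfterTransfer,
};
DSPI_MasterTransferBlocking(base, &xfer);
```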
Decreasing the priority of this bug from medium to low since we can work around it with GPIO CS.
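For reference, the workaround amounts to adding a cs-gpios property on the controller node in devicetree, e.g. (sketch only; the GPIO pin here is a placeholder, not a recommendation for any particular board):

```dts
&lpspi1 {
	/* Work around the native CS behavior by letting Zephyr drive
	 * the chip select as a regular GPIO (placeholder pin). */
	cs-gpios = <&gpio1 11 GPIO_ACTIVE_LOW>;
};
```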
This is still an issue with the most recent MCUX SDK (I was just bitten by this issue on bringing up a newly developed board today). Should we change all affected in-tree boards to use cs-gpios until this is resolved?
@henrikbrixandersen I've encountered similar issues with SPI on LPC55xxx, see my 2 pull requests:
Re-opening after discord discussion, as this issue is still relevant.
If CONFIG_SPI_MCUX_FLEXCOMM_DMA=y, this issue also appears with the spi_mcux_flexcomm driver. (If CONFIG_SPI_MCUX_FLEXCOMM_DMA is not enabled, then spi_mcux_flexcomm has correct CS behavior regardless of whether GPIO or hardware CS is used.)
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
I produced this code to try to reproduce the issue on mimxrt1024_evk at commit b573f447f04f042f427fafd23fb8cd4cacfe6e30, based on what was originally posted by @henrikbrixandersen:
```c
#include <zephyr/kernel.h>
#include <stdio.h>
#include <zephyr/drivers/spi.h>
#include <zephyr/drivers/gpio.h>
#include <zephyr/sys/util.h>

struct spi_config spi_cfg = SPI_CONFIG_DT(DT_NODELABEL(dummy_spi_dev),
	(SPI_OP_MODE_MASTER | SPI_TRANSFER_MSB | SPI_WORD_SET(8)), 1);

const struct device *spi_ctlr = DEVICE_DT_GET(DT_NODELABEL(lpspi1));

#define XFER_SIZE 2

uint8_t cmd[XFER_SIZE] = {0xaa, 0x55};
uint8_t resp[XFER_SIZE];

const struct spi_buf tx_bufs[] = {
	{
		.buf = cmd,
		.len = sizeof(cmd),
	},
};

const struct spi_buf_set tx = {
	.buffers = tx_bufs,
	.count = ARRAY_SIZE(tx_bufs),
};

const struct spi_buf rx_bufs[] = {
	{
		/* Skip the bytes clocked in while cmd is clocked out */
		.buf = NULL,
		.len = sizeof(cmd),
	},
	{
		.buf = resp,
		.len = sizeof(resp),
	},
};

const struct spi_buf_set rx = {
	.buffers = rx_bufs,
	.count = ARRAY_SIZE(rx_bufs),
};

int main(void)
{
	/* DEVICE_DT_GET() never returns NULL, so check readiness instead */
	if (!device_is_ready(spi_ctlr)) {
		printf("SPI device not ready\n");
		return 1;
	}

	if (!spi_cfg.cs.gpio.port) {
		printf("Using native CS\n");
	} else {
		printf("Using GPIO CS\n");
	}

	spi_transceive(spi_ctlr, &spi_cfg, &tx, &rx);
	printf("Transmission finished.\n");

	return 0;
}
```
With the following DT overlay:
```dts
#include <zephyr/dt-bindings/gpio/gpio.h>
#include <zephyr/dt-bindings/spi/spi.h>

&pinmux_lpspi1 {
	group0 {
		pinmux = <&iomuxc_gpio_ad_b0_11_gpio1_io11>,
			 <&iomuxc_gpio_ad_b0_10_lpspi1_sck>,
			 <&iomuxc_gpio_ad_b0_11_lpspi1_pcs0>,
			 <&iomuxc_gpio_ad_b0_12_lpspi1_sdo>,
			 <&iomuxc_gpio_ad_b0_13_lpspi1_sdi>;
		drive-strength = "r0-6";
		slew-rate = "slow";
		nxp,speed = "100-mhz";
	};
};

&lpspi1 {
	cs-gpios = <&gpio1 11 GPIO_ACTIVE_LOW>;

	dummy_spi_dev: dummy_spi_dev@0 {
		compatible = "spi-device";
		reg = <0>;
		duplex = <SPI_FULL_DUPLEX>;
		spi-max-frequency = <100000>;
	};
};
```
I observed the same behavior whether or not the cs-gpios property was commented out of the DT: in both cases the CS is deasserted between the parts of the transfer.
Could someone who had this issue please confirm whether I missed something in this code, whether the behavior did indeed change in the last 5 years, or whether the issue is specific to only some platforms?
I can try to reproduce this when I return from Seattle. My original finding was on the twr_ke18f board.
@henrikbrixandersen to be clear, I am finding the wrong behavior now with both gpios and native
... sorry about this. I just returned to this in the spirit of LTS bug fixing and realized I had accidentally left the pin muxed to the native CS when I tested the GPIO earlier (both mux options were in the overlay). Now I see the difference, whoops.
@henrikbrixandersen @pdgendt I found the cause of the problem, but fixing it requires relatively major changes. Can you describe the reasons why you want to use the native chip select instead of GPIO, so we can determine the priority of this?
We have been using GPIO CS until now because of this issue, but it results in increased latency compared to using the controller-provided CS lines.
I am assuming you are using the synchronous API. So by latency, do you mean the transceive function's CPU time is significantly longer because of the need to control the GPIO CS at the beginning and end of the transfer?
Yes. Controlling the GPIOs at each end of the transaction takes up a significant portion of the total transaction time.
We've learnt to work around it. It would be nice to see it fixed, but it no longer has high priority for us.
Okay, one more question for you, as I am trying to figure out what the behaviour is supposed to be according to the (relatively undocumented) Zephyr API: would you expect the chip select to deassert after all the buffers passed to spi_transceive have been clocked in/out, or after clocking out the number of bits specified by the data frame size in the struct spi_config operation field?
I would expect the chip select to be deasserted only after all buffers in a transaction have been clocked in/out.
Okay, but if we take that to be the contract of the Zephyr spi_transceive API, then I am wondering what the purpose of the data frame size in the operation field of the spi_config is.
To me it seems like the proper use of the API is to set the SPI_HOLD_ON_CS bit in the operation field of the spi_config struct, then release it with the spi_release API. Otherwise it seems like the chip select delimits the number of bits specified in the frame size. With the hold_on_cs bit set, as far as I can see the frame size is only used to determine the appropriate return value of spi_transceive? Maybe @tbursztyka can help explain what the expected behavior is here. Either way, the lpspi driver is currently implemented wrong.
Hold on CS is for keeping the CS line asserted after the transaction (or across multiple transactions).
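For illustration, a minimal sketch of that pattern with the Zephyr API (dev, tx1, and tx2 are assumed to be set up elsewhere):

```c
struct spi_config cfg = {
	.frequency = 1000000,
	.operation = SPI_OP_MODE_MASTER | SPI_WORD_SET(8) | SPI_HOLD_ON_CS,
};

/* CS is asserted for this transaction and, because of SPI_HOLD_ON_CS,
 * stays asserted after spi_transceive() returns. */
spi_transceive(dev, &cfg, &tx1, NULL);

/* A further transaction on the same device, with CS still asserted. */
spi_transceive(dev, &cfg, &tx2, NULL);

/* Explicitly release the SPI context, deasserting CS. */
spi_release(dev, &cfg);
```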
As @henrikbrixandersen mentioned, the CS line has to be asserted for the whole transaction. The buffers given to spi_transceive() are of the scatter-gather type, so overall they represent one and only one transaction.
(The data frame size in the documentation relates to the word size. This is mandatory to know how to interpret the buffers and how to read/write to/from the controller. SPI_HOLD_ON_CS is a specific configuration bit with which you can request the CS line to stay asserted after the spi_transceive() call. There were a few use cases supporting that feature back then.)
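In other words, something like the following sketch is one transaction, and CS is expected to stay asserted from the first byte of the first buffer to the last byte of the last one (hdr/payload contents and the dev/cfg setup are made up for illustration):

```c
uint8_t hdr[3] = {0x02, 0x00, 0x10}; /* hypothetical opcode + address */
uint8_t payload[16] = {0};

const struct spi_buf parts[] = {
	{ .buf = hdr, .len = sizeof(hdr) },
	{ .buf = payload, .len = sizeof(payload) },
};
const struct spi_buf_set tx_set = {
	.buffers = parts,
	.count = ARRAY_SIZE(parts),
};

/* One transaction: CS must be held across both gathered buffers. */
spi_write(dev, &cfg, &tx_set);
```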
Okay, thanks. I think I was confused because in the LPSPI hardware, a word, a frame, and a transfer are three different things; the NXP SDK has yet another meaning for what a "transfer" is; and the Zephyr API defines these terms differently again. In the case of the LPSPI hardware, a "word" is how wide the writes to the transmit register are, whereas a "frame" consists of anywhere from 8 bits to 4K bits and therefore potentially multiple writes. I see the maximum data frame size in Zephyr is 64 bits; is this because the register width is expected to be up to 64 bits on some platforms? My question basically is: when I implement this driver, based on what I described, would it make sense to correlate the Zephyr frame size to the LPSPI word size, and choose the LPSPI "frame" size to be whatever is most convenient? Does the data frame size in the Zephyr API impose/imply any structure on the contents of the SPI buffers?
BTW, I reread some of the comments in the header around the HOLD_ON_CS bit, and what you are saying about it does make sense now when I read it like that. Again, I have been swimming in competing definitions of the same terms, so I was slightly confused at first.
The dfs is all about the SPI device you are dealing with and what the controller can support. When this API was designed, there were no identified devices or controllers able to deal with more than a 32-bit dfs, and I don't think I have seen any controller supporting 64 bits since; many are stuck at 8 bits only. It does affect how you structure the buffers, yes. A hypothetical 4-bit dfs protocol on an SPI device would require using only 4 bits per buffer byte; you cannot cram 2 frames into one byte, as the API does not support such optimization (it would most likely require shifting the byte in the driver before giving it to the controller, which is a waste of time). And anyway, since we make SPI device drivers to be portable, they use the most commonly supported controller dfs, and buffers are meant to follow this. Actually, I quickly looked: there is only one SPI device driver that uses 16 bits as the word size.
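To illustrate the buffer structuring point, a sketch of a 16-bit dfs transaction (dev is assumed): spi_buf lengths stay in bytes, so each 16-bit frame occupies two buffer bytes.

```c
struct spi_config cfg16 = {
	.frequency = 1000000,
	.operation = SPI_OP_MODE_MASTER | SPI_WORD_SET(16),
};

/* Four 16-bit frames; .len is still expressed in bytes. */
uint16_t words[4] = {0x1234, 0x5678, 0x9abc, 0xdef0};
const struct spi_buf buf = { .buf = words, .len = sizeof(words) };
const struct spi_buf_set set = { .buffers = &buf, .count = 1 };

spi_write(dev, &cfg16, &set);
```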
Given this, can I make the assumption that the data frames will be multiples of 8 bits? That would greatly simplify the implementation. LPSPI supports odd numbers of bits in the frame all the way up to 4K, as I mentioned, and our HAL driver has a lot of control logic to account for this. If I can just assume that the data frame / word size in the buffers from the Zephyr API will be 8, 16, or 32 bits, that would greatly simplify things. Is it expected for the dfs in the operation field to ever be something like 9, 15, 27, or anything weird like that in Zephyr?
BTW, the minimum word size for the LPSPI is 2 bits :o
In practice, expect multiples of 8 bits, yes. Many controllers or HALs, afaik, do not offer finer-grained configuration of the word size. It was meant to be flexible, but the reality is that if you want to make portable code, you need to go for the most commonly understood configuration.
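Under that assumption, the driver-side mapping could look like the following sketch (a hypothetical helper, not the final design; the TCR macros are from the MCUX LPSPI device headers):

```c
/* Hypothetical helper: map the Zephyr dfs onto the LPSPI frame size. */
static int lpspi_set_word_size(LPSPI_Type *base, const struct spi_config *cfg)
{
	uint8_t dfs = SPI_WORD_SIZE_GET(cfg->operation);

	/* Keep the implementation simple: byte-multiple word sizes only. */
	if (dfs != 8 && dfs != 16 && dfs != 32) {
		return -ENOTSUP;
	}

	/* TCR[FRAMESZ] holds the frame size in bits, minus one. */
	base->TCR = (base->TCR & ~LPSPI_TCR_FRAMESZ_MASK) |
		    LPSPI_TCR_FRAMESZ(dfs - 1);

	return 0;
}
```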
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
I'm encountering the same issue as I described here: https://github.com/zephyrproject-rtos/zephyr/discussions/77999. Any suggestions on how to resolve it?
NXP is planning to contribute a rework of the lpspi driver soon; there is no quick fix, as the existing driver just doesn't meet the Zephyr API and is broken in a lot of ways. This issue has actually been boiling over on a lot of fronts lately, so it will be addressed soon.
Thanks for your response, @decsny. I hope this issue gets resolved soon.
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
Hey,
Could you please let us know when NXP plans to address this issue? Currently, the driver defaults to relying on the hardware CS mode, which may not function as expected for new users attempting to use it.
Thanks.
**Describe the bug**
The MCUX LPSPI SPI driver handles SPI chip selects inconsistently when comparing GPIO CS and "native" controller CS handling over multipart transfers.

With GPIO CS enabled (where the LPSPI controller does not control the CS line), the CS line is kept asserted through the entire transfer. This is opposed to what happens when the CS line is controlled by the LPSPI controller itself; then the CS line is deasserted between the different parts of the transfer.

**To Reproduce**

**Expected behavior**
The CS line remains asserted through all the parts of the multipart transfer.

**Impact**
Deasserting the CS line in the middle of a transfer causes problems, e.g. when communicating with SPI EEPROMs.

**Screenshots or console output**
When using GPIO CS:

When using LPSPI CS:

**Environment (please complete the following information):**

**Additional context**
This was spotted when trying to use a SPI EEPROM with the TWR-KE18F board, but it is not limited to that board nor to the KE1xF SoC series, as far as I can tell.