Closed cdoak closed 6 years ago
It looks like an implementation detail to me (internal header file), rather than an API. Thus the use of union is an optimisation detail of opaque state which does not affect the API contact. Q: does this implementation detail leak out into the API? Q: do you need both features to be supported in the same binary build? Q: does the binary size increase at the expense of generic source code?
I've taken Roberts advice on board and buried the implementation details from the client behind the opaque handle (via the CREATE_XXX_HANDLE) macros.
I've also implemented function pointers that are set when CREATE_XXX_HANDLE is called. This negates the need to if-the-else and switch statements in the calls, therefore reducing code size and improving speed.
Still alot of work to do, but interested to hear your thoughts now @robert-lytton , @samchesney, @pthedinger, @chrisc-xmos .
I'm debating with myself who controls the device_select (CS) port. Im leaning on the side of its NOT lib_spi.
Take lib_flash for example, it will want to output and input data all within the same CS low period. If lib_spi controlled the port there would be two toggles on CS, one for the output and another for the input. Whereas if lib_flash controlled it, it could lower the cs, perform its output followed by input and then raise it when done.
I could leave the CS port owned by lib_spi but expose an API function to raise/lower CS on its handle. So the caller has control over it.
What are your thoughts @samchesney @robert-lytton @pthedinger @chrisc-xmos ?
There may not be a CS (if the board is laid out as a daisy-chain of 1 or more devices). But I have no idea if this is something a customer is allowed to do... If you want to parameterise this behaviour use a strategy pattern viz: the caller passes in a function pointer or 'details' that will do any necessary work (or a null).
"All problems in computer science can be solved by another level of indirection" – David Wheeler "…except for the problem of too many layers of indirection" – Kevlin Henney's corollary
Hi @cdoak, I like the look of the API with the opaque handle 👍
I guess the complication of supplying shared multi-bit ports in as CS probably means passing in a function pointer as Robert suggests would be best. However as this still complicates the lib_flash requirement you highlighted it may just be best to just keep it external to the library.
Hi @cdoak, one other thing came to mind last night, I was wondering whether this API could also be used for a SPI slave implementation by passing an extra parameter to the CREATE_XXX_HANDLE()
macros?
In lib_flash, when transmitting data, I format the data first to the correct pins and then I transmit it.
With lib_spi, I'd like to wrap this formatting up, so that the client doesn't need to concern themselves with it. It will be auto-formatted based on what handle they have.
The following code shows this formatting for QSPI write bytes in SPI mode...
// Copyright (c) 2016-2018, XMOS Ltd, All rights reserved
//Write bytes for QSPI SPI mode. That is all data is output from the SIO port one
//word at a time. As all the bits are doubled up so it takes two words to make
//up the required byte. The data must be transmitted from pin IO0 (SI) of the
//4 pin port.
#if defined(__XS2A__)
#define STACKWORDS 8 //TODO: how to calculate?
#define EIGHT_CLOCK_EDGES 16 //doubled up
//Registers
#define QSPI_PORTS r0
#define TX_DATA r1
#define NUM_DATA r2
#define SCLK r3
#define SIO r4
#define CLK_EDGES r5
#define WORD_OUT_ONE r6
#define WORD_OUT_TWO r7
#define ZIP_OUT r8
#define INDEX r9
#define CLK_START_END r10
#define PORT_DELAY r8
#define PORT_TIME r11
//Data Pool for clock edges
.section .dp.data,"awd",@progbits
.align 4
.cc_top clock_edge_data.datum
clock_edges_cpol_zero:
.word 0xAAAA //8 edges (doubled up to 16)
clock_edges_cpol_one:
.word 0x5555 //8 edges (doubled up to 16)
.cc_bottom clock_edge_data.datum
.text
////////////////////////////////////////////////////////////////////////////////
//qspi_port_spi_mode_zero_write_bytes_asm
//
//Writes bytes from memory up to and including speeds of 50MHz
//
//in - internal_handle : spi_handle_t
//out - tx_bytes : char*
//in - num_bytes : unsigned
//return - void
////////////////////////////////////////////////////////////////////////////////
.cc_top qspi_port_spi_mode_zero_write_bytes_asm.function
.globl qspi_port_spi_mode_zero_write_bytes_asm
.align 4
.type qspi_port_spi_mode_zero_write_bytes_asm,@function
qspi_port_spi_mode_zero_write_bytes_asm:
.issue_mode dual
DUALENTSP_lu6 STACKWORDS
//Callee save registers: TODO: why SP[1]..SP[3]
std r4, r5, sp[1]
std r6, r7, sp[2]
std r8, r9, sp[3]
//Deducting one from NUM_DATA from the get-go to enable 1 byte reads below...
{ldw SIO, QSPI_PORTS[1]; sub NUM_DATA, NUM_DATA, 1}
//Initialising the INDEX to -1
{ldw SCLK, QSPI_PORTS[2]; mkmsk INDEX, 32}
//Mode 0 requires cpol 0 clock edges
//cpol 0 starts/finish low
{ldw CLK_EDGES, dp[clock_edges_cpol_zero]; ldc CLK_START_END, 0}
outpw res[SCLK], CLK_START_END, 1
bl qspi_port_write_bytes_asm
//Callee restore registers
ldd r4, r5, sp[1]
ldd r6, r7, sp[2]
ldd r8, r9, sp[3]
retsp STACKWORDS
.qspi_port_spi_mode_zero_write_bytes_asm_tmp:
.size qspi_port_spi_mode_zero_write_bytes_asm, .qspi_port_spi_mode_zero_write_bytes_asm_tmp - qspi_port_spi_mode_zero_write_bytes_asm
.align 4
.cc_bottom qspi_port_spi_mode_zero_write_bytes_asm.function
////////////////////////////////////////////////////////////////////////////////
//qspi_port_spi_mode_one_write_bytes_asm
//
//Writes bytes from memory up to and including speeds of 50MHz
//
//in - internal_handle : spi_handle_t
//out - tx_bytes : char*
//in - num_bytes : unsigned
//return - void
////////////////////////////////////////////////////////////////////////////////
.cc_top qspi_port_spi_mode_one_write_bytes_asm.function
.globl qspi_port_spi_mode_one_write_bytes_asm
.align 4
.type qspi_port_spi_mode_one_write_bytes_asm,@function
qspi_port_spi_mode_one_write_bytes_asm:
.issue_mode dual
DUALENTSP_lu6 STACKWORDS
//Callee save registers: TODO: why SP[1]..SP[3]
std r4, r5, sp[1]
std r6, r7, sp[2]
std r8, r9, sp[3]
//TODO:
//Callee restore registers
ldd r4, r5, sp[1]
ldd r6, r7, sp[2]
ldd r8, r9, sp[3]
retsp STACKWORDS
.qspi_port_spi_mode_one_write_bytes_asm_tmp:
.size qspi_port_spi_mode_one_write_bytes_asm, .qspi_port_spi_mode_one_write_bytes_asm_tmp - qspi_port_spi_mode_one_write_bytes_asm
.align 4
.cc_bottom qspi_port_spi_mode_one_write_bytes_asm.function
////////////////////////////////////////////////////////////////////////////////
//qspi_port_spi_mode_two_write_bytes_asm
//
//Writes bytes from memory up to and including speeds of 50MHz
//
//in - internal_handle : spi_handle_t
//out - tx_bytes : char*
//in - num_bytes : unsigned
//return - void
////////////////////////////////////////////////////////////////////////////////
.cc_top qspi_port_spi_mode_two_write_bytes_asm.function
.globl qspi_port_spi_mode_two_write_bytes_asm
.align 4
.type qspi_port_spi_mode_two_write_bytes_asm,@function
qspi_port_spi_mode_two_write_bytes_asm:
.issue_mode dual
DUALENTSP_lu6 STACKWORDS
//Callee save registers: TODO: why SP[1]..SP[3]
std r4, r5, sp[1]
std r6, r7, sp[2]
std r8, r9, sp[3]
//TODO:
//Callee restore registers
ldd r4, r5, sp[1]
ldd r6, r7, sp[2]
ldd r8, r9, sp[3]
retsp STACKWORDS
.qspi_port_spi_mode_two_write_bytes_asm_tmp:
.size qspi_port_spi_mode_two_write_bytes_asm, .qspi_port_spi_mode_two_write_bytes_asm_tmp - qspi_port_spi_mode_two_write_bytes_asm
.align 4
.cc_bottom qspi_port_spi_mode_two_write_bytes_asm.function
////////////////////////////////////////////////////////////////////////////////
//qspi_port_spi_mode_three_write_bytes_asm
//
//Writes bytes from memory up to and including speeds of 50MHz
//
//in - internal_handle : spi_handle_t
//out - tx_bytes : char*
//in - num_bytes : unsigned
//return - void
////////////////////////////////////////////////////////////////////////////////
.cc_top qspi_port_spi_mode_three_write_bytes_asm.function
.globl qspi_port_spi_mode_three_write_bytes_asm
.align 4
.type qspi_port_spi_mode_three_write_bytes_asm,@function
qspi_port_spi_mode_three_write_bytes_asm:
.issue_mode dual
DUALENTSP_lu6 STACKWORDS
//Callee save registers: TODO: why SP[1]..SP[3]
std r4, r5, sp[1]
std r6, r7, sp[2]
std r8, r9, sp[3]
//TODO:
bl qspi_port_write_bytes_asm
//Callee restore registers
ldd r4, r5, sp[1]
ldd r6, r7, sp[2]
ldd r8, r9, sp[3]
retsp STACKWORDS
.qspi_port_spi_mode_three_write_bytes_asm_tmp:
.size qspi_port_spi_mode_three_write_bytes_asm, .qspi_port_spi_mode_three_write_bytes_asm_tmp - qspi_port_spi_mode_three_write_bytes_asm
.align 4
.cc_bottom qspi_port_spi_mode_three_write_bytes_asm.function
////////////////////////////////////////////////////////////////////////////////
//qspi_port_write_bytes_asm
////////////////////////////////////////////////////////////////////////////////
.cc_top qspi_port_write_bytes_asm.function
.globl qspi_port_write_bytes_asm
.align 4
.type qspi_port_write_bytes_asm,@function
qspi_port_write_bytes_asm:
.issue_mode dual
DUALENTSP_lu6 STACKWORDS
//Format the first byte so that it is all lined up on the SI pin
add INDEX, INDEX, 1
{ld8u WORD_OUT_ONE, TX_DATA[INDEX]; ldc ZIP_OUT, 0x00}
add WORD_OUT_TWO, WORD_OUT_ONE, 0 //Duplicate
zip WORD_OUT_ONE, WORD_OUT_TWO, 0 //Double up all the bits
unzip WORD_OUT_ONE, WORD_OUT_TWO, 3 //Unzip into two words
zip ZIP_OUT, WORD_OUT_ONE, 0 //Align out to SI...
zip ZIP_OUT, WORD_OUT_TWO, 0 //...
zip ZIP_OUT, WORD_OUT_ONE, 0 //...
zip ZIP_OUT, WORD_OUT_TWO, 0 //...Align out to SI
//Line the QSPI_PORTS up to output the clock edges and start outputting
//at the required times
setc res[SIO], 0x1 //TODO: find docs on what 0x1 means for port resource type
in PORT_TIME, res[SIO]
getts PORT_TIME, res[SIO]
ldc PORT_DELAY, 0x96 //TODO: configurable port setup delay
add PORT_DELAY, PORT_TIME, PORT_DELAY
setpt res[SCLK], PORT_DELAY
setpt res[SIO], PORT_DELAY
outpw res[SCLK], CLK_EDGES, EIGHT_CLOCK_EDGES
//If there is only one byte of data to send:
//skip on down to read the final(only) byte...
bf NUM_DATA, qspi_port_spi_mode_zero_write_bytes_asm_final_byte
qspi_port_spi_mode_zero_write_bytes_asm_loop:
{out res[SIO], WORD_OUT_ONE; add INDEX, INDEX, 1}
{out res[SIO], WORD_OUT_TWO; sub NUM_DATA, NUM_DATA, 1}
//Format the next word...
{ld8u WORD_OUT_ONE, TX_DATA[INDEX]; ldc ZIP_OUT, 0x00}
add WORD_OUT_TWO, WORD_OUT_ONE, 0 //Duplicate
zip WORD_OUT_ONE, WORD_OUT_TWO, 0 //Double up all the bits
unzip WORD_OUT_ONE, WORD_OUT_TWO, 3 //Unzip into two words
zip ZIP_OUT, WORD_OUT_ONE, 0 //Align out to SI...
zip ZIP_OUT, WORD_OUT_TWO, 0 //...
zip ZIP_OUT, WORD_OUT_ONE, 0 //...
zip ZIP_OUT, WORD_OUT_TWO, 0 //...Align out to SI
//Output clock edges to carry data that will be input on the next iteration
outpw res[SCLK], CLK_EDGES, EIGHT_CLOCK_EDGES
//Go back around the loop if more data to read...
bt NUM_DATA, qspi_port_spi_mode_zero_write_bytes_asm_loop
qspi_port_spi_mode_zero_write_bytes_asm_final_byte:
//Output the final byte
out res[SIO], WORD_OUT_ONE
out res[SIO], WORD_OUT_TWO
//Wait for SCLK to complete
outpw res[SCLK], CLK_START_END, 1
syncr res[SCLK]
retsp STACKWORDS
.qspi_port_write_bytes_asm_tmp:
.size qspi_port_write_bytes_asm, .qspi_port_write_bytes_asm_tmp - qspi_port_write_bytes_asm
.align 4
.cc_bottom qspi_port_write_bytes_asm.function
#endif //defined(__XS2A__)
Unfortunately it is taking too long to get around the qspi_port_spi_mode_zero_write_bytes_asm_loop, such that when I output the next word combination for the byte, I end up with some extra zeros between these words and the last words.
Can anyone see any way to optimise this loop any more than it already is? Unfortunately the zips/unzips don't appear to be dual issuable.
My alternative is to format all the bytes into word pairs before I perform the send, however this will come at a cost of memory. That is I'd need enough space to store each byte to be sent as a word pair.
Would appreciate your thoughts @samchesney @robert-lytton @pthedinger @andrewstanfordjason
As discussed - a 16-entry lookup table to convert each nibble to a word should do the trick.
I modified the code to use the 16-entry lookup table as per @pthedinger suggestion as follows, but unfortunately it's still too slow, resulting in unwanted 0 values between each byte sent...
.cc_top qspi_port_write_bytes_asm.function
.globl qspi_port_write_bytes_asm
.align 4
.type qspi_port_write_bytes_asm,@function
lut_align_byte_to_si_pin:
.word 0x00000000 //0x0
.word 0x00000011 //0x1
.word 0x00001100 //0x2
.word 0x00001111 //0x3
.word 0x00110000 //0x4
.word 0x00110011 //0x5
.word 0x00111100 //0x6
.word 0x00111111 //0x7
.word 0x11000000 //0x8
.word 0x11000011 //0x9
.word 0x11001100 //0xA
.word 0x11001111 //0xB
.word 0x11110000 //0xC
.word 0x11110011 //0xD
.word 0x11111100 //0xE
.word 0x11111111 //0xF
qspi_port_write_bytes_asm:
.issue_mode dual
DUALENTSP_lu6 STACKWORDS
//Format the first byte so that it is all lined up on the SI pin
{add INDEX, INDEX, 1; mkmsk BYTE_MASK, 0x4}
{ld8u BYTE_OUT, TX_DATA[INDEX]; ldap LUT_SI_PIN, lut_align_byte_to_si_pin}
{shr WORD_OUT_ONE, BYTE_OUT, 0x4; and WORD_OUT_TWO, BYTE_OUT, BYTE_MASK}
ldw WORD_OUT_ONE, LUT_SI_PIN[WORD_OUT_ONE]
ldw WORD_OUT_TWO, LUT_SI_PIN[WORD_OUT_TWO]
//Line the QSPI_PORTS up to output the clock edges and start outputting
//at the required times
setc res[SIO], 0x1 //TODO: find docs on what 0x1 means for port resource type
in PORT_TIME, res[SIO]
getts PORT_TIME, res[SIO]
ldc PORT_DELAY, 0x96 //TODO: configurable port setup delay
add PORT_DELAY, PORT_TIME, PORT_DELAY
setpt res[SCLK], PORT_DELAY
setpt res[SIO], PORT_DELAY
outpw res[SCLK], CLK_EDGES, EIGHT_CLOCK_EDGES
//If there is only one byte of data to send:
//skip on down to read the final(only) byte...
bf NUM_DATA, qspi_port_spi_mode_zero_write_bytes_asm_final_byte
qspi_port_spi_mode_zero_write_bytes_asm_loop:
{out res[SIO], WORD_OUT_ONE; add INDEX, INDEX, 1}
{out res[SIO], WORD_OUT_TWO; sub NUM_DATA, NUM_DATA, 1}
//Format the next byte so that it is all lined up on the SI pin
{ld8u BYTE_OUT, TX_DATA[INDEX]; mkmsk BYTE_MASK, 0x4}
{shr WORD_OUT_ONE, BYTE_OUT, 0x4; and WORD_OUT_TWO, BYTE_OUT, BYTE_MASK}
ldw WORD_OUT_ONE, LUT_SI_PIN[WORD_OUT_ONE]
ldw WORD_OUT_TWO, LUT_SI_PIN[WORD_OUT_TWO]
//Output clock edges to carry data that will be input on the next iteration
outpw res[SCLK], CLK_EDGES, EIGHT_CLOCK_EDGES
//Go back around the loop if more data to read...
bt NUM_DATA, qspi_port_spi_mode_zero_write_bytes_asm_loop
qspi_port_spi_mode_zero_write_bytes_asm_final_byte:
//Output the final byte
out res[SIO], WORD_OUT_ONE
{out res[SIO], WORD_OUT_TWO; ldw CLK_START_END, sp[4]}
//Wait for SCLK to complete
outpw res[SCLK], CLK_START_END, 1
syncr res[SCLK]
retsp STACKWORDS
.qspi_port_write_bytes_asm_tmp:
.size qspi_port_write_bytes_asm, .qspi_port_write_bytes_asm_tmp - qspi_port_write_bytes_asm
.align 4
.cc_bottom qspi_port_write_bytes_asm.function
I'll try @pthedinger other suggestion of using an additional thread to perform the formatting next.
The lookup tables was fast enough. It was my mistake not realising that the bits were reversed that was causing the issue.
The code for the function...
void flash_read_jedec_id(const flash_handle_t * const flash_handle,
char jedec[num_bytes],
const size_t num_bytes)
{
unsafe
{
flash_handle_impl_t *unsafe internal_handle = (flash_handle_impl_t *unsafe)flash_handle;
#if defined(XASSERT_ENABLE_ASSERTIONS)
validate_divides(func_flash_read_jedec_id, flash_handle);
#endif //defined(XASSERT_ENABLE_ASSERTIONS)
//Build the flash_command_read_jedec
unsigned command_out[2];
{command_out[0], command_out[1]} = build_sio_pin0_data(flash_command_read_jedec);
//Send the command out on a slower clkblk
switch_sio_to_output(internal_handle);
//Drive the CS low. CS is clocked off the default reference clock and must be
//lowered now to meet setup timings.
drive_chip_select(internal_handle, chip_select_drive_low);
//input time from the port;
unsigned time_stamp;
internal_handle->flash_ports.flash_sio :> void @ time_stamp;
time_stamp += PORT_SETUP_DELAY;
//Line everything up so that we output the command and the clock edges
//at the same time. 8 clock edges are required to complete this.
internal_handle->flash_ports.flash_sclk @ time_stamp <: CLOCK_EDGES_8;
internal_handle->flash_ports.flash_sio @ time_stamp <: command_out[0]; //Command
internal_handle->flash_ports.flash_sio <: command_out[1]; //Command
partout(internal_handle->flash_ports.flash_sclk, 1, 0); //finish sclk low
sync(internal_handle->flash_ports.flash_sclk); // ensure all output completes
//Now turn the SIO port around.
switch_sio_to_input(internal_handle);
//Input the current port time again
internal_handle->flash_ports.flash_sio :> void @ time_stamp;
time_stamp += PORT_SETUP_DELAY;
//Output our flash clock edges - 8 cycles for the first jedec byte
partout_timed(internal_handle->flash_ports.flash_sclk, 16, CLOCK_EDGES_8, time_stamp);
asm("setpt res[%0], %1"::"r"(internal_handle->flash_ports.flash_sio),"r"(time_stamp + internal_handle->flash_clock_delays.flash_byte_input_delay));
if(num_bytes == 1)
{
jedec[0] = read_byte_asm(internal_handle->flash_ports);
}
else
{
read_bytes_asm(internal_handle->flash_ports, jedec, num_bytes);
}
//Drive the CS back hi now that everything is complete
drive_chip_select(internal_handle, chip_select_drive_high);
}
}
Will reduce to something like follows when it switches over to use lib_spi in this refactors...
reconfigure_sio_port();
drive_cs_lo();
spi_tx_bytes(&spi_handle, transaction_type_spi, jedec_command, 1);
reconfigure_sio_port();
spi_rx_bytes(&spi_handle, transaction_type_spi, jedec_id, 3);
Hi @robert-lytton. Can I just ask you to check that I have understood the fptgroup attribute and therefore used it correctly in my last commit (https://github.com/xmos/lib_spi/pull/22/commits/a8a97525ecf78a420358346878364f690a4943cd)
Many Thanks
Merging my pull request thus far so my work isn't lost when my account is deleted.
I have started mocking up some code that may be the replacement API for lib_spi where it now also incorporates the bus functions for lquad_spi.
In this first attempt, I have focused on the ability to maintain a single API interface, but pass different port structures based on SPI/QSPI requirements. I have achieved this through the use of a union. I'm interested to hear your thoughts on this vs the obvious route of having two API headers (xmos_spi.h and xmos_qspi.h)
Colin.