nqtronix / fifofast

A fast, generic fifo for MCUs.
MIT License

Special Considerations for Very Large FIFOs? #4

Closed matlabuser77 closed 2 years ago

matlabuser77 commented 2 years ago

Are there any special considerations that should be kept in mind if attempting to declare and use a very large (e.g. 262144 elements) FIFO? Do I need to be using any low-level compiler instructions, like __DSB(); or __ISB(); or cache-flushing operations when working with the fifofast library to ensure data is handled correctly?

I'm using this fifofast library to create a FIFO to hold data to later write out to flash. I've encountered some intermittent data corruption issues in my written flash, and I am trying to narrow down where in my code the corruption is occurring.

My issues are likely completely unrelated to my implementation of the fifofast library, but I don't have a strong enough grasp of how this library is working under the hood with relation to compiler optimizations, memory alignment, DCache/ICache, etc. to know if I'm doing something dangerous/out of spec with your library, so I thought I'd ask if there are any constraints or limitations to be aware of when using very large FIFOs.

If it helps, this is my current implementation, which seems to generally work fine:

Here is my declaration in an H file (with include guards) that is included in all my C files that access the FIFO: _fff_declare(volatile uint8_t __attribute__((aligned(16))), fifo_sram_data_storage, 262144);

(Note that I am not very confident in my alignment keyword usage here...I wasn't sure if I need to explicitly tell the compiler an alignment, but I assume it won't hurt anything either...)

Here is my initialization in main.c (but outside of main()): _fff_init(fifo_sram_data_storage);

I have a small temporary holding buffer (g_data_temp_buffer_sram[] consisting of g_current_byte elements) that I store data in, before stuffing its contents into the FIFO. To write this data to the FIFO, I first check to ensure enough free space remains using: if (g_current_byte < _fff_mem_free(fifo_sram_data_storage))

And then I write my data from my temporary buffer:

// Store in FIFO
for (ind_fifo=0;ind_fifo<g_current_byte;ind_fifo++)
{
    _fff_write_lite(fifo_sram_data_storage,g_data_temp_buffer_sram[ind_fifo]);
}

Then later I read data out from the FIFO into a buffer that I can write to my flash chip:

// Reading a single chunk of the Fast FIFO data
for (j=0;j<16;j++)
{
    // Read out the remaining bytes from the Fast FIFO into the temporary read buffer
    temp_flash_write_buffer[j] = _fff_read_lite(fifo_sram_data_storage);
}

nqtronix commented 2 years ago

This library was written with low-end microcontrollers in mind, like the AVR8 series, where every bit of performance matters. Since then I've tested and used the library on Cortex M0+ and occasionally M4F devices and have so far not encountered any issues. As I understand it, it should work well with the caches of higher-end MCUs because the data is stored linearly in an array.

Because this code does not use any hardware features, __DSB();, __ISB(); and similar barrier instructions should not be needed. There's nothing inherently special about the macros: they all resolve to standard GCC C code, and the compiler ensures no data is corrupted when switching context. You only need to make sure that no fifo is accessed from two functions at once, see: Known Issues
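One way to honor the "not from two functions at once" rule when a FIFO is shared between an interrupt handler and the main loop is to wrap each access in a short critical section. The following is a minimal host-runnable sketch in plain C, not fifofast code: `critical_enter()`, `critical_exit()`, `guarded_write()` and `guarded_read()` are hypothetical names, and the tiny ring buffer only stands in for the real FIFO. On a Cortex-M target the two critical-section helpers would typically mask and restore interrupts (for example via the CMSIS `__disable_irq()` / `__enable_irq()` intrinsics); here they only track nesting so the example can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for interrupt masking (assumption, not fifofast API).
 * On real hardware these would disable/restore interrupts; on the host they
 * only count nesting so that unbalanced use is caught. */
static unsigned critical_depth = 0;

static void critical_enter(void) { critical_depth++; }

static void critical_exit(void)
{
    assert(critical_depth > 0);
    critical_depth--;
}

/* Tiny stand-in ring buffer; the depth is arbitrary for illustration. */
#define DEMO_DEPTH 4u
static volatile uint8_t demo_buf[DEMO_DEPTH];
static volatile uint8_t demo_wr, demo_rd, demo_level;

/* Write one byte; returns 1 on success, 0 if the buffer is full. */
static int guarded_write(uint8_t value)
{
    int ok = 0;
    critical_enter();
    if (demo_level < DEMO_DEPTH) {
        demo_buf[demo_wr] = value;
        demo_wr = (uint8_t)((demo_wr + 1u) % DEMO_DEPTH);
        demo_level = (uint8_t)(demo_level + 1u);
        ok = 1;
    }
    critical_exit();
    return ok;
}

/* Read one byte; returns 1 on success, 0 if the buffer is empty. */
static int guarded_read(uint8_t *out)
{
    int ok = 0;
    critical_enter();
    if (demo_level > 0u) {
        *out = demo_buf[demo_rd];
        demo_rd = (uint8_t)((demo_rd + 1u) % DEMO_DEPTH);
        demo_level = (uint8_t)(demo_level - 1u);
        ok = 1;
    }
    critical_exit();
    return ok;
}
```

Keeping the guarded region this small limits how long interrupts would be blocked on a real target. Whether such a guard is needed at all depends on whether an ISR and the main loop really touch the same fifo.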

> any constraints or limitations to be aware of when using very large FIFOs

Technically there shouldn't be any constraints on size. The macros accept any data type, and theoretically up to 2^64 elements of it. A practical limitation is die size (how much RAM the chip actually has), and on some large MCUs the RAM is divided into multiple banks. Even if the address space is contiguous, there might be issues at the boundary between banks (but the compiler should warn you about this).

> Here is my declaration in an H file (with include guards) that is included in all my C files that access the FIFO: _fff_declare(volatile uint8_t __attribute__((aligned(16))), fifo_sram_data_storage, 262144);

Manual alignment is not required for most applications. A common exception is serial-to-parallel conversion, where multiple bytes of a message (e.g. UART) are interpreted together as a uint16_t or larger (see [aligned data](https://github.com/nqtronix/fifofast#aligned-data)). If you specify uint16_t or uint32_t directly as the data type, the compiler will pick the ideal alignment automatically.
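To make the serial-to-parallel case concrete, here is a minimal sketch in plain C (not fifofast code; the helper name `read_u16_le` and the little-endian byte order are assumptions for illustration). It assembles a 16-bit value from two bytes of a byte buffer using only byte accesses, so it works at any offset. Casting `&buf[1]` to `uint16_t *` and dereferencing it instead would be a misaligned access, which is undefined behavior in C and faults on some cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian uint16_t starting at
 * buf[offset]. Only single bytes are accessed, so the alignment of
 * the buffer and the offset do not matter. */
static uint16_t read_u16_le(const uint8_t *buf, size_t offset)
{
    assert(buf != NULL);
    return (uint16_t)(buf[offset] | ((uint16_t)buf[offset + 1] << 8));
}
```

If the buffer is declared with a wider element type in the first place, this kind of manual assembly is usually unnecessary, because the compiler already aligns the storage for that type.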

Your usage example looks correct to me, although I do not quite understand why you need a fifo if you buffer the data anyway before writing and after reading it. Maybe you can get away with a simple large array instead and only modify the pointer, instead of copying the data back and forth. If you have to copy that much data, consider using the native bus width (i.e. 32-bit on ARM Cortex) for the fifo data type; this can improve copy speed by up to 4x compared to the single-byte approach, since each access moves four bytes instead of one.
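As a rough illustration of the word-wide idea, the sketch below packs four bytes into one `uint32_t` and moves whole words through a small ring buffer. It is plain C written for this reply, not fifofast's implementation: `word_fifo_t`, `word_fifo_write()`, `word_fifo_read()`, `pack_le32()` and the depth of 8 are all hypothetical, and the little-endian packing order is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_FIFO_DEPTH 8u  /* hypothetical depth; must be a power of two */

typedef struct {
    uint32_t data[WORD_FIFO_DEPTH];
    uint32_t head;  /* total number of words written so far */
    uint32_t tail;  /* total number of words read so far */
} word_fifo_t;

/* Number of words currently stored. */
static uint32_t word_fifo_level(const word_fifo_t *f)
{
    return f->head - f->tail;
}

/* Store one 32-bit word; returns 1 on success, 0 if full. */
static int word_fifo_write(word_fifo_t *f, uint32_t word)
{
    assert(f != NULL);
    if (word_fifo_level(f) == WORD_FIFO_DEPTH) {
        return 0;
    }
    f->data[f->head & (WORD_FIFO_DEPTH - 1u)] = word;
    f->head++;
    return 1;
}

/* Fetch one 32-bit word; returns 1 on success, 0 if empty. */
static int word_fifo_read(word_fifo_t *f, uint32_t *out)
{
    assert(f != NULL && out != NULL);
    if (word_fifo_level(f) == 0u) {
        return 0;
    }
    *out = f->data[f->tail & (WORD_FIFO_DEPTH - 1u)];
    f->tail++;
    return 1;
}

/* Pack four bytes into one word, least significant byte first. */
static uint32_t pack_le32(const uint8_t *b)
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

With a layout like this, a 16-byte flash chunk takes four reads instead of sixteen. The actual speedup depends on the core, the memory and the compiler, so it is worth measuring on the target.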

Sorry for replying this late, I didn't get an email notification and have logged in today for the first time in a while :)

nqtronix commented 2 years ago

I assume you solved your issue by now and thus consider this issue closed. Please re-open if this isn't the case.