kstrohmayer opened this issue 1 year ago (Open)
Just one correction: if what we discussed on Thursday hasn't changed, then I will fix the RTL for the custom instruction cntb (#10) and Adam will rework the C library in parallel.
Can you assign Adam to this task then?
Perfectly fine with me.
@kstrohmayer @mole99 After our discussion with Klaus, we've decided to investigate the root cause of the encoding errors before making significant algorithm changes. I suspect there may be an issue with the 'prepare_data' function, the data buffer, or the place where 'rle_compress' fetches the signal for encoding, or perhaps a combination of these. Even when I feed various random data bits into the 'prepare_data' buffer, the output remains the same, as shown in the attached picture from the first five test runs. It's puzzling that the counted consecutive bits are often fewer than the bits in 'bits_rle_block', which prevents compression. I'm not entirely certain about the correctness, but you can find the output in the attached picture.
I have a separate repository for RLE algorithm testing in Visual Studio. If you want to run a test while I'm away, just check out the 'rle_encoding' branch and open the 'rle.vcxproj' file; this opens the project in Visual Studio for debugging. If you encounter issues, it might be because the project is not set to Console mode; this article may help: https://www.codeproject.com/Questions/5289599/How-do-I-resolve-my-LNK2019-error-in-my-program
One of the reasons the data in the buffer wasn't changing correctly is that Mario was rewriting the content of the buffer with `strcpy((char*)data, "Hello World!");` before passing it to the `rle_compress` function. I have uploaded some of the results to Google Drive. Results 0-9 bypass the part that rewrites the buffer; results 10-19 use Mario's original code.
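For illustration, here is a minimal sketch of the problematic pattern described above; the buffer size, the `rle_compress` signature, and the stub body are assumptions for this sketch, not the project's actual API:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DATA_BYTE_SIZE 32            /* assumed buffer size for this sketch */

/* stub standing in for the project's rle_compress; the real signature may differ */
static void rle_compress(const uint8_t *data, size_t len)
{
    (void)data; (void)len;           /* compression omitted in this sketch */
}

static void run_iteration(uint8_t *data)
{
    /* fill the buffer with fresh random bits for this iteration */
    for (size_t i = 0; i < DATA_BYTE_SIZE; i++)
        data[i] = (uint8_t)(rand() & 0xFF);

    /* BUG: overwrites the freshly generated data, so every iteration
       hands essentially the same constant bytes to the compressor */
    strcpy((char *)data, "Hello World!");

    rle_compress(data, DATA_BYTE_SIZE);
}
```

Removing (or guarding) the `strcpy` restores per-iteration variation in the buffer, which would explain the difference between results 0-9 and 10-19.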
Even though I am generating random data bits for each iteration, I only observe some minor changes in the data buffer. However, the data being sent to the `rle_compress` function remains consistently the same.
Might be the case because he only changes the "data word" of the SPI transfer he emulates.
Does it work now?
Unfortunately, not yet. The prepare_data function uses a data pointer as a buffer to store the clock, sync, data, and clear signals. However, from my tests, it seems the data buffer is not handled correctly and doesn't return the correct data to the compression algorithm.
```
---- RLE test run: 0 ----
---- Prepare Data ----
Data bits: 0x3b37
DAC data: 0x103b370
Empty Data Buffer: 00000053798FF6E8
--- Print Data from buffer ---
Data: 00000053798FF6E8
```
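The identical hex value before and after "Prepare Data" suggests the buffer is never written. Purely as a sketch of the expected behaviour, and assuming prepare_data packs the clock, sync, data, and clear bits of each sample into one byte of a caller-provided buffer (the real layout and signature are not shown in this thread):

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed packing: bit 0 = CLK, bit 1 = SYNC, bit 2 = DATA, bit 3 = CLR.
   The actual prepare_data layout may differ. */
static void prepare_data_sketch(uint8_t *buf, size_t samples,
                                const uint8_t *clk, const uint8_t *sync,
                                const uint8_t *data, const uint8_t *clr)
{
    for (size_t i = 0; i < samples; i++) {
        buf[i] = (uint8_t)( (clk[i]  & 1u)
                          | (sync[i] & 1u) << 1
                          | (data[i] & 1u) << 2
                          | (clr[i]  & 1u) << 3);
    }
}
```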
Is the required memory for the data buffer allocated?
Not in the original code. I did add a memory allocation before calling the prepare_data function, `data[DATA_BYTE_SIZE] = malloc(sizeof(uint8_t));`, but I haven't seen any difference in the results.
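For reference, `malloc(sizeof(uint8_t))` reserves only a single byte. A minimal sketch of an allocation that actually covers the buffer, assuming it should hold `DATA_BYTE_SIZE` bytes (the size is passed in here because its real value is not given in the thread):

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate and zero-initialise the data buffer. calloc ensures an
   un-written buffer prints as zeros instead of leftover garbage such
   as 00000053798FF6E8. Caller checks for NULL and frees it later. */
uint8_t *alloc_data_buffer(size_t data_byte_size)
{
    return calloc(data_byte_size, sizeof(uint8_t));
}
```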
The `cntb_test` skips most of the RLE algorithm by not using the `prepare_data` function. It generates random values and start positions, successfully returning consecutive bits, and works well in both software and hardware. However, when I ran the `rle_test` with the same input data as the `cntb_test`, I obtained different results. It seems the `rle_compression` function is not handling the input data correctly, leaving most bits uncompressed. I recommend a code review and may need additional support. The cntb and rle test results are available on Google Drive.
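To make the comparison between `cntb_test` and `rle_test` easier to reason about, here is a minimal software model of the consecutive-bit count that `cntb_test` exercises; the 32-bit word width, the start-position convention (LSB = 0, start_pos < 32), and the cap at 31 are assumptions, not the instruction's documented semantics:

```c
#include <stdint.h>

/* Hypothetical model of cntb: count how many consecutive bits, starting
   at start_pos, have the same value as the bit at start_pos.
   The count is capped at 31 so it fits in a 5-bit field. */
uint32_t cntb_model(uint32_t word, uint32_t start_pos)
{
    uint32_t first = (word >> start_pos) & 1u;
    uint32_t count = 0;

    for (uint32_t pos = start_pos; pos < 32 && count < 31; pos++) {
        if (((word >> pos) & 1u) != first)
            break;
        count++;
    }
    return count;
}
```

If a model like this and the hardware agree on random (word, start position) pairs but `rle_compression` still produces different results, the discrepancy is more likely in how `rle_compression` feeds data into the count than in the count itself.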
Hi Adam, let's do a call on Monday morning. I'm travelling with Gottfried to Photeon by car, so I don't know exactly when I can do the call. I'll ping you.
That's fine, but I think it would be better to do this either in person or when I can share my screen so we could go through my test results.
Update: Moved the comment. I mistakenly posted this comment to the wrong issue.
Summary of changes
I have made improvements to the data preparation and loading functions to address issues with loading the data buffer for the encoding process.
There were multiple issues in the encoding process, including incorrect count values, faulty buffer loading, and inaccuracies in tracking the start position and bit values. The incorrect max count values left most signals uncompressed, and the uncompressed signals weren't loaded correctly into the buffer. Inaccuracies in the start position were leading to missing bits and incorrect count values. The handling of last bits between blocks also caused further errors in the count values.
To address these issues, I made changes to the counting function, allowing us to count a total of 32 values, from 0 to 31. The tracking of the start position for cntb was also changed to correctly read all bits for each signal and block. It can now accurately count consecutive bits, determine if the signal is compressed, and update the bit values accordingly.
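As an illustration of the start-position bookkeeping described here, a minimal sketch of a block-wise encoding loop; the 32-bit block size, the 5-bit count field, and the emit step are assumptions for this sketch rather than the actual implementation:

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BITS 32
#define MAX_COUNT  31   /* 5-bit count field: 32 values, 0 to 31 */

/* Hypothetical emit: the real code writes (count, bit) into the
   encoded bitstream instead of printing. */
static void emit_run(uint32_t count, uint32_t bit)
{
    printf("run: %2u x %u\n", count, bit);
}

/* Encode one block by counting consecutive equal bits and advancing
   the start position by exactly the returned count, so no bits are
   skipped or read twice. */
void encode_block_sketch(uint32_t block)
{
    uint32_t pos = 0;

    while (pos < BLOCK_BITS) {
        uint32_t bit = (block >> pos) & 1u;
        uint32_t count = 0;

        while (pos + count < BLOCK_BITS && count < MAX_COUNT &&
               ((block >> (pos + count)) & 1u) == bit)
            count++;

        emit_run(count, bit);
        pos += count;   /* advance past the run; carrying the last bit
                           over to the next block is the caller's job */
    }
}
```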
However, there's a potential issue with the read function. This function returns the encoded data from a bitstream. While it correctly returns the count value, there might be an error in retrieving the bit value and the not_compressed value. This error could affect the decoding process as well. My next steps involve investigating whether the problem lies within the read function or the way we handle the encoded data buffer.
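To show where an error in the read path could hide, here is a minimal sketch of a bitstream reader; the field order and widths (1-bit not_compressed flag, 5-bit count, 1-bit value) and the MSB-first packing are assumptions, not the actual encoded format:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t not_compressed;
    uint8_t count;
    uint8_t value;
} rle_entry_t;

/* Read n bits MSB-first from a byte buffer, starting at *bit_pos and
   advancing it past the consumed bits. An off-by-one here would corrupt
   every field that follows. */
static uint32_t read_bits(const uint8_t *stream, size_t *bit_pos, unsigned n)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t   byte = *bit_pos / 8;
        unsigned off  = 7u - (unsigned)(*bit_pos % 8);
        v = (v << 1) | ((stream[byte] >> off) & 1u);
        (*bit_pos)++;
    }
    return v;
}

rle_entry_t read_entry_sketch(const uint8_t *stream, size_t *bit_pos)
{
    rle_entry_t e;
    e.not_compressed = (uint8_t)read_bits(stream, bit_pos, 1);
    e.count          = (uint8_t)read_bits(stream, bit_pos, 5);
    e.value          = (uint8_t)read_bits(stream, bit_pos, 1);
    return e;
}
```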
I have updated the RLE algorithm to achieve a better compression ratio. Currently, the algorithm no longer relies on the not_compressed flag for storing or decompressing data accurately. This modification has led to a reduced number of bits stored in the bitstream/memory, resulting in an improved compression ratio. With the given sample data, the following reductions have been achieved:
It's important to note that these results are data-dependent, as they depend on the consecutive bits found in each signal. The CLK signal demonstrates the best result due to its all-ones nature.
The current implementation of the RLE algorithm, running on the ULX3S FPGA, gives the following results, measured with a Saleae Logic 8 logic analyser. In the following test cases, only the DATA signal differs, to make the comparison easier. In test case three, the signal changed rapidly, leading to less effective compression by the algorithm, hence the increase in size. This also resulted in an increased run time and a lower improvement figure between the hardware and software implementations.
Test Case 1

| Signal | Uncompressed bit number | Compressed bit number | Reduction in size |
| --- | --- | --- | --- |
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 42 | 34.38% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 90 | 64.84% |

| Performance improvement | With custom instruction | Without custom instruction |
| --- | --- | --- |
| RLE run time (ms) | 66.05704 | 74.02218 |
| Improvement (ms) | 7.96514 | / |
| Improvement (%) | 10.76% | / |
Test Case 2

| Signal | Uncompressed bit number | Compressed bit number | Reduction in size |
| --- | --- | --- | --- |
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 45 | 29.69% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 93 | 63.67% |

| Performance improvement | With custom instruction | Without custom instruction |
| --- | --- | --- |
| RLE run time (ms) | 66.84003 | 74.83414 |
| Improvement (ms) | 7.99411 | / |
| Improvement (%) | 10.68% | / |
Test Case 3

| Signal | Uncompressed bit number | Compressed bit number | Reduction in size |
| --- | --- | --- | --- |
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 73 | -14.06% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 121 | 52.73% |

| Performance improvement | With custom instruction | Without custom instruction |
| --- | --- | --- |
| RLE run time (ms) | 80.85817 | 89.21094 |
| Improvement (ms) | 8.35277 | / |
| Improvement (%) | 9.36% | / |
CC: @kstrohmayer @mole99 @adam-hrvth
Refactoring of the existing C code for testing the custom instruction.
[x] Implement without instruction set extensions
[ ] Implement with instruction set extensions
Status
Current status of the block-wise run-length algorithm.
C-implementation running on the PC
Implementation without any custom instruction. A function is considered working if it is verified with a self-checking test running at least 10 times with different data.
C-implementation running on the CV32E40X using FPGA
tbd