semify-eda / tristan


Refactoring of C code for encoding and decoding #8

Open kstrohmayer opened 1 year ago

kstrohmayer commented 1 year ago

CC: @kstrohmayer @mole99 @adam-hrvth

Refactoring of the current C code for testing the custom instruction.

Status

Current status of the block-wise run-length encoding (RLE) algorithm.

C implementation running on the PC

Implementation without any custom instruction. A function counts as working once it passes a self-checking test that runs at least 10 times with different data.
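The acceptance criterion above (a self-checking test run at least 10 times with different data) can be sketched as an encode/decode round trip. This is a minimal sketch that assumes one signal of 64 samples packed into a uint64_t and a simplified run-list format. The names rle_encode, rle_decode, rle_signal_t and rle_self_test are hypothetical and do not reflect the project's API or bitstream format.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_SAMPLES 64 /* fixed number of samples per signal, as in the table */

/* Hypothetical encoded form: value of the first sample plus the lengths of
   the alternating runs that follow. Not the project's actual bitstream. */
typedef struct {
    uint8_t first_bit;
    uint8_t num_runs;
    uint8_t run_len[NUM_SAMPLES];
} rle_signal_t;

/* Encode one signal; bit i of 'samples' is sample i. */
static void rle_encode(uint64_t samples, rle_signal_t *out)
{
    uint8_t current = (uint8_t)(samples & 1u);
    uint8_t len = 0;

    out->first_bit = current;
    out->num_runs = 0;
    for (int i = 0; i < NUM_SAMPLES; i++) {
        uint8_t bit = (uint8_t)((samples >> i) & 1u);
        if (bit != current) {            /* run ended: store its length */
            out->run_len[out->num_runs++] = len;
            current = bit;
            len = 0;
        }
        len++;
    }
    out->run_len[out->num_runs++] = len; /* store the final run */
}

/* Rebuild the 64 samples from the run list. */
static uint64_t rle_decode(const rle_signal_t *in)
{
    uint64_t samples = 0;
    uint8_t bit = in->first_bit;
    int pos = 0;

    for (int r = 0; r < in->num_runs; r++) {
        for (int k = 0; k < in->run_len[r]; k++, pos++) {
            samples |= (uint64_t)bit << pos;
        }
        bit ^= 1u;                       /* runs alternate between 0 and 1 */
    }
    return samples;
}

/* Self-checking test: encode and decode 'runs' random signals.
   Returns the number of failed runs (0 means all passed). */
static int rle_self_test(unsigned runs, unsigned seed)
{
    int failures = 0;

    srand(seed);
    for (unsigned t = 0; t < runs; t++) {
        /* RAND_MAX is only guaranteed to be 32767, so use 8 random bits per call */
        uint64_t samples = 0;
        for (int c = 0; c < 8; c++) {
            samples = (samples << 8) | (uint64_t)(rand() & 0xFF);
        }
        rle_signal_t enc;
        rle_encode(samples, &enc);
        if (rle_decode(&enc) != samples) {
            failures++;
        }
    }
    return failures;
}
```

Calling rle_self_test(10, seed) with a different seed per session gives ten different data sets per run, which matches the "at least 10 times with different data" rule.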

| Coding | Status - Adam | Status - Leo |
|---|---|---|
| Encoding, fixed number of signals (4), fixed number of samples (64) | completed | ? |
| Decoding, fixed number of signals (4), fixed number of samples (64) | completed | ? |
| Encoding, variable number of signals (signal width is an integer divisor of 8: 2, 4, 8, 16), fixed number of samples (64) | ? | ? |
| Decoding, variable number of signals (signal width is an integer divisor of 8: 2, 4, 8, 16), fixed number of samples (64) | ? | ? |
| Encoding, variable number of signals (signal width is an integer divisor of 8: 2, 4, 8, 16), variable number of samples that fit fully in a 32-bit word (16, 32, 64) | ? | ? |
| Decoding, variable number of signals (signal width is an integer divisor of 8: 2, 4, 8, 16), variable number of samples that fit fully in a 32-bit word (16, 32, 64) | ? | ? |

C implementation running on the CV32E40X on an FPGA

tbd

mole99 commented 1 year ago

Just one correction: if what we discussed on Thursday hasn't changed, then I will fix the RTL for the custom instruction cntb (#10) and Adam will rework the C library in parallel.

Can you assign Adam to this task then?

kstrohmayer commented 1 year ago

Perfectly fine for me.

adam-hrvth commented 1 year ago

@kstrohmayer @mole99 After our discussion with Klaus, we've decided to investigate the root cause of the encoding errors before making significant changes to the algorithm. I suspect an issue with the 'prepare_data' function, the data buffer, or the way 'rle_compress' fetches the signal for encoding, or perhaps a combination of these. Even when I feed different random data bits into the 'prepare_data' buffer, the output stays the same, as shown in the attached picture from the first five test runs. It's also puzzling that the number of counted consecutive bits is often smaller than 'bits_rle_block', which prevents compression. I'm not entirely certain the output is correct, but you can find it in the attached picture.

I have a separate repository for testing the RLE algorithm in Visual Studio. If you want to run a test while I'm away, check out the 'rle_encoding' branch and open the 'rle.vcxproj' file, which should open the project in Visual Studio for debugging. If you run into linker errors such as LNK2019, the project is probably not configured as a Console application; this article may help: https://www.codeproject.com/Questions/5289599/How-do-I-resolve-my-LNK2019-error-in-my-program

[Attached image: rle_test1]

adam-hrvth commented 1 year ago

One reason the data in the buffer wasn't changing as expected is that Mario overwrote the buffer contents with strcpy((char*)data, "Hello World!"); before passing it to the rle_compress function. I have uploaded some of the results to Google Drive. Runs 0 to 9 skip the buffer overwrite; runs 10 to 19 use Mario's original code.

Even though I am generating random data bits for each iteration, I only observe some minor changes in the data buffer. However, the data being sent to the rle_compress function remains consistently the same.

kstrohmayer commented 1 year ago

> Even though I am generating random data bits for each iteration, I only observe some minor changes in the data buffer. However, the data being sent to the rle_compress function remains consistently the same.

This might be because he only changes the "data word" of the SPI transfer he emulates.

kstrohmayer commented 1 year ago

> One reason the data in the buffer wasn't changing as expected is that Mario overwrote the buffer contents with strcpy((char*)data, "Hello World!"); before passing it to the rle_compress function. I have uploaded some of the results to Google Drive. Runs 0 to 9 skip the buffer overwrite; runs 10 to 19 use Mario's original code.

Does it work now?

adam-hrvth commented 1 year ago

Unfortunately, not yet. The prepare_data function uses a data pointer as a buffer to store the clock, sync, data, and clear signals. However, my tests suggest that the data buffer is not handled correctly and does not pass the correct data to the compression algorithm.

```
---- RLE test run: 0 ----

---- Prepare Data ----
Data bits: 0x3b37
DAC data: 0x103b370
Empty Data Buffer: 00000053798FF6E8

--- Print Data from buffer ---
Data: 00000053798FF6E8
```

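Both lines in the log print the same 16-digit hex value, which looks like a pointer address (for example from printf("%p", data) on a 64-bit build) rather than the buffer contents. That is an assumption, since the printing code is not shown in this thread. A minimal sketch of a helper that prints the bytes themselves; format_buffer and print_buffer are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write the buffer contents as space-separated hex bytes, e.g. "3B 37".
   Returns the number of characters written (excluding the terminator).
   Output that does not fit in 'out' is dropped byte by byte. */
static size_t format_buffer(char *out, size_t out_size, const uint8_t *buf, size_t len)
{
    size_t pos = 0;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';
    /* each byte needs at most 3 characters plus the terminating NUL */
    for (size_t i = 0; i < len && pos + 3 < out_size; i++) {
        int n = snprintf(out + pos, out_size - pos, "%s%02X",
                         i ? " " : "", (unsigned)buf[i]);
        if (n < 0) {
            break;
        }
        pos += (size_t)n;
    }
    return pos;
}

/* Print a label followed by the buffer contents (not its address). */
static void print_buffer(const char *label, const uint8_t *buf, size_t len)
{
    char text[3 * 64 + 1]; /* room for up to 64 bytes; longer buffers are truncated */

    format_buffer(text, sizeof text, buf, len);
    printf("%s: %s\n", label, text);
}
```

Calling print_buffer("Data", data, DATA_BYTE_SIZE) before and after prepare_data would show whether the buffer contents actually change between test runs.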
kstrohmayer commented 1 year ago

Is the required memory for the data buffer allocated?

adam-hrvth commented 1 year ago

> Is the required memory for the data buffer allocated?

Not in the original code. I added the line data[DATA_BYTE_SIZE] = malloc(sizeof(uint8_t)); before calling the prepare_data function, but I haven't seen any difference in the results.
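As written, data[DATA_BYTE_SIZE] = malloc(sizeof(uint8_t)); does not allocate the buffer: it requests a single byte and stores the returned pointer into one element of data (index DATA_BYTE_SIZE, which is one past the end if data has DATA_BYTE_SIZE elements) instead of making data point at the new memory. A minimal sketch of allocating the whole buffer, assuming a byte buffer of DATA_BYTE_SIZE elements; the value 32 and the helper name alloc_data_buffer are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define DATA_BYTE_SIZE 32 /* hypothetical size; use the project's own value */

/* Allocate a zero-filled buffer of DATA_BYTE_SIZE bytes.
   Returns NULL on failure; the caller must free() the result. */
static uint8_t *alloc_data_buffer(void)
{
    /* calloc(count, size) allocates the whole array and zero-fills it */
    return calloc(DATA_BYTE_SIZE, sizeof(uint8_t));
}

/* Hypothetical usage in place of the original line:
 *   uint8_t *data = alloc_data_buffer();
 *   if (data == NULL) { ...handle allocation failure... }
 *   prepare_data(data, ...);   // real signature not shown in this thread
 *   free(data);
 */
```

On the CV32E40X target, a statically allocated array (static uint8_t data[DATA_BYTE_SIZE];) avoids the heap entirely and gives the same guarantee that the whole buffer exists.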

adam-hrvth commented 1 year ago

The cntb_test skips most of the RLE algorithm by not using the prepare_data function. It generates random values and start positions and correctly returns the number of consecutive bits, in both the software and the hardware implementation. However, when I ran the rle_test with the same input data as the cntb_test, I got different results. It seems the rle_compress function is not handling the input data correctly, leaving most bits uncompressed. I recommend a code review and may need additional support. The cntb and rle test results are available on Google Drive.
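A test like cntb_test can be sketched as a comparison of a candidate implementation against a plain-C reference model, fed with random words and start positions. The cntb semantics assumed here (count the run of bits equal to bit start, moving toward the MSB, including the start bit) are hypothetical and should be checked against the RTL in #10; cntb_ref and cntb_compare are illustrative names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Software reference model of a "count consecutive bits" operation.
   ASSUMED semantics (not taken from the cntb RTL): starting at bit 'start'
   (0 = LSB, must be < 32), count how many consecutive bits, moving toward
   the MSB, have the same value as bit 'start'. */
static uint32_t cntb_ref(uint32_t word, uint32_t start)
{
    uint32_t value = (word >> start) & 1u;
    uint32_t count = 0;

    for (uint32_t pos = start; pos < 32u; pos++) {
        if (((word >> pos) & 1u) != value) {
            break;
        }
        count++;
    }
    return count;
}

/* Feed the same random (word, start) pairs to a candidate implementation
   (e.g. a wrapper around the custom instruction) and to the reference model.
   Returns the number of mismatches. */
static int cntb_compare(uint32_t (*candidate)(uint32_t, uint32_t),
                        unsigned runs, unsigned seed)
{
    int mismatches = 0;

    srand(seed);
    for (unsigned i = 0; i < runs; i++) {
        uint32_t word = 0;
        for (int c = 0; c < 4; c++) {   /* 8 random bits per rand() call */
            word = (word << 8) | (uint32_t)(rand() & 0xFF);
        }
        uint32_t start = (uint32_t)(rand() % 32);
        if (candidate(word, start) != cntb_ref(word, start)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Logging the same (word, start) pairs inside rle_compress and checking them with cntb_ref would help separate a counting problem from a problem in the surrounding compression code.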

kstrohmayer commented 1 year ago

Hi Adam, let's do a call on Monday morning. I'm travelling with Gottfried to Photeon by car, so I don't know exactly when I can do the call. I'll ping you.

adam-hrvth commented 12 months ago

> Hi Adam, let's do a call on Monday morning. I'm travelling with Gottfried to Photeon by car, so I don't know exactly when I can do the call. I'll ping you.

That's fine, but I think it would be better to do this either in person or when I can share my screen, so we can go through my test results.

adam-hrvth commented 11 months ago

Update: I moved this comment here; I had mistakenly posted it to the wrong issue.

Summary of changes

I have made improvements to the data preparation and loading functions to address issues with loading the data buffer for the encoding process.

There were multiple issues in the encoding process, including incorrect count values, faulty buffer loading, and inaccurate tracking of the start position and bit values. The incorrect maximum count values left most signals uncompressed, and the uncompressed signals weren't loaded into the buffer correctly. The inaccurate start position led to missing bits and incorrect count values, and the handling of the last bits at block boundaries caused further count errors.

To address these issues, I changed the counting function so that it can count 32 values in total, from 0 to 31. I also changed how the cntb start position is tracked, so that all bits of each signal and block are read. The encoder can now accurately count consecutive bits, determine whether a signal is compressed, and update the bit values accordingly.
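The start-position tracking described above can be sketched as a loop that advances the start position by exactly the length of each run, so every bit of a block is read once. This sketch assumes 32-bit blocks, a 5-bit count field that stores the run length minus one (so the 32 values 0 to 31 cover run lengths 1 to 32), and runs that are not carried across block boundaries. The types and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t bit;    /* value of the run */
    uint8_t len_m1; /* run length minus one: 0..31 fits in 5 bits */
} run_t;

/* Length of the run of equal bits starting at 'start' (inclusive), toward the MSB. */
static uint32_t run_length(uint32_t block, uint32_t start)
{
    uint32_t value = (block >> start) & 1u;
    uint32_t len = 0;

    while (start + len < 32u && ((block >> (start + len)) & 1u) == value) {
        len++;
    }
    return len;
}

/* Split one 32-bit block into runs. Returns the number of runs (1..32). */
static int encode_block(uint32_t block, run_t runs[32])
{
    int n = 0;
    uint32_t start = 0;

    while (start < 32u) {
        uint32_t len = run_length(block, start);
        runs[n].bit = (uint8_t)((block >> start) & 1u);
        runs[n].len_m1 = (uint8_t)(len - 1u);
        n++;
        start += len; /* next run starts right after this one: no bit skipped or read twice */
    }
    return n;
}

/* Rebuild a 32-bit block from its runs. */
static uint32_t decode_block(const run_t *runs, int n)
{
    uint32_t block = 0;
    uint32_t pos = 0;

    for (int i = 0; i < n; i++) {
        uint32_t len = runs[i].len_m1 + 1u;
        if (runs[i].bit) {
            /* set 'len' bits starting at 'pos'; a full-width run needs its own mask */
            block |= (len == 32u) ? 0xFFFFFFFFu : (((1u << len) - 1u) << pos);
        }
        pos += len;
    }
    return block;
}
```

Carrying a run across a block boundary would instead require keeping the last bit value and the open run length between blocks, which is the area where the count errors described above appeared.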

However, there may be an issue with the read function, which returns the encoded data from a bitstream. While it returns the count value correctly, it may retrieve the bit value and the not_compressed value incorrectly, which would also affect the decoding process. My next step is to investigate whether the problem lies in the read function or in the way we handle the encoded data buffer.
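One way to narrow this down is to round-trip test the bitstream helpers on their own: write known field values, reset the read position, and check that each field comes back unchanged. A minimal sketch of an MSB-first bitstream over a byte buffer; bitstream_t, bs_write and bs_read are hypothetical, not the project's read function, and the field layout used in the test (5-bit count, 1-bit value, 1-bit not_compressed flag) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bitstream over a byte buffer (hypothetical helper). */
typedef struct {
    uint8_t *buf;
    size_t size;   /* buffer size in bytes */
    size_t bitpos; /* next bit to read or write */
} bitstream_t;

/* Append the low 'nbits' bits of 'value', most significant first.
   Returns 0 on success, -1 if the field does not fit. */
static int bs_write(bitstream_t *bs, uint32_t value, unsigned nbits)
{
    if (nbits > 32u || bs->bitpos + nbits > bs->size * 8u) {
        return -1;
    }
    for (unsigned i = nbits; i-- > 0;) {
        size_t byte = bs->bitpos / 8u;
        unsigned shift = 7u - (unsigned)(bs->bitpos % 8u);
        uint8_t mask = (uint8_t)(1u << shift);
        if ((value >> i) & 1u) {
            bs->buf[byte] |= mask;
        } else {
            bs->buf[byte] &= (uint8_t)~mask;
        }
        bs->bitpos++;
    }
    return 0;
}

/* Read 'nbits' bits (most significant first) into *value.
   Returns 0 on success, -1 on overrun. */
static int bs_read(bitstream_t *bs, unsigned nbits, uint32_t *value)
{
    uint32_t v = 0;

    if (nbits > 32u || bs->bitpos + nbits > bs->size * 8u) {
        return -1;
    }
    for (unsigned i = 0; i < nbits; i++) {
        size_t byte = bs->bitpos / 8u;
        unsigned shift = 7u - (unsigned)(bs->bitpos % 8u);
        v = (v << 1) | ((bs->buf[byte] >> shift) & 1u);
        bs->bitpos++;
    }
    *value = v;
    return 0;
}
```

The reader must use the same field order and widths as the writer; if the count comes back right but later fields do not, comparing the raw bytes against a hand-computed pattern (as the test does with 0x8C) shows where the two sides diverge.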

adam-hrvth commented 10 months ago

I have updated the RLE algorithm to achieve a better compression ratio. The algorithm no longer relies on the not_compressed flag to store or decompress data correctly. This reduces the number of bits stored in the bitstream/memory and thus improves the compression ratio. With the given sample data, the following reductions have been achieved:

Note that these results depend on the data, since they depend on the runs of consecutive bits found in each signal. The CLK signal shows the best result because it consists entirely of ones.

adam-hrvth commented 10 months ago

The current implementation of the RLE algorithm, running on the ULX3S FPGA, gives the following results, measured with a Saleae Logic 8 logic analyser. To make the comparison easier, only the DATA signal differs between the test cases. In test case 3, the DATA signal changes rapidly, so the algorithm compresses it less effectively and its size actually increases. This also increases the run time and lowers the percentage improvement of the hardware implementation over the software implementation.

Test Case 1

| Signal | Uncompressed bits | Compressed bits | Reduction in size |
|---|---|---|---|
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 42 | 34.38% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 90 | 64.84% |

| Performance improvement | With custom instruction | Without custom instruction |
|---|---|---|
| RLE run time (ms) | 66.05704 | 74.02218 |
| Improvement (ms) | 7.96514 | / |
| Improvement (%) | 10.76% | / |
Test Case 2

| Signal | Uncompressed bits | Compressed bits | Reduction in size |
|---|---|---|---|
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 45 | 29.69% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 93 | 63.67% |

| Performance improvement | With custom instruction | Without custom instruction |
|---|---|---|
| RLE run time (ms) | 66.84003 | 74.83414 |
| Improvement (ms) | 7.99411 | / |
| Improvement (%) | 10.68% | / |
Test Case 3

| Signal | Uncompressed bits | Compressed bits | Reduction in size |
|---|---|---|---|
| CLK | 64 | 12 | 81.25% |
| SYNC | 64 | 18 | 71.88% |
| DATA | 64 | 73 | -14.06% |
| CLR | 64 | 18 | 71.88% |
| Overall | 256 | 121 | 52.73% |

| Performance improvement | With custom instruction | Without custom instruction |
|---|---|---|
| RLE run time (ms) | 80.85817 | 89.21094 |
| Improvement (ms) | 8.35277 | / |
| Improvement (%) | 9.36% | / |
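The percentages in these measurements follow from two simple formulas: reduction in size relative to the uncompressed bit count, and run-time improvement relative to the run time without the custom instruction. The sketch below reproduces the reported per-signal and run-time figures; treating the Overall row as the total over the four 64-sample signals (256 bits) is an assumption that matches the 64.84% reported for test case 1. The function names are hypothetical.

```c
/* Reduction in size in percent: positive means smaller after compression,
   negative means the encoded signal grew (as for DATA in test case 3). */
static double size_reduction_pct(unsigned uncompressed_bits, unsigned compressed_bits)
{
    return 100.0 * ((double)uncompressed_bits - (double)compressed_bits)
           / (double)uncompressed_bits;
}

/* Overall reduction across several signals with the same number of samples. */
static double overall_reduction_pct(const unsigned *compressed_bits,
                                    int num_signals, unsigned samples)
{
    unsigned total = 0;

    for (int i = 0; i < num_signals; i++) {
        total += compressed_bits[i];
    }
    return size_reduction_pct(samples * (unsigned)num_signals, total);
}

/* Run-time improvement of the custom-instruction build in percent,
   relative to the software-only run time. */
static double speedup_pct(double with_ci_ms, double without_ci_ms)
{
    return 100.0 * (without_ci_ms - with_ci_ms) / without_ci_ms;
}
```

For example, overall_reduction_pct((const unsigned[]){12, 18, 42, 18}, 4, 64) gives 64.84375, and speedup_pct(66.05704, 74.02218) gives about 10.76.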