Hey there! I have not seen this, but I also haven't tried running the kernel in Vitis 2021.2, because there's a serious performance issue in the memory reader code that suddenly popped up, which I haven't figured out how to fix.
Can you check if your kernel has this elevated II because of `aSplit`, in case it's related?
Does it pass in simulation? You can run a really small matrix so it doesn't take too long.
Did you try running any other configurations? Did they succeed/fail?
Unfortunately I don't have much time to maintain this these days, since I'm no longer affiliated with the university, so I would appreciate as much help as you can give me to figure out what the issue could be :-)
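In case it helps to narrow it down: the stage in question reads wide vectors from memory and splits them into multiple narrow streams. Roughly like this (a generic sketch for illustration only, not the actual `memory.cpp` code; the names `ReadAndSplit`, `Bus` and the widths are made up):

```cpp
// Illustration only: a generic read-and-split stage, NOT the repo's code.
#include <hls_stream.h>

constexpr int kWidth = 8;  // elements per wide memory word (assumed)

struct Bus {
  float data[kWidth];
};

void ReadAndSplit(Bus const *mem, int numWords,
                  hls::stream<float> split[kWidth]) {
ReadSplit:
  for (int i = 0; i < numWords; ++i) {
    // Target is one wide read per cycle; if the tool cannot schedule the
    // scatter below into a single cycle, the loop II is forced above 1.
    #pragma HLS PIPELINE II=1
    const Bus word = mem[i];
    for (int j = 0; j < kWidth; ++j) {
      #pragma HLS UNROLL
      split[j].write(word.data[j]);
    }
  }
}
```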
Hi - good luck wherever you go!
By the way, I checked `aSplit` in `memory.cpp`; every related II is set to 1, but in the `v++_MatrixMultiplicationKernel_hw.log` file there is an issue similar to https://github.com/spcl/gemm_hls/issues/25.
```
The following messages were generated while performing high-level synthesis for kernel: MatrixMultiplicationKernel
Log file: /home/lab/yong/SoonToBeRemoved/gemm_hls/build-wDSP/_x/MatrixMultiplicationKernel_hw/MatrixMultiplicationKernel/vitis_hls.log
INFO: [v++ 204-61] Pipelining loop 'ReadA_N0_ReadA_K0_ReadA_N1_ReadA_N2'.
INFO: [v++ 200-1470] Pipelining result : Target II = 1, Final II = 16, Depth = 93, loop 'ReadA_N0_ReadA_K0_ReadA_N1_ReadA_N2'
```
The entire log files are below.
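To put those numbers in perspective: a pipelined loop takes roughly depth + (tripCount - 1) * II cycles to finish, so II=16 instead of II=1 costs about 16x the read time for any nontrivial trip count. A quick back-of-the-envelope check (the trip count here is just an assumed example, not the kernel's actual value):

```cpp
#include <cstdio>
#include <initializer_list>

int main() {
  const long tripCount = 1L << 20;  // assumed example trip count
  const long depth = 93;            // pipeline depth from the log above
  for (const long ii : {1L, 16L}) {
    // Total cycles of a pipelined loop: depth + (trips - 1) * II.
    const long cycles = depth + (tripCount - 1) * ii;
    std::printf("II = %2ld -> ~%ld cycles\n", ii, cycles);
  }
  return 0;
}
```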
I'm going to rebuild the kernel in hardware mode with the same configuration as on GitHub, on Vitis 2020.2:

```
cmake ../ -DMM_DATA_TYPE=float -DMM_PARALLELISM_N=32 -DMM_PARALLELISM_M=8 -DMM_MEMORY_TILE_SIZE_N=512 -DMM_MEMORY_TILE_SIZE_M=512
```
And I built the kernel in hardware mode, not simulation mode.
I ran various n,m,k combinations from n=m=k=16 to n=m=k=2048, and the number of mismatched results kept growing.
Something strange: when the input matrix configuration is n=m=k=16, repeated executions of the command (`./RunHardware.exe 1024 1024 1024 hw`) make the diff of the mismatched results computed in `RunHardware.cpp` (`std::abs(testVal - refVal)`) bigger.
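For anyone trying to reproduce this, the check in question has this general form (a simplified sketch in the spirit of `RunHardware.cpp`, not its exact code; the helper name `CountMismatches` is made up):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Count output elements whose absolute difference from the reference
// exceeds the verification threshold.
int CountMismatches(const std::vector<float> &test,
                    const std::vector<float> &ref,
                    const float threshold = 1e-3f) {
  int mismatches = 0;
  for (std::size_t i = 0; i < test.size(); ++i) {
    const float diff = std::abs(test[i] - ref[i]);
    if (diff > threshold) {
      ++mismatches;
      std::printf("Mismatch at %zu: %f vs %f (diff %f)\n",
                  i, test[i], ref[i], diff);
    }
  }
  return mismatches;
}
```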
Hi, I'm also using the gemm_hls project to build my own work. My simulation result based on gemm is correct.
The simulation and hardware mode do the same thing, so if the mismatch exists in hardware mode, it may also show up in simulation mode. What data type do you use? Is it floating point?
@xooxit Did compiling it in 2020.2 make a difference? I'm curious if the II=16 issue is related to the verification error.
@definelicht I compiled again in 2021.1 (the previous build was 2021.2); it has no verification error and there is also no II=16 issue.
For the verification error above, I built with `-DMM_ADD_RESOURCE=FAddSub_nodsp -DMM_MULT_RESOURCE=FMul_nodsp`. When I built without the nodsp options on the same 2021.2 version, there was no verification issue, but the II=16 issue remained.
(I edited the corresponding build commands into the question at the top.)
@charliechou1001 Hi - I built with the nodsp options and there were verification issues, but without the nodsp options there was no verification error at all. There was no verification error in simulation mode in either case, and the data type was floating point.
Wow, ok. So 2021.2 is slow because of II=16, and `nodsp` breaks correctness on 2021.2. Does `nodsp` also break correctness on 2021.1?
I would not recommend using `FMul_nodsp`; it is very expensive. `FMul_fulldsp` and `FAddSub_nodsp` are usually a good combo, since addition doesn't benefit much from DSPs, but multiplication benefits a lot.
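For concreteness, this choice maps to operator binding in HLS. A hedged sketch using the generic Vitis HLS `BIND_OP` pragma (the repo configures this through its CMake options instead, so this is illustrative, not the repo's code):

```cpp
// Multiply-add with the multiplier bound to DSPs and the adder to LUT fabric.
float MultiplyAdd(const float a, const float b, const float c) {
  float prod = a * b;
  // A 32-bit float multiply maps to a few DSP48 blocks; built in fabric
  // ("nodsp"), it needs a large LUT-based multiplier array instead.
  #pragma HLS BIND_OP variable=prod op=fmul impl=fulldsp
  float sum = prod + c;
  // Float addition gains little from DSPs, so fabric is usually a cheap trade.
  #pragma HLS BIND_OP variable=sum op=fadd impl=fabric
  return sum;
}
```

That is what makes `FMul_nodsp` so costly: the multiplier's partial products end up in LUTs, which dwarfs the LUT cost of a fabric adder.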
@definelicht I see. The `nodsp` option in 2021.1 does not break correctness and shows no II=16.
There is only a verification error on 2021.2 with `nodsp` on both ADD and MULT.
I'm now building with just `-DMM_ADD_RESOURCE=FAddSub_nodsp`, on both 2021.2 and 2021.1. BTW, could you let me know in what way `FMul_nodsp` is expensive?
Ok, that's very strange. I suspect this is a bug on Xilinx's side, not in this repo. I think I will put a notice in the README that the accelerator is broken in 2021.2, and see if it improves in future versions, unless any new information comes up.
Hi @xooxit, I got the project working on an Alveo U250. The Vitis version I use is 2020.2, the parameters in the CMakeLists are the defaults, and I also tried doubling the memory tile sizes m/n to 512; both work for me. Maybe the problem lies in the tool version.
Here is the screenshot of my result:
And from my workmate's experience, different editions of the HLS tool lead to different synthesis results for the same code, especially in hardware resource consumption; maybe the timing or other factors behind the mismatch problems are related to that.
There is always a difference between different versions of the tools, but it's unfortunate if they even break the code :-(
Hi all, I think I am facing the same issue.
The board is U50 and the VITIS version is 2021.2.
Here is the execution log for your reference. Hope it helps to track down the bug.
Did you check whether it works when compiled with 2021.1 or older?
Yes, I tried 2021.1 and it passes the test!
Hi, I reproduced the project with the following command lines:
Then ran it like this:
and got mismatched results like this:
I also tried raising the threshold for flagging a mismatch (i.e. from 1e-03 to 1e-02) and printed out all mismatched results.
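A side note on the threshold: with plain float accumulation, rounding error grows with the reduction dimension k, so a fixed absolute threshold gets harder to meet for larger matrices. A tolerance that scales with k and with the magnitude of the reference value (a hypothetical helper, not what `RunHardware.cpp` does) can help separate expected rounding drift from a real correctness bug like the one discussed here:

```cpp
#include <cmath>

// Accept |test - ref| up to a bound that grows with the number of
// accumulated terms k and with the magnitude of the reference value.
bool WithinTolerance(const float test, const float ref, const int k,
                     const float relEps = 1e-6f, const float absEps = 1e-5f) {
  const float bound = absEps + relEps * static_cast<float>(k) * std::abs(ref);
  return std::abs(test - ref) <= bound;
}
```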
My Vitis version is 2021.2, the XRT version is 2.12.427, and the platform is xilinx_u250_gen3x16_xdma_3_1_202020_1.
Btw, I learned a lot from it. Thanks for the nice work.