Closed stephen-hqxu closed 3 years ago
After increasing the erosion iteration count, the runtime did not change at all, even when the iteration count reached 1M. So we can confirm that the bug is not caused by data races.
Even though debug information has been turned off, the Nsight debugger pointed out that the kernel launch exceeded the available launch resources, which is super easy to fix. I am still not sure why it only crashes under release mode, though.
Further investigation will be performed to check why this is the case.
The actual cause is that when compiler optimisation is turned on, register usage per thread might change, so the kernel runs out of registers at the original block size, and hence we need to decrease the block size accordingly.
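The arithmetic behind that fix can be sketched in plain host-side C++. This is only an illustration under stated assumptions: `maxBlockSize` and `launchFits` are hypothetical helper names, and the budget of 65536 registers per block, the 32-thread warp and the 1024-thread limit are assumed defaults rather than values taken from this project. Real hardware also allocates registers at a coarser granularity, so the true limit may be somewhat lower.

```cpp
#include <cassert>

// Largest block size (a multiple of the warp size) whose total register
// demand fits the per-block register budget.
// All defaults are assumptions for illustration, not values from the project:
// 65536 registers per block, 32-thread warps, 1024-thread hardware limit.
int maxBlockSize(int regsPerThread,
                 int regsPerBlock = 65536,
                 int warpSize = 32,
                 int hardwareMax = 1024) {
    assert(regsPerThread > 0 && regsPerBlock > 0 && warpSize > 0);
    int threads = regsPerBlock / regsPerThread;   // register-limited thread count
    if (threads > hardwareMax) threads = hardwareMax;
    return (threads / warpSize) * warpSize;       // round down to whole warps
}

// True when a launch with blockSize threads fits the register budget.
bool launchFits(int blockSize, int regsPerThread, int regsPerBlock = 65536) {
    return static_cast<long long>(blockSize) * regsPerThread <= regsPerBlock;
}
```

With these assumed numbers, a kernel that needs 80 registers per thread no longer fits in a 1024-thread block (81920 registers against a budget of 65536), and `maxBlockSize(80)` suggests 800 threads instead. In an actual CUDA build the per-thread register count and the per-block budget would be queried, for example with `cudaFuncGetAttributes` and `cudaDeviceProp::regsPerBlock`, rather than assumed.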
How the bug behaves
Hydraulic erosion doesn't seem to be performed at all when the configuration is set to Release.
Debugging checks
Because compiler optimisation is turned on in release mode and debug information is off, I simply printed the results inside the kernel and displayed them on the console. However, no significant defects in global memory were found. All erosion parameters are available.
Assumption
There might be a data race during the erosion phase, though I am not sure why a race condition would only appear in release mode (perhaps the program runs faster, which gives the race a higher chance of occurring).
Writing to the heightmap seems to be susceptible to data races, since more than one raindrop may erode the same area at the same time, which I had already noticed at development time. I didn't deal with this case since synchronisation trashes the performance and the chance of it happening is extremely low.
Attempts
Chunk generation ran extremely fast under release mode, as if no erosion were being performed. The erosion iteration count used was 81k, with 512x512x9 heightmaps.
It would be wise to increase the iteration count and map size and look for any change in runtime. If the runtime depends on the iteration count, there might be data races; otherwise, I don't know.
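The comparison described here can be sketched as a small host-side C++ check. It is only a sketch: `Sample`, `runtimeIsFlat` and the `tolerance` threshold are hypothetical names and values introduced for illustration, and the timings would have to come from actual measurements of chunk generation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One timing sample: erosion iteration count and the measured runtime.
struct Sample {
    long long iterations;
    double seconds;
};

// Returns true when the runtime stays roughly flat across all samples,
// i.e. the slowest run takes at most `tolerance` times the fastest one.
// `tolerance` is an assumed threshold for illustration.
bool runtimeIsFlat(const std::vector<Sample>& samples, double tolerance = 1.5) {
    assert(!samples.empty() && tolerance >= 1.0);
    auto range = std::minmax_element(
        samples.begin(), samples.end(),
        [](const Sample& a, const Sample& b) { return a.seconds < b.seconds; });
    // range.first is the fastest run, range.second the slowest.
    return range.second->seconds <= range.first->seconds * tolerance;
}
```

A flat result between, say, 81k and 1M iterations would suggest that the erosion work is not being executed at all, rather than being executed incorrectly.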
Additionally, some approaches to synchronise write operations to the heightmap have been implemented and tested, including atomics, disabling the L1 cache, and memory fences. Unfortunately none of them had any effect, which potentially rules out a race condition.
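For reference, the atomic variant of a heightmap write can be illustrated with a host-side C++ analogue. This is not the kernel code from this project: `atomicAddFloat` and `erodeCell` are hypothetical names, and `std::atomic<float>` with a compare-and-swap loop stands in for what would be an `atomicAdd` on a `float` in global memory in CUDA.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Atomically adds `delta` to `cell` using a compare-and-swap loop.
void atomicAddFloat(std::atomic<float>& cell, float delta) {
    float expected = cell.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak refreshes `expected` with the current
    // value, so the sum is recomputed and retried until the store succeeds.
    while (!cell.compare_exchange_weak(expected, expected + delta,
                                       std::memory_order_relaxed)) {
    }
}

// Hypothetical erosion write: lower one heightmap cell by `sediment`.
void erodeCell(std::vector<std::atomic<float>>& heightmap,
               std::size_t index, float sediment) {
    assert(index < heightmap.size());
    atomicAddFloat(heightmap[index], -sediment);
}
```

With this pattern, two raindrops eroding the same cell at the same time both have their contribution applied, instead of one overwriting the other, at the cost of extra contention on that cell.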