Closed: keryell closed this issue 3 years ago.
Now that we have the executor for either `parallel_for` or `single_task`, we can declare the random generator there to have 1 instance per work-group.
This is trivial for `single_task` (1 work-item, so 1 work-group :-) ).
For `parallel_for`, one option is to keep `parallel_for` as now but then pass a `local_accessor` of 1 random generator structure (containing the random generator and the `std::uniform_real_distribution<float>`) and think about calling the constructor explicitly, since a `local_accessor` provides raw uninitialized memory.

Thank you for the experiment in https://github.com/triSYCL/path_tracer/pull/37
Very useful!
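Regarding the `local_accessor` option in the first comment: since the accessor only provides raw, uninitialized memory, the generator structure has to be constructed in it with placement new and destroyed explicitly. The host-only C++ sketch below shows just that step. The `rng_state` name, the `std::minstd_rand` engine and the local byte buffer standing in for the `local_accessor` storage are assumptions for illustration. In a real kernel, presumably only one work-item (for example local id 0) should construct the object, followed by a work-group barrier before the others use it.

```cpp
#include <new>
#include <random>

// Hypothetical per-work-group generator structure: engine + distribution,
// as suggested in the comment. std::minstd_rand is an arbitrary choice here.
struct rng_state {
  std::minstd_rand engine;
  std::uniform_real_distribution<float> dist{0.0f, 1.0f};

  explicit rng_state(unsigned seed) : engine(seed) {}

  float operator()() { return dist(engine); }
};

// Draw one number from a generator built in raw storage.
// `raw_storage` plays the role of the uninitialized memory that a
// local_accessor holding one rng_state would provide.
float first_draw_in_raw_memory(unsigned seed) {
  alignas(rng_state) unsigned char raw_storage[sizeof(rng_state)];

  // No constructor has run on raw_storage: call it explicitly (placement new).
  rng_state* rng = ::new (static_cast<void*>(raw_storage)) rng_state(seed);

  float x = (*rng)();

  // The storage does not own the object, so run the destructor by hand.
  rng->~rng_state();
  return x;
}
```

Keeping the engine and the distribution in one structure means a single-element `local_accessor<rng_state, 1>` would be enough to hold the whole generator state per work-group.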
This global `rng` state is a fundamental nightmare, and I was wrong to think that just having one local `rng` per work-group is enough: it solves the inter-work-group interference, but then the same problem appears at the work-item level inside each work-group... :-(
We could extend SYCL to introduce the concept of a thread-local id... :-) But this is painful, and it would probably be necessary to pass it as a parameter anyway.
So my idea of using hierarchical parallelism to have an `rng` local to the work-group does not work, and there is no need to use it: a plain `parallel_for` is enough.
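A plain `parallel_for` where each work-item owns its own generator can be sketched in standard C++ as below, with a host loop standing in for the kernel launch. The `xorshift32` engine, the splitmix64-style mixing of a user seed with the work-item id, and the `sample_per_work_item` helper are illustrative assumptions, not the code from the pull request. In SYCL, the id would come from the `sycl::item` (or `sycl::id`) passed to the kernel lambda.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Small value-type generator (xorshift32) so each work-item can own one.
// The engine and the seeding hash are assumptions for illustration.
struct xorshift32 {
  std::uint32_t state;

  // Mix the user seed with the work-item id (splitmix64 finalizer) so that
  // neighbouring ids do not start from correlated states.
  xorshift32(std::uint32_t seed, std::size_t id) {
    std::uint64_t z = seed + 0x9E3779B97F4A7C15ull * (id + 1);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    z ^= z >> 31;
    // xorshift32 must never have a zero state: force the low bit on.
    state = static_cast<std::uint32_t>(z) | 1u;
  }

  std::uint32_t next() {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
  }

  // Uniform float in [0, 1) built from the top 24 bits.
  float uniform() { return (next() >> 8) * (1.0f / 16777216.0f); }
};

// Host stand-in for a plain parallel_for over n work-items: the body only
// uses its own id, so iterations are independent and order does not matter.
std::vector<float> sample_per_work_item(std::size_t n, std::uint32_t seed) {
  std::vector<float> out(n);
  for (std::size_t id = 0; id < n; ++id) {
    xorshift32 rng{seed, id};  // private to this work-item, no global state
    out[id] = rng.uniform();
  }
  return out;
}
```

Since each generator depends only on the seed and its own id, the result does not depend on the execution order or on the work-group size, which removes both the inter-work-group and the intra-work-group interference.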
For FPGA, it would be interesting to see how HLS can pipeline the code, perhaps with the `rng` executed in dataflow?
Implemented with https://github.com/triSYCL/path_tracer/pull/37
While https://github.com/triSYCL/path_tracer/pull/21 added a faster random generator for accelerators, it cannot work on a real device yet because the generator is a global variable, which is not accessible from a SYCL kernel on a plain device. More generally, all global variables should be removed.