1653 saw significant slowdowns in FPGA emulation compared to mainline. These are caused by very large work-group sizes (65536 on my test system). This PR limits the work-group size to 2048, which has been shown to provide a good balance between throughput and the synchronization required in group reductions.
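A minimal sketch of the idea, assuming the work-group size is derived from the device-reported maximum and then capped. `kMaxWorkGroupSize`, `clamp_work_group_size`, and `tree_reduction_steps` are hypothetical names for illustration only, not the code in this PR. In real SYCL code the device maximum would come from a query such as `sycl::info::device::max_work_group_size`; here it is just a parameter. The step count assumes a pairwise tree reduction, which is one common way to implement a group reduction, to show why smaller groups need fewer synchronized steps (11 for 2048 work-items versus 16 for 65536).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical cap mirroring the 2048 limit described above.
constexpr std::size_t kMaxWorkGroupSize = 2048;

// Clamp the device-reported maximum work-group size to the cap.
// `device_max` stands in for a device query result (assumption).
std::size_t clamp_work_group_size(std::size_t device_max) {
    assert(device_max > 0);
    return std::min(device_max, kMaxWorkGroupSize);
}

// Number of synchronized steps in a pairwise tree reduction over
// `wg_size` work-items, i.e. ceil(log2(wg_size)). Each step halves the
// number of active work-items and is followed by a group barrier.
std::size_t tree_reduction_steps(std::size_t wg_size) {
    std::size_t steps = 0;
    for (std::size_t active = 1; active < wg_size; active *= 2) {
        ++steps;
    }
    return steps;
}
```

For example, a device reporting 65536 would be clamped to 2048, while a device reporting 256 keeps 256. The cap only bounds the upper end, so devices with small limits are unaffected.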