Problem
The scalar decomposition and bucket aggregation stages of the multi-scalar multiplication (MSM) pipeline are not optimized, which leads to inefficient parallel processing and longer computation time.
Details
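Adopt the signed bucket indices technique to optimize scalar decomposition and bucket aggregation. With a window width of c bits, each digit is recoded from the unsigned range [0, 2^c) to the signed range [-2^(c-1), 2^(c-1)), and a negative digit adds the negated point to the bucket for its absolute value. Each window then needs 2^(c-1) buckets instead of 2^c - 1, which roughly halves the bucket count, improves parallel processing efficiency, and helps balance the workload across Metal's GPU threads.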
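The recoding step can be sketched on the CPU as follows. This is a minimal illustration, not the shader itself: it assumes a 64-bit scalar (real MSM scalars are wider and would be stored as limbs), and the names `signed_digits` and `reconstruct` are hypothetical helpers introduced here. It is written in plain C++, so the carry logic should be portable to a Metal kernel with little change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recode k into signed base-2^c digits d_i in [-2^(c-1), 2^(c-1))
// such that sum(d_i * 2^(c*i)) == k. One extra window absorbs the final carry.
// Assumption: 64-bit scalar; a real MSM scalar would span several limbs.
std::vector<std::int32_t> signed_digits(std::uint64_t k, unsigned c) {
    assert(c >= 2 && c <= 16);
    const std::uint32_t mask = (1u << c) - 1;
    const std::int32_t half = 1 << (c - 1);
    const unsigned windows = (64 + c - 1) / c + 1;
    std::vector<std::int32_t> digits(windows);
    std::uint32_t carry = 0;
    for (unsigned i = 0; i < windows; ++i) {
        const unsigned shift = i * c;
        // Bits beyond the top of the scalar read as zero.
        const std::uint32_t raw =
            shift < 64 ? static_cast<std::uint32_t>((k >> shift) & mask) : 0;
        std::int32_t d = static_cast<std::int32_t>(raw + carry);
        if (d >= half) {
            d -= (1 << c);  // borrow 2^c from this window ...
            carry = 1;      // ... and pay it back in the next one
        } else {
            carry = 0;
        }
        digits[i] = d;
    }
    return digits;
}

// Rebuild the scalar modulo 2^64 to check the recoding.
std::uint64_t reconstruct(const std::vector<std::int32_t>& digits, unsigned c) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        const unsigned shift = static_cast<unsigned>(i) * c;
        if (shift >= 64) continue;  // contributes 0 modulo 2^64
        sum += static_cast<std::uint64_t>(static_cast<std::int64_t>(digits[i])) << shift;
    }
    return sum;
}
```

For example, with c = 4 the scalar 9 recodes to the digits -7 and 1, since -7 + 1 * 16 = 9.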
Acceptance criteria
Modify the scalar decomposition shader to convert scalars into signed index form within Metal.
Adjust the bucket aggregation logic to handle signed indices, including point negation based on index signs.
Ensure compatibility with Metal's buffer handling and memory access patterns.
Validate the correctness of the implementation through comprehensive testing with various scalar distributions.
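A CPU reference of the kind these criteria call for might look like the sketch below. It is only an illustration under loud assumptions: curve points are replaced by integers modulo a small prime under addition (so "negation" is `p - x`), scalars are 64-bit, and `msm_signed`, `msm_naive`, and `recode` are hypothetical names. It shows the signed bucket accumulation with point negation, followed by the usual running-sum reduction, and could be used to cross-check GPU output on varied scalar distributions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in group (assumption): integers mod a small prime under addition,
// used in place of elliptic-curve points so the sketch runs anywhere.
constexpr std::uint64_t P_MOD = 1000003;
using Point = std::uint64_t;
Point add(Point a, Point b) { return (a + b) % P_MOD; }
Point neg(Point a) { return (P_MOD - a) % P_MOD; }

// Signed base-2^c digits of k, each in [-2^(c-1), 2^(c-1)).
std::vector<std::int32_t> recode(std::uint64_t k, unsigned c, unsigned windows) {
    std::vector<std::int32_t> digits(windows);
    std::uint32_t carry = 0;
    for (unsigned i = 0; i < windows; ++i) {
        const unsigned shift = i * c;
        const std::uint32_t raw =
            shift < 64 ? static_cast<std::uint32_t>((k >> shift) & ((1u << c) - 1)) : 0;
        std::int32_t d = static_cast<std::int32_t>(raw + carry);
        carry = d >= (1 << (c - 1)) ? 1 : 0;
        if (carry) d -= (1 << c);
        digits[i] = d;
    }
    return digits;
}

Point msm_signed(const std::vector<std::uint64_t>& scalars,
                 const std::vector<Point>& points, unsigned c) {
    assert(scalars.size() == points.size());
    assert(c >= 2 && c <= 16);
    const unsigned windows = (64 + c - 1) / c + 1;  // extra window for the carry
    const std::uint32_t half = 1u << (c - 1);       // buckets per window
    std::vector<std::vector<std::int32_t>> digits;
    for (std::uint64_t k : scalars) digits.push_back(recode(k, c, windows));

    Point result = 0;
    for (unsigned w = windows; w-- > 0;) {
        // Multiply the running result by 2^c via c doublings.
        for (unsigned s = 0; s < c; ++s) result = add(result, result);
        // Bucket j holds the points whose digit has magnitude j + 1.
        std::vector<Point> buckets(half, 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            const std::int32_t d = digits[i][w];
            if (d > 0) buckets[d - 1] = add(buckets[d - 1], points[i]);
            else if (d < 0) buckets[-d - 1] = add(buckets[-d - 1], neg(points[i]));
        }
        // Running-sum reduction: computes sum((j + 1) * buckets[j]).
        Point running = 0, window_sum = 0;
        for (std::uint32_t j = half; j-- > 0;) {
            running = add(running, buckets[j]);
            window_sum = add(window_sum, running);
        }
        result = add(result, window_sum);
    }
    return result;
}

// Direct evaluation of sum(k_i * P_i) for comparison.
Point msm_naive(const std::vector<std::uint64_t>& scalars,
                const std::vector<Point>& points) {
    Point acc = 0;
    for (std::size_t i = 0; i < points.size(); ++i)
        acc = add(acc, ((scalars[i] % P_MOD) * points[i]) % P_MOD);
    return acc;
}
```

Comparing `msm_signed` against `msm_naive` on edge-case scalars (zero, all-ones, values that trigger a carry in every window) exercises the negation path and the extra carry window.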
Reference