This PR introduces fast vectorized operations for the pillar scatter module. The for loop over the batch in the existing module makes the model's forward pass very slow, especially with larger batch sizes.
The outputs were checked to be equal before and after the change, and the test is included in this PR.
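To illustrate the idea, here is a minimal NumPy sketch (not the code in this PR, which operates on the actual module's tensors). It assumes pillar coordinates are stored as `[batch_idx, z, y, x]` rows with at most one pillar per grid cell; the function names `scatter_loop` and `scatter_vectorized` and all sizes are hypothetical. The loop version scatters one sample at a time, while the vectorized version folds the batch index into a single flat index and scatters everything in one assignment.

```python
import numpy as np


def scatter_loop(features, coords, batch_size, ny, nx):
    """Per-sample loop: one masked scatter per batch element."""
    num_channels = features.shape[1]
    canvases = []
    for b in range(batch_size):
        canvas = np.zeros((num_channels, ny * nx), dtype=features.dtype)
        mask = coords[:, 0] == b
        # Flat cell index inside this sample's (ny, nx) grid.
        flat = coords[mask, 2] * nx + coords[mask, 3]
        canvas[:, flat] = features[mask].T
        canvases.append(canvas)
    return np.stack(canvases).reshape(batch_size, num_channels, ny, nx)


def scatter_vectorized(features, coords, batch_size, ny, nx):
    """Single scatter: the batch index is folded into the flat index."""
    num_channels = features.shape[1]
    flat = coords[:, 0] * (ny * nx) + coords[:, 2] * nx + coords[:, 3]
    canvas = np.zeros((batch_size * ny * nx, num_channels), dtype=features.dtype)
    canvas[flat] = features
    # (B * ny * nx, C) -> (B, ny, nx, C) -> (B, C, ny, nx)
    return canvas.reshape(batch_size, ny, nx, num_channels).transpose(0, 3, 1, 2)


# Synthetic data (assumed sizes): 50 pillars in unique cells of a 4 x 8 x 6 grid.
rng = np.random.default_rng(0)
batch_size, ny, nx, num_channels, num_pillars = 4, 8, 6, 3, 50
cells = rng.choice(batch_size * ny * nx, size=num_pillars, replace=False)
b, y, x = np.unravel_index(cells, (batch_size, ny, nx))
coords = np.stack([b, np.zeros_like(b), y, x], axis=1)
features = rng.standard_normal((num_pillars, num_channels)).astype(np.float32)

out_loop = scatter_loop(features, coords, batch_size, ny, nx)
out_vec = scatter_vectorized(features, coords, batch_size, ny, nx)
outputs_match = np.array_equal(out_loop, out_vec)
```

The equality check at the end mirrors the kind of before/after comparison described above. The sketch relies on each cell receiving at most one pillar; with duplicate coordinates the two versions are not guaranteed to pick the same winner.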
Latency experiments:
Forward times as a function of batch_size
Overall training time before and after the change:
Before -> average per-iteration training time ~2.9 s
After -> average per-iteration training time ~1.9 s
The per-iteration training time drops by about 35% with this PR ((2.9 - 1.9) / 2.9 ≈ 0.345), measured with my set of parameters and dataset.