Check whether a broadcasted offset is actually needed; if it is not, use a faster path.
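As a rough illustration of the check described above, here is a minimal sketch in C. It is not the code from this PR: the Tensor struct, MAX_NDIM, and the names same_shape, broadcast_offset and tensor_add are all hypothetical, and it assumes contiguous row-major data, NumPy-style right-aligned broadcasting, and shapes that are already known to be compatible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NDIM 4 /* hypothetical limit, chosen only for this sketch */

/* Hypothetical minimal tensor: contiguous, row-major float data. */
typedef struct {
    size_t ndim;
    size_t shape[MAX_NDIM];
    float *data;
} Tensor;

/* Total number of elements in the tensor. */
static size_t tensor_size(const Tensor *t) {
    size_t n = 1;
    for (size_t i = 0; i < t->ndim; i++) n *= t->shape[i];
    return n;
}

/* True when `in` has exactly the output shape, so no broadcasting is needed. */
static bool same_shape(const Tensor *in, const Tensor *out) {
    if (in->ndim != out->ndim) return false;
    for (size_t i = 0; i < in->ndim; i++) {
        if (in->shape[i] != out->shape[i]) return false;
    }
    return true;
}

/* Slow path: map a flat output index to a flat input index.
 * Dimensions are aligned from the right; input dimensions of size 1 repeat. */
static size_t broadcast_offset(const Tensor *in, const Tensor *out, size_t out_idx) {
    size_t offset = 0;
    size_t stride = 1;
    for (size_t k = 0; k < in->ndim; k++) {
        size_t in_dim = in->shape[in->ndim - 1 - k];
        size_t out_dim = out->shape[out->ndim - 1 - k];
        size_t coord = out_idx % out_dim; /* coordinate along this axis */
        out_idx /= out_dim;
        if (in_dim != 1) offset += coord * stride;
        stride *= in_dim;
    }
    return offset;
}

/* Element-wise add that only pays for the offset computation when required. */
void tensor_add(const Tensor *a, const Tensor *b, Tensor *out) {
    assert(a->ndim <= out->ndim && b->ndim <= out->ndim);
    size_t n = tensor_size(out);

    if (same_shape(a, out) && same_shape(b, out)) {
        /* Fast path: identical shapes, so flat indices line up directly. */
        for (size_t i = 0; i < n; i++) out->data[i] = a->data[i] + b->data[i];
        return;
    }

    /* Slow path: compute a broadcasted offset for every element. */
    for (size_t i = 0; i < n; i++) {
        out->data[i] = a->data[broadcast_offset(a, out, i)]
                     + b->data[broadcast_offset(b, out, i)];
    }
}
```

The shape comparison runs once per call, whereas the offset computation runs once per element, so skipping it for equal shapes is where a speedup would be expected in a sketch like this.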
Comparison for this PR (blue is before, red is after):
This PR also moves the common functions (which would otherwise be duplicated by gen.sh) from __Binary.jinja.c to tensor.c.
This reduces the compiled binary size by about 10 kB (583 kB → 574 kB).