tsnlab / connx

C implementation of Open Neural Network Exchange Runtime
GNU General Public License v3.0

Optimise binary operators #56

Closed. tribela closed this 2 years ago

tribela commented 2 years ago

Check whether a broadcast offset is actually needed. If not, use a faster path that skips the offset computation.
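To illustrate the idea, here is a minimal sketch of an element-wise add with a fast path. It is not the code from this PR: `same_shape`, `broadcast_offset`, and `add_f32` are hypothetical names, and the shapes are assumed to be plain `int32_t` arrays following NumPy-style (right-aligned) broadcasting. When both inputs already have the output shape, the flat index can be used directly; otherwise each output index is mapped back to an input offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when two shapes are identical. */
static bool same_shape(const int32_t* a, int32_t a_ndim,
                       const int32_t* b, int32_t b_ndim) {
    if (a_ndim != b_ndim)
        return false;
    for (int32_t i = 0; i < a_ndim; i++) {
        if (a[i] != b[i])
            return false;
    }
    return true;
}

/* Hypothetical helper: map a flat output index to a flat input index.
 * Shapes are right-aligned; an input dimension of size 1 (or a missing
 * leading dimension) is broadcast, so it contributes nothing to the offset. */
static size_t broadcast_offset(size_t out_idx,
                               const int32_t* out_shape, int32_t out_ndim,
                               const int32_t* in_shape, int32_t in_ndim) {
    size_t offset = 0;
    size_t stride = 1;
    for (int32_t d = out_ndim - 1, i = in_ndim - 1; d >= 0; d--, i--) {
        size_t coord = out_idx % (size_t)out_shape[d];
        out_idx /= (size_t)out_shape[d];
        if (i >= 0) {
            if (in_shape[i] != 1)
                offset += coord * stride;
            stride *= (size_t)in_shape[i];
        }
    }
    return offset;
}

/* Element-wise add; out_shape is assumed to be the already-broadcast shape. */
static void add_f32(float* out, const int32_t* out_shape, int32_t out_ndim,
                    const float* a, const int32_t* a_shape, int32_t a_ndim,
                    const float* b, const int32_t* b_shape, int32_t b_ndim) {
    assert(out_ndim >= a_ndim && out_ndim >= b_ndim);

    size_t total = 1;
    for (int32_t d = 0; d < out_ndim; d++)
        total *= (size_t)out_shape[d];

    /* Fast path: no broadcasting needed, so index all three arrays directly. */
    if (same_shape(a_shape, a_ndim, out_shape, out_ndim) &&
        same_shape(b_shape, b_ndim, out_shape, out_ndim)) {
        for (size_t i = 0; i < total; i++)
            out[i] = a[i] + b[i];
        return;
    }

    /* Slow path: compute a broadcast offset for every element. */
    for (size_t i = 0; i < total; i++) {
        size_t ia = broadcast_offset(i, out_shape, out_ndim, a_shape, a_ndim);
        size_t ib = broadcast_offset(i, out_shape, out_ndim, b_shape, b_ndim);
        out[i] = a[ia] + b[ib];
    }
}
```

The shape check runs once per call, while the slow path pays a division and modulo per dimension for every element, which is where the saving in the equal-shape case would come from.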

Comparison (blue is before this PR, red is after): [image]

This PR also moves a common function (which would otherwise be duplicated by gen.sh) from __Binary.jinja.c to tensor.c. This reduces the compiled binary size by about 10 kB (583 kB → 574 kB).