nnstreamer / nnstreamer

:twisted_rightwards_arrows: Neural Network (NN) Streamer, Stream Processing Paradigm for Neural Network Apps/Devices.
https://nnstreamer.ai
GNU Lesser General Public License v2.1

Question about tensor transform arithmetic per-channel orc operation #4051

Closed byungs2 closed 1 year ago

byungs2 commented 1 year ago

Hi. I recently noticed that the per-channel mode of tensor transform arithmetic does not support SIMD operations. So my questions are:

  1. Why doesn't tensor transform arithmetic support per-channel ORC operations, and
  2. If I fix that myself as below, what problems do you expect to arise?

Additionally, when I profiled the ORC operation with different channel positions, the results were: a 3:256:256:1 tensor takes 360 microseconds, a 256:256:3:1 tensor takes 56 microseconds, and a 3:256:256:1 tensor without the ORC op takes 10 ms.

#ifdef HAVE_ORC
  if (orc_supported (filter, in_info->type, out_info->type)) {
    walk = filter->operators;
    /**
     * Typecast should be called at the first.
     * Do the typecast. If in/out type is same, this will copy the input array to output.
     */
    orc_typecast (inptr, outptr, num, in_info->type, out_info->type);

    if (!filter->data_arithmetic.per_channel_arith) {
      while (walk) {
        op_s = (tensor_transform_operator_s *) walk->data;

        if (op_s->op != GTT_OP_TYPECAST) {
          gst_tensor_data_typecast (&op_s->value, out_info->type);
          orc_operator (outptr, num, &op_s->value, op_s->op);
        }

        walk = g_slist_next (walk);
      }
    } else {
      size_t typesize = 0;
      uint8_t *tmp_outptr = NULL;

      guint ch_dim = filter->data_arithmetic.ch_dim;
      gsize batch_offset, ch_size = 1;
      for (i = 0; i < ch_dim; ++i) {
        ch_size *= in_info->dimension[i];
      }
      batch_offset = ch_size * in_info->dimension[ch_dim];
      orc_typesize (typesize, out_info->type);

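      for (i = 0; i < num / batch_offset; ++i) {
        /* Restart from the first operator for every chunk; otherwise the list is exhausted after i == 0. */
        walk = filter->operators;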
        while (walk) {
          op_s = (tensor_transform_operator_s *) walk->data;

          if (op_s->op != GTT_OP_TYPECAST) {
            tmp_outptr = outptr + (ch_size * op_s->applying_ch + batch_offset * i) * typesize;
            gst_tensor_data_typecast (&op_s->value, out_info->type);
            orc_operator (tmp_outptr, ch_size, &op_s->value, op_s->op);
          }

          walk = g_slist_next (walk);
        }
      }
    }
    return GST_FLOW_OK;
  }
#endif

Thanks for reading.

taos-ci commented 1 year ago

:octocat: cibot: Thank you for posting issue #4051. The person in charge will reply soon.

jaeyun-jung commented 1 year ago

Please push a new PR for this to support per-channel ops with ORC. nnstreamer will run the test cases that check the per-channel option; the test case is here: https://github.com/nnstreamer/nnstreamer/blob/ab95e298f3b387e07b8f36c3ccfafe71d2d2981a/tests/nnstreamer_plugins/unittest_plugins.cc#L1869

byungs2 commented 1 year ago

Oh, I didn't think to use the built-in test cases. Sorry for the inconvenience. After checking the test cases, I will try to make a PR for it.