google / gemmlowp

Low-precision matrix multiplication
Apache License 2.0

Mixing OpenMP with gemmlowp multithreading causes low performance #148

Open liyinhgqw opened 6 years ago

liyinhgqw commented 6 years ago

If I run a multithreaded loop with OpenMP and then call gemmlowp, the GEMM performance degrades. Any clue?

e.g.

  // An OpenMP parallel loop run before the GEMM calls; the body is empty here.
  #pragma omp parallel for
  for (int i = 0; i < 100; ++i) {
  }

  gemmlowp::GemmContext gemm_context;
  gemm_context.set_max_num_threads(4);
  using BitDepthParams = gemmlowp::L8R8WithLhsNonzeroBitDepthParams;
  while (iters--) {
    gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::int32_t,
                                     BitDepthParams>(
        &gemm_context, lhs.const_map(), rhs.const_map(), &result.map(), -128,
        -128, output_pipeline);
  }

liyinhgqw commented 6 years ago

So we implemented the multithreading with OpenMP instead.
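A minimal sketch of what such an OpenMP-based split could look like, not the poster's actual code. It assumes the output rows are divided into blocks and each block is computed by a single-threaded kernel inside one `#pragma omp parallel for`. `GemmRowBlock` and `GemmOpenMP` are hypothetical names, and the naive kernel is only a stand-in for a gemmlowp call made with `set_max_num_threads(1)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a single-threaded GEMM call: fills rows
// [row_begin, row_end) of result = (lhs + lhs_offset) * (rhs + rhs_offset).
// lhs is rows x depth and rhs is depth x cols, both row-major.
void GemmRowBlock(const std::vector<std::uint8_t>& lhs,
                  const std::vector<std::uint8_t>& rhs,
                  std::vector<std::int32_t>& result, int depth, int cols,
                  int lhs_offset, int rhs_offset, int row_begin, int row_end) {
  for (int r = row_begin; r < row_end; ++r) {
    for (int c = 0; c < cols; ++c) {
      std::int32_t acc = 0;
      for (int d = 0; d < depth; ++d) {
        acc += (lhs[r * depth + d] + lhs_offset) *
               (rhs[d * cols + c] + rhs_offset);
      }
      result[r * cols + c] = acc;
    }
  }
}

// Hypothetical driver: OpenMP owns all the threading. Each iteration handles
// one block of output rows, and blocks never overlap, so no locking is needed.
std::vector<std::int32_t> GemmOpenMP(const std::vector<std::uint8_t>& lhs,
                                     const std::vector<std::uint8_t>& rhs,
                                     int rows, int depth, int cols,
                                     int lhs_offset, int rhs_offset,
                                     int num_blocks) {
  assert(num_blocks > 0);
  assert(lhs.size() == static_cast<std::size_t>(rows) * depth);
  assert(rhs.size() == static_cast<std::size_t>(depth) * cols);
  std::vector<std::int32_t> result(static_cast<std::size_t>(rows) * cols, 0);

  #pragma omp parallel for
  for (int b = 0; b < num_blocks; ++b) {
    // Even split of the row range; a block may be empty if num_blocks > rows.
    const int row_begin =
        static_cast<int>(static_cast<std::int64_t>(rows) * b / num_blocks);
    const int row_end =
        static_cast<int>(static_cast<std::int64_t>(rows) * (b + 1) / num_blocks);
    GemmRowBlock(lhs, rhs, result, depth, cols, lhs_offset, rhs_offset,
                 row_begin, row_end);
  }
  return result;
}
```

With this layout only one threading runtime is active during the multiplication, so OpenMP's worker threads and a second thread pool are not competing for the same cores. Built without OpenMP support, the pragma is ignored and the function runs serially with the same result.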