nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
106.39k stars 28.99k forks source link

`napi_create_external_buffer` is super slow #53804

Open ronag opened 1 month ago

ronag commented 1 month ago

Given the following example:


NAPI_METHOD(db_get_many) {
  NAPI_ARGV(2);

  Database* database;
  NAPI_STATUS_THROWS(napi_get_value_external(env, argv[0], reinterpret_cast<void**>(&database)));

  // TODO (fix): Ensure lifetime of buffer?
  std::vector<rocksdb::Slice> keys;
  {
    uint32_t length;
    NAPI_STATUS_THROWS(napi_get_array_length(env, argv[1], &length));

    keys.reserve(length);
    for (uint32_t n = 0; n < length; n++) {
      napi_value element;
      char* buf;
      size_t length;
      NAPI_STATUS_THROWS(napi_get_element(env, argv[1], n, &element));
      NAPI_STATUS_THROWS(napi_get_buffer_info(env, element, reinterpret_cast<void**>(&buf), &length));
      keys.emplace_back(rocksdb::Slice{buf, length});
    }
  }

  rocksdb::ColumnFamilyHandle* column = database->db->DefaultColumnFamily();
  rocksdb::ReadOptions readOptions;;

  const auto size = keys.size();

  std::vector<rocksdb::Status> statuses;
  std::vector<rocksdb::PinnableSlice> values;

  statuses.resize(size);
  values.resize(size);

  database->db->MultiGet(readOptions, column, size, keys.data(), values.data(), statuses.data());

  for (size_t idx = 0; idx < size; idx++) {
    if (statuses[idx].IsNotFound()) {
      values[idx] = rocksdb::PinnableSlice(nullptr);
    } else {
      ROCKS_STATUS_THROWS_NAPI(statuses[idx]);
    }
  }

  napi_value ret;
  NAPI_STATUS_THROWS(napi_create_array_with_length(env, values.size(), &ret));

  for (size_t idx = 0; idx < values.size(); idx++) {
    napi_value element;
    if (values[idx].GetSelf()) {
      auto ptr = new rocksdb::PinnableSlice(std::move(values[idx]));
      NAPI_STATUS_THROWS(napi_create_external_buffer(env, ptr->size(), const_cast<char*>(ptr->data()), Finalize<rocksdb::PinnableSlice>, ptr, &element));
    } else {
      NAPI_STATUS_THROWS(napi_get_undefined(env, &element));
    }
    NAPI_STATUS_THROWS(napi_set_element(env, ret, static_cast<uint32_t>(idx), element));
  }

  return ret;
}

more than 50% of the time is spent with napi_create_external_buffer which I would assume should be a rather fast operation...

ronag commented 1 month ago

@addaleax

mhdawson commented 1 month ago

@ronag this is the code for napi_create_external_buffer

napi_create_external_buffer(napi_env env,
                            size_t length,
                            void* data,
                            node_api_nogc_finalize nogc_finalize_cb,
                            void* finalize_hint,
                            napi_value* result) {
  napi_finalize finalize_cb = reinterpret_cast<napi_finalize>(nogc_finalize_cb);
  NAPI_PREAMBLE(env);
  CHECK_ARG(env, result);

#if defined(V8_ENABLE_SANDBOX)
  return napi_set_last_error(env, napi_no_external_buffers_allowed);
#endif

  v8::Isolate* isolate = env->isolate;

  // The finalizer object will delete itself after invoking the callback.
  v8impl::BufferFinalizer* finalizer =
      v8impl::BufferFinalizer::New(env, finalize_cb, nullptr, finalize_hint);

  v8::MaybeLocal<v8::Object> maybe =
      node::Buffer::New(isolate,
                        static_cast<char*>(data),
                        length,
                        v8impl::BufferFinalizer::FinalizeBufferCallback,
                        finalizer);

  CHECK_MAYBE_EMPTY(env, maybe, napi_generic_failure);

  *result = v8impl::JsValueFromV8LocalValue(maybe.ToLocalChecked());
  return GET_RETURN_STATUS(env);
  // Tell coverity that 'finalizer' should not be freed when we return
  // as it will be deleted when the buffer to which it is associated
  // is finalized.
  // coverity[leaked_storage]
}

If you built a version of Node.js commenting out

  // The finalizer object will delete itself after invoking the callback.
  v8impl::BufferFinalizer* finalizer =
      v8impl::BufferFinalizer::New(env, finalize_cb, nullptr, finalize_hint);

That would introduce a memory leak but might show if the slowness is in the base Node functionality or due to node-api overhead that might be optimized.

The main thing that we were thinking it would be good to understand is if the slowness is in node::Buffer::New or the additional wrapping in node-api.

mhdawson commented 1 month ago

@vmoroz is going to play around a bit with the implementation as well to see what might be possible in terms of flatenning/removing overhead.

ronag commented 1 month ago

Removing the finalizer improves things.