root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern

[ntuple] Improve field token usage for parallel writing #16236

Open hahnjo opened 1 month ago

hahnjo commented 1 month ago

A REntry::RFieldToken is currently bound to a specific model id. This prevents the following usage, which I wanted to recommend to @siliataider, from working with parallel writing:

```cpp
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleParallelWriter.hxx>

using namespace ROOT::Experimental;

void parallel_writer() {
  auto model = RNTupleModel::CreateBare();
  model->MakeField<float>("e");
  model->Freeze();

  auto token = model->GetToken("e");

  auto writer = RNTupleParallelWriter::Recreate(std::move(model), "ntpl", "ntpl.root");

  // per thread
  auto context = writer->CreateFillContext();
  auto entry = context->CreateEntry();

  float e;
  entry->BindRawPtr(token, &e); // throws: the token is from the original model, the entry from the writer's clone
}
```

```
terminate called after throwing an instance of 'ROOT::Experimental::RException'
  what():  invalid token for this entry, make sure to use a token from the same model as this entry.
At:
  void ROOT::Experimental::REntry::EnsureMatchingModel(RFieldToken) const
```

This is because the RNTupleParallelWriter internally clones the model, so every thread has to get tokens from its entry:

```cpp
  // per thread
  auto context = writer->CreateFillContext();
  auto entry = context->CreateEntry();
  auto token = entry->GetToken("e");
```
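To see why the token from the original model is rejected while the token from the entry works, the mechanism can be modeled in a few lines. This is a hypothetical sketch, not ROOT code: `toy::Model`, `toy::Entry`, `toy::Token`, `MakeToken` and `TryBind` are made-up stand-ins that only assume what this issue describes, namely that a token records the id of the model it came from, an entry rejects tokens carrying a different id, and cloning (as the parallel writer does per fill context) draws a fresh id.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types, NOT the ROOT API. They only mimic the behavior
// described in this issue: a token remembers the id of the model it came
// from, an entry rejects tokens with a different id, and Clone() draws a
// fresh id.
namespace toy {

std::uint64_t NextModelId()
{
   static std::atomic<std::uint64_t> lastId{0};
   return ++lastId;
}

struct Token {
   std::size_t fIndex;     // position of the field
   std::uint64_t fModelId; // id of the model the token belongs to
};

// Looks up a field by name and tags the token with the given model id.
Token MakeToken(const std::vector<std::string> &names, const std::string &name, std::uint64_t modelId)
{
   for (std::size_t i = 0; i < names.size(); ++i)
      if (names[i] == name)
         return Token{i, modelId};
   throw std::out_of_range("no such field: " + name);
}

class Entry {
public:
   Entry(std::uint64_t modelId, std::vector<std::string> names)
      : fModelId(modelId), fNames(std::move(names)), fPtrs(fNames.size()) {}

   Token GetToken(const std::string &name) const { return MakeToken(fNames, name, fModelId); }

   void BindRawPtr(Token token, void *ptr)
   {
      if (token.fModelId != fModelId) // mimics the check behind the error above
         throw std::runtime_error("invalid token for this entry");
      fPtrs.at(token.fIndex) = ptr;
   }

private:
   std::uint64_t fModelId;
   std::vector<std::string> fNames;
   std::vector<void *> fPtrs;
};

class Model {
public:
   void AddField(const std::string &name) { fNames.push_back(name); }
   Token GetToken(const std::string &name) const { return MakeToken(fNames, name, fModelId); }
   std::unique_ptr<Entry> CreateEntry() const { return std::make_unique<Entry>(fModelId, fNames); }

   // As described in the issue: the clone gets a fresh model id.
   std::unique_ptr<Model> Clone() const
   {
      auto clone = std::make_unique<Model>(*this);
      clone->fModelId = NextModelId();
      return clone;
   }

private:
   std::uint64_t fModelId = NextModelId();
   std::vector<std::string> fNames;
};

// Returns false if the entry rejects the token.
bool TryBind(Entry &entry, Token token, float *ptr)
{
   try {
      entry.BindRawPtr(token, ptr);
      return true;
   } catch (const std::runtime_error &) {
      return false;
   }
}

} // namespace toy
```

For a model with a field `e`, `TryBind(*entry, model.GetToken("e"), &e)` on an entry created from `model.Clone()` returns false, mirroring the exception above, whereas `TryBind(*entry, entry->GetToken("e"), &e)` returns true.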

In principle, we could allow a single set of tokens to be used for the entries of all clones of a model. This requires either keeping the model id unchanged when cloning, or introducing a second model id with those semantics.
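The second option could look roughly like the following hypothetical sketch (again not ROOT code; the name `fSchemaId` and the `AcceptsEntry` check are invented for illustration). Each model carries two ids: a per-instance id that keeps entries of different clones apart, and a shared id, copied on clone, that tokens are validated against.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, NOT ROOT code, of the "second model id" option:
//  - fModelId is unique per model instance (fresh on every clone) and keeps
//    entries of different clones apart;
//  - fSchemaId is shared by a model and all of its clones and is what tokens
//    are checked against.
namespace toy {

std::uint64_t NextId()
{
   static std::atomic<std::uint64_t> lastId{0};
   return ++lastId;
}

struct Token {
   std::size_t fIndex;
   std::uint64_t fSchemaId;
};

class Entry {
public:
   Entry(std::uint64_t modelId, std::uint64_t schemaId, std::size_t nFields)
      : fModelId(modelId), fSchemaId(schemaId), fPtrs(nFields) {}

   void BindRawPtr(Token token, void *ptr)
   {
      // Accepts tokens from the original model or from any of its clones.
      if (token.fSchemaId != fSchemaId)
         throw std::runtime_error("invalid token for this entry");
      fPtrs.at(token.fIndex) = ptr;
   }

   std::uint64_t GetModelId() const { return fModelId; }

private:
   std::uint64_t fModelId;
   std::uint64_t fSchemaId;
   std::vector<void *> fPtrs;
};

class Model {
public:
   void AddField(const std::string &name) { fNames.push_back(name); }

   Token GetToken(const std::string &name) const
   {
      for (std::size_t i = 0; i < fNames.size(); ++i)
         if (fNames[i] == name)
            return Token{i, fSchemaId};
      throw std::out_of_range("no such field: " + name);
   }

   std::unique_ptr<Entry> CreateEntry() const
   {
      return std::make_unique<Entry>(fModelId, fSchemaId, fNames.size());
   }

   // Copies the field list and fSchemaId, but draws a fresh fModelId.
   std::unique_ptr<Model> Clone() const
   {
      auto clone = std::make_unique<Model>(*this);
      clone->fModelId = NextId();
      return clone;
   }

   // Stand-in for a fill step: entries must still come from this exact instance.
   bool AcceptsEntry(const Entry &entry) const { return entry.GetModelId() == fModelId; }

private:
   std::uint64_t fModelId = NextId();
   std::uint64_t fSchemaId = NextId();
   std::vector<std::string> fNames;
};

} // namespace toy
```

With this split, one set of tokens obtained before handing the model to the writer would bind in every per-thread entry, while `cloneB->AcceptsEntry(*entryA)` stays false for an entry created from a different clone. A token from an unrelated model with the same field names is still rejected, because its shared id differs.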

hahnjo commented 4 weeks ago

Thinking about this a bit more, we should be able to get the desired behavior by modifying RNTupleModel::Clone to copy the fModelId if the model is frozen. If the user later calls Unfreeze() on the cloned model, it will get a new model id at that point.

edit: It's not that easy, because the model id is also used for REntrys, which must not be mixed even among cloned models...
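The idea from this comment, and the catch in the edit, can be illustrated with another hypothetical toy model (not ROOT's `RNTupleModel`; `AcceptsEntry` stands in for whatever id-based check keeps entries apart). `Clone()` keeps the id of a frozen model and `Unfreeze()` draws a new one, so tokens would stay valid across frozen clones, but the same shared id also makes entries of different clones indistinguishable.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical sketch, NOT ROOT code, of the idea above:
// a frozen model's clone keeps the model id, Unfreeze() draws a new one.
namespace toy {

std::uint64_t NextModelId()
{
   static std::atomic<std::uint64_t> lastId{0};
   return ++lastId;
}

struct Entry {
   std::uint64_t fModelId; // the only thing an id-based entry check can look at
};

class Model {
public:
   void Freeze() { fIsFrozen = true; }

   void Unfreeze()
   {
      fIsFrozen = false;
      fModelId = NextModelId(); // a model that may change gets its own id again
   }

   std::unique_ptr<Model> Clone() const
   {
      auto clone = std::make_unique<Model>(*this); // copies fModelId and fIsFrozen
      if (!fIsFrozen)
         clone->fModelId = NextModelId(); // clones of unfrozen models still get a fresh id
      return clone;
   }

   Entry CreateEntry() const { return Entry{fModelId}; }
   std::uint64_t GetModelId() const { return fModelId; }

   // Id-based check meant to stop entries of one model being used with another.
   bool AcceptsEntry(const Entry &entry) const { return entry.fModelId == fModelId; }

private:
   std::uint64_t fModelId = NextModelId();
   bool fIsFrozen = false;
};

} // namespace toy
```

After `model.Freeze()`, two clones share the model id, so `cloneB->AcceptsEntry(cloneA->CreateEntry())` returns true, which is exactly the mixing the edit warns about; after `cloneB->Unfreeze()` the ids differ again and the entry is rejected.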