hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.53k stars 313 forks

Use of std::shared_mutex with shared_lock instead of native locks defined in ThreadGranularity.h #254

Closed: sierret closed 1 year ago

sierret commented 1 year ago

I am running multi-threaded code in which some threads only read and others write. It would therefore be more efficient to use unique_lock and shared_lock with shared_mutex. From what I understand, the built-in locks (in particular the Spinlock) don't have the advantages of shared_lock/unique_lock with shared_mutex, and they lock the entire DataFrame for every operation.

Are there any advantages to using your built-in locks, or is it better (for me) to switch to using shared_mutex with unique_lock/shared_lock? Would there also be any compatibility issues?

hosseinmoein commented 1 year ago

First, read the multithreading section in the documentation. There are two levels of protection you need.

  1. You need to protect the DataFrame static and internal data structures that are hidden. For that, you must use my spin lock and call set_lock().
  2. You also have to protect the DataFrame columns that are exposed to the user, if a single instance of DataFrame is accessed from multiple threads. For that you can use your own separate lock (or my spin lock). It is up to you.
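The two levels can be pictured with a small standard-library sketch. It does not use the DataFrame API: `DemoSpinLock`, `DemoFrame`, `internal_ops`, and `run_two_level_demo` are hypothetical stand-ins invented for illustration. It also assumes that even read operations touch some hidden shared state, so a `shared_lock` on the column alone would not protect that state.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library-provided spin lock (NOT the DataFrame one).
class DemoSpinLock {
public:
    void lock() noexcept {
        // Spin until the flag was previously clear, i.e. we acquired it.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() noexcept { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical frame: hidden bookkeeping plus one user-visible column.
class DemoFrame {
public:
    void append(double v) {
        std::unique_lock<std::shared_mutex> w(column_lock_);  // exclusive writer
        column_.push_back(v);
        bump();
    }
    double sum() {
        std::shared_lock<std::shared_mutex> r(column_lock_);  // concurrent readers
        double total = 0.0;
        for (double v : column_) total += v;
        bump();  // readers still mutate hidden state, hence the level 1 lock
        return total;
    }
    std::size_t internal_ops() const { return internal_ops_; }

private:
    void bump() {
        std::lock_guard<DemoSpinLock> g(internal_lock_);
        ++internal_ops_;
    }
    DemoSpinLock internal_lock_;      // level 1: guards hidden state
    std::size_t internal_ops_ = 0;
    std::shared_mutex column_lock_;   // level 2: user-chosen lock for the column
    std::vector<double> column_;
};

// One writer and three readers share a single frame instance.
std::size_t run_two_level_demo() {
    DemoFrame frame;
    std::thread writer([&] { for (int i = 0; i < 1000; ++i) frame.append(1.0); });
    std::vector<std::thread> readers;
    for (int t = 0; t < 3; ++t) {
        readers.emplace_back([&] { for (int i = 0; i < 1000; ++i) (void)frame.sum(); });
    }
    writer.join();
    for (auto& r : readers) r.join();
    const double total = frame.sum();
    assert(total == 1000.0);
    (void)total;
    return frame.internal_ops();  // read after all threads have joined
}
```

In this sketch the reader/writer lock only coordinates access to the column, while the spin lock serializes the hidden counter that both readers and writers update. With the real library, level 1 would be handled by calling set_lock() with its own spin lock, as described above.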

In general, multithreading is not as straightforward as it may sound. It requires very careful analysis and especially repeated benchmarking and tweaking. Often it is counterproductive. It also differs from one platform to another, because it depends on the particular hardware. If you are new to multithreaded programming, I suggest you start with a single thread and maybe use the async interfaces. Once it works correctly, you can convert it to a multithreaded version, compare results and efficiency, and repeatedly tweak it.
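A minimal sketch of that workflow follows, using `std::async` from the standard library as a stand-in rather than the DataFrame async interfaces. The function names `sum_serial`, `sum_async`, and `check_against_baseline` are hypothetical. Integer data is used so that the serial and parallel results can be compared exactly; with floating-point data the summation order could change the result slightly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Single-threaded baseline: the reference result to compare against.
long long sum_serial(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Same computation split into at most `parts` asynchronous tasks.
long long sum_async(const std::vector<int>& data, std::size_t parts) {
    if (parts == 0) parts = 1;
    const std::size_t chunk = (data.size() + parts - 1) / parts;  // ceiling division
    std::vector<std::future<long long>> tasks;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        tasks.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);
        }));
    }
    long long total = 0;
    for (auto& t : tasks) total += t.get();  // get() waits for each task
    return total;
}

// Correctness check: the parallel version must match the baseline.
void check_against_baseline(const std::vector<int>& data, std::size_t parts) {
    assert(sum_async(data, parts) == sum_serial(data));
}
```

Each task reads a disjoint slice of a vector that nobody modifies, so this particular example needs no locks. Timing both functions on the target machine (for example with `std::chrono::steady_clock`) would show whether the parallel version actually pays off there.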

sierret commented 1 year ago

I'm sorry. I did read the section and understand the gist of it, but I can't say I get every single detail. As you say, multithreading is indeed pretty complex, which is exactly why I decided to post an issue. I suppose what I was basically asking was whether the shared/regular mutexes were capable of protecting the static and internal data structures, but I didn't know how to phrase it. But thank you, you answered my question.