ocaml-multicore / domainslib

Parallel Programming over Domains
ISC License
173 stars 30 forks source link

Multi_channel: allow more than one instance per program with different configurations #50

Closed edwintorok closed 2 years ago

edwintorok commented 2 years ago

Multi_channel had shared global state in the form of dls_new_key which caused ids assigned by one multichannel to be used by another (possibly smaller channel), resulting in out-of-bounds array indexing. One possible way to fix this is to remove the global key, and use a per-channel key.

The key here also has some mutexes and conditional variables that you probably don't want shared across different multi-channels.

Draft PR, because there must've been a reason why this was a shared global variable to begin with.

Appears to fix https://github.com/ocaml-multicore/domainslib/issues/43 on my machine

kayceesrk commented 2 years ago

@ctk21 is the right person to look at this one.

edwintorok commented 2 years ago

On 11 October 2021 11:44:19 BST, Tom Kelly @.> wrote: @. commented on this pull request.

There was no special reason for the channel domain state to be global. Indeed it is reasonable and an improvement to allow multiple pools!

The only concern that comes to mind is if we need to be stronger about preventing tasks being accidentally used between two pools. For example Domain A1 is in pool A but does an await on a task executing in pool B, that could be a source of bugs in the absence of an argument for why it works.

@@ -76,13 +78,16 @@ let init_domain_state mchan dls_state = [@@inline never]

let get_local_state mchan =

  • let dls_state = Domain.DLS.get dls_key in
  • if dls_state.id >= 0 then dls_state
  • let dls_state = Domain.DLS.get mchan.dls_key in
  • if dls_state.id >= 0 then begin
  • assert (dls_state.id < Array.length mchan.channels);

The assert is going to execute every time which is sad, but I guess it can stay if you are strongly in favour.

Hi,

I thought about making the pool part of the promise, but I'm worried with multiple pools you can construct a deadlock scenario if you end up doing a Task.await while inside the async of another pool. Needs.more thought on how to handle this situation (or at least detect the deadlock, like the Mutex module sometimes can).

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

edwintorok commented 2 years ago

https://github.com/ocaml-multicore/domainslib/pull/51 I'll rebase my PR once this is in, and think about what needs to be done to run multiple pools safely.

kayceesrk commented 2 years ago

@Sudha247 Perhaps this is relevant to the Lwt_domain work.

kayceesrk commented 2 years ago

Hi @edwintorok, Jan has created a PR that fixes the conflicts to your branch https://github.com/ocaml-multicore/domainslib/issues/58#issuecomment-998251698. Would you prefer if Jan sends a PR to your unique-key branch, which you can merge, after which this PR can be merged?

edwintorok commented 2 years ago

Thanks a lot @jmid, I've updated this PR to include your changes. I've also fixed the test/backtrace.ml unit test which was failing locally (unrelated to this PR, but I like all unit tests to pass).

kayceesrk commented 2 years ago

LGTM. Merging.