eleon opened this issue 2 years ago
What we need to implement:

* qv_pthread_create
I'm working on it.
> * MPI spawns a background thread per application process for async progress
Interesting. It means then that we have to support the hybrid MPI + OpenMP + Pthreads model. I'm not sure that the current OpenMP-based implementation in QV supports this.
> * MPI spawns a background thread per application process for async progress
>
> Interesting. It means then that we have to support the hybrid MPI + OpenMP + Pthreads model. I'm not sure that the current OpenMP-based implementation in QV supports this.
Fine-grained MPI + OpenMP + Pthreads support would be fantastic. I'm aware of other use cases that would benefit from this capability, too.
One thing I'm thinking about is the ability to share a context and/or a scope between OpenMP threads and Pthreads. This is more or less the same issue as in #35, I guess.
I think we need functions along the lines of qv_init/qv_finalize to support this.
> I think we need functions along the lines of qv_init/qv_finalize to support this.
Could you please elaborate on this?
Also, something to consider: would splitting up OpenMP and Pthread support help in any way?
> Could you please elaborate on this?
Yes, I can: as we already discussed, MPI and OpenMP feature some kind of runtime system that can be relied upon and queried. This is not the case with Pthreads, so I will need to introduce some shared memory space that can contain the information the Pthread implementation will need. In the case of multi-paradigm programs (e.g., MPI + OpenMP + Pthreads), this shared space should be accessible by all "paradigms". My thinking is that having three separate supports/implementations that are not aware of the others is not going to work.
Thus, having a generic qv_init call would allow for setting up this shared space regardless of the programming model. But maybe it's a call that would need to be made only in hybrid cases?
> Could you please elaborate on this?
>
> Yes, I can: as we already discussed, MPI and OpenMP feature some kind of runtime system that can be relied upon and queried. This is not the case with Pthreads, so I will need to introduce some shared memory space that can contain the information the Pthread implementation will need. In the case of multi-paradigm programs (e.g., MPI + OpenMP + Pthreads), this shared space should be accessible by all "paradigms". My thinking is that having three separate supports/implementations that are not aware of the others is not going to work.
In that case, could one implement an internal abstraction that provides such mechanisms for Pthreads?
Yes, that is what I'm planning to do. But this internal abstraction would have to be shared eventually, wouldn't it?
> Thus, having a generic qv_init call would allow for setting up this shared space regardless of the programming model. But maybe it's a call that would need to be made only in hybrid cases?
Could we accomplish the same goal by implementing the missing machinery internally to QV?
Maybe. But how do you detect hybrid cases? And enable the support in these cases?
> Yes, that is what I'm planning to do. But this internal abstraction would have to be shared eventually, wouldn't it?
Shared across tasks, yes; but I'm not convinced that we have to expose those details to the user.
And MPI + OpenMP is different from OpenMP + Pthreads IMHO.
> Shared across tasks, yes; but I'm not convinced that we have to expose those details to the user.
I agree, but I'm not sure we can make this completely transparent. That said, I advocate for transparency in this matter, so I think we're in agreement here.
Let me come up with an initial crappy design for Pthreads that works and then we'll iterate from it.
> Maybe. But how do you detect hybrid cases? And enable the support in these cases?
Would some machinery we come up with regarding #35 do the trick? Recall that the RMI should (but currently doesn't) keep track of all the groups and their respective tasks for us. Maybe we can use the RMI as the ultimate keeper of such information. This would obviate the need for an explicit init and finalize.
> Would some machinery we come up with regarding #35 do the trick?
My gut feeling is that it will (partially at least).
> Recall that the RMI should (but currently doesn't) keep track of all the groups and their respective tasks for us. Maybe we can use the RMI as the ultimate keeper of such information. This would obviate the need for an explicit init and finalize.
OK, we have to discuss this a bit then, because it's something that I didn't completely catch previously. Which groups are you referring to? Because we know the word can be confusing. The same groups that are included in group tabs for each structure? Also, what I'm thinking about (when talking about shared space or runtime info sharing) would only apply to single processes. Therefore I'm not sure that we need a global, centralized instance for this.
Yes, let's schedule a call so we can talk this over. This is an important decision. I have some ideas about the single-process case: it should be pretty straightforward to implement (famous last words).
> Yes, let's schedule a call so we can talk this over. This is an important decision.
Agreed.
> I have some ideas about the single-process case: it should be pretty straightforward to implement (famous last words).
Are you trying to impersonate me?
Here is another potential use case that's worth considering: internal use in mpibind. This might help demonstrate QV's generality in another piece of system software.
> Here is another potential use case that's worth considering: internal use in mpibind.
Greetings, @samuelkgutierrez. Not sure I follow. Could you elaborate a bit more?
I was just thinking that maybe we can implement core pieces of mpibind's API using QV underneath the covers. This could serve as another demonstration of QV's generality in the system software space if we can successfully use it for common mpibind tasks.
It makes sense, @samuelkgutierrez, thanks for clarifying!
Actually, we are already heading in that direction. For example, the split_at function with maximum spread is one of mpibind's mappings!
Courtesy of @adammoody.
As a concrete use case, we might have a situation like:
Ideally, those background threads would run on different cores than the main application thread to avoid contention. However, they could run on the same core as the main app thread if there are no spare cores available. The background threads could run on the same core together, since they are likely not CPU intensive.
Does the Quo Vadis interface provide a way to specify a situation like that?
We can use hints within qv_scope_create to accommodate this.

What we need to implement:

* QV_SCOPE_SYSTEM
* qv_pthread_create
* qv_scope_nobjs_avail
* Hints that qv_scope_create can take. The INCLUSIVE (or shared) hint means that other workers may be running on the same resource (the opposite of exclusive). By default we should place threads using a BFS strategy and then fill up the cores if multiple hardware threads are available.

I guess both SCR and MPI would need to make QV calls?
Yes, the more components that use QV, the better placement and coordination.