parthenon-hpc-lab / parthenon

Parthenon AMR infrastructure
https://parthenon-hpc-lab.github.io/parthenon/
Other
109 stars 33 forks source link

Dependency on Kokkos #615

Open jdolence opened 2 years ago

jdolence commented 2 years ago

This is an old debate, but a couple different things have made me wonder again recently if we should be a bit more rigorous in our abstraction of Kokkos. Over time, we've allowed raw Kokkos calls to show up in quite a few places, which means Kokkos is now a hard dependency. Here's my current list of raw Kokkos things that show up in the code base:

The list might get longer if we included what shows up in downstream codes because there isn't a parthenon::* alternative. As a project, I'd like to know what people think of this. Is this OK? Might this become problematic down the road? At this point, I don't think it's too late to provide parthenon:: versions of these things which would allow us, for example, to use an OpenMP backend without Kokkos, or to try out the parallel execution stuff in the standard. I don't want to get into the space of recreating Kokkos within parthenon, but I also worry about us getting backed into a corner.

I'm imagining something where we have a kokkos_abstractions.hpp that would have things like (for example)

template <typename Scalar, typename Space>
using Sum = Kokkos::Sum<Scalar, Space>;

but perhaps also a openmp_abstractions.hpp where one could define a Kokkos-free version of these things. Then a runtime_abstractions.hpp where one or the other is included at compile time.

So, anybody have any thoughts? I'll tag a few folks that I think might have an opinion. @pgrete @Yurlungur @brryan @maxpkatz @nmsriram @forrestglines . Others are obviously welcome to comment as well.

brryan commented 2 years ago

I'm in favor of the compatibility layer described above that makes Kokkos optional for three reasons. Kokkos does a great job with performance portability, of course, but (1) we aren't actually in charge of that project so if our interests diverge we could be locked out of new hardware. (2) One can imagine improving performance by tailoring a backend to our specific needs, like targeting vectorization through an openmp backend. (3) It seems like Kokkos is committed to moving to C++17 which may lead to an unusable code for the LANL people on some existing systems, when we need to move parthenon to C++17 to even support new versions of Kokkos that we need for other reasons.

This isn't a trivial amount of work but I think it is at least worth moving towards this goal unless others would have their workflows seriously damaged or broken by this change. Hopefully it would be a mostly unnoticeable change for those who just want to stay with Kokkos exclusively.

maxpkatz commented 2 years ago

Perhaps it's worth separating out things Kokkos provides that are fundamental to how Parthenon works and things which are incidental. It's good to have the parallel for and view abstractions be strict about this because that will make it a lot easier to understand in the short term whether use of the parallel algorithms in the C++ standard is an interesting long term choice. But for convenience functionality like Kokkos:max() and the profiling libraries, those are straightforward to replicate if you need to, but just add unnecessary overhead if it's likely that you'll never actually drop the Kokkos requirement.

pgrete commented 2 years ago

I'm in favor of keeping things as they are for now for the following reasons.

Alright, this has been lots of text. My bottom line/key question that I'd like to be discussed/answered is: Is is there an actual need/pressing concern right now that would motivate this kind of change?

Note, that I'll be absent till Thanksgiving so I'll only get back to this discussion afterwards.

Yurlungur commented 2 years ago

My thinking on this is we're unlikely to drop Kokkos, but I think there's a desire to experiment with non-Kokkos implementations of, e.g., OpenMP or parallel algorithms in the C++ stdlib for parallel dispatch.

So I guess my question would be similar to @maxpkatz 's question. What bits can we play with safely in that context? Perhaps the only thing we need to change is parallel dispatch, which we've already abstracted?