kokkos / kokkos-core-wiki

1 stars 44 forks source link

Wiki page "Hierarchical Parallelism" needs improvement #536

Open RL-S opened 2 months ago

RL-S commented 2 months ago

I think the Wiki page "Hierarchical Parallelism" is potentially very valuable to users, as it tackles some slightly more advanced, but important topics. It's also reasonably well-structured, so I'm not about to propose radical changes.

However, despite being quite familiar with the topic, this page seems much harder to understand than the previous ones - and harder than it needs to be. Here are some of the points that I stumbled over:

  1. In the Motivation, I find it a bit hard to understand the ordering of levels. Intuitively, "higher level" means larger, more abstract. Here, this seems to be the opposite.
  2. In the first one for "Basic Kernels", I was looking for any kind of hint what the variable a is supposed to be. Is it a new pattern, or even a builtin? Did I miss a declaration? Why the parentheses? I think I know the answers to these questions by now, but it took a lot of guessing.
  3. Overall, it would be really nice to just have all the variables be declared in some way, or clearly commented. It's much easier to scroll past repetitive declarations than to search further up the page what type they were, and hope that it's still the same.
  4. "Team scratch pad memory":
    • The team_shmem_size function is defined, but not used. How does it fit into the picture? How and where is it called?
    • For set_scratch_size, PerThread and PerTeam are used. I think it would help to add using Kokkos::PerTeam to make clear that this isn't pseudo code. I guess these are helper structs to distinguish constructor calls.
    • I really struggled with the last paragraph and example of this section, too. First, the sentences "Views also have a static member function which return their shared memory size requirements. The function expects the run-time dimensions as arguments, corresponding to View’s constructor." What even is the name of that function? Does that relate to the example below? I couldn't tell. And both the paragraph and the example speak of the "shared memory handle". Where is that? Is it team_member.team_scratch(0)? What's the 0 there?
  5. In Team loops, why do the inner loops take the index as an lvalue reference (int& i)? That's rather non-standard, so if it's intentional, it would be nice if it got a mention/explanation somewhere.
  6. In Restricting execution to a single executor, I get pretty lost after the sentence "Here is an example of using the broadcast capabilities to determine the start offset for a team in a buffer". I do not mind using ellipses (...) in examples. It could even enhance other subsections. But using one in the condition (if (...)) implies to me that it should be either obvious or arbitrary, and in either case I don't know how to square that with the description of the example. For everything beyond this point ('To further illustrate the “parallel region” semantics ...'), I think it can be made clearer what is being shown and how it relates to single.

As an aside: Scratch memory seems to be the first instance where allocations need the (in my opinion horrible) C syntax using sizeof. If users forget sizeof, it's a runtime error. By using template functions, it could both be more elegant and less error prone: e.g. as a free function get_shmem<double>(team_member, size) or with deduction through an rvalue parameter, to avoid the team_shmem.template get_shmem<double>(...) syntax: team_shmem.get_shmem(double{}, size). That way, not providing a type becomes a compile-time error and is thus worlds easier to debug.

ajpowelsnl commented 1 month ago

@RL-S - your observations on how the page can be improved sound good. Would you like to take a crack at revising? Ideally, we'd like to have working code for each entry, and an economy of words and ideas.

RL-S commented 1 month ago

It really feels like I don't know enough about the internals to answer the questions that I have phrased above. That's why I made that list rather than struggling my way to a potentially wrong or misleading revision.