Variable::Span<T> - Githubissues

ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program

https://adios2.readthedocs.io/en/latest/index.html

Apache License 2.0

275 stars 127 forks source link

Variable::Span<T> #1290

Closed germasch closed 2 years ago

germasch commented 5 years ago

Can you provide some context on how apps are expected to use Span<T> from #1289? I gather there have been some requests for this, but I'd be interested to learn how others use this.

It looks like a useful feature to me, which I might want to use in my projects.

The way it looks to me from the test is:

var = io.DefineVariable<double>(....)
writer = io.Open(...)

auto span = writer.Put<T>(var)
// fill span with data
// How do I say that I'm done filling the data and it's ready to be written? Isn't that what Put is for?

The other part I'm wondering about is, why std::span (well, that's C++20, but you know what I mean)? On the app side, that works for simple 1-d arrays, I guess. But at lot of apps do more complicated things.

In my case, as an example, I calculate moments from my particle data as diagnostics. So I create a 2-d or 3-d array, calculate my output data in it, then write it, then destroy it. I think the point of this is to avoid me allocating data, then handing it to ADIOS2, which will at some point copy it into its internal buffer, then I free my data. It'd be nicer if I could create my 2-d/3-d data structure directly using adios2's buffer, avoiding the need to have two temporary buffers and a copy. But std::span doesn't really help me very much, or what's the idea?

williamfgc commented 5 years ago

@germasch these are all good questions. I still have to add the documentation (I will copy some below). Your assumptions are correct. This is something requested by applications for the reasons you outlined above and the ones I put in PR #1289 . https://github.com/ornladios/ADIOS2/blob/master/testing/adios2/engine/bp/TestBPWriteReadVariableSpan.cpp has a test for a 2D arrays as well. Underlying memory for multidimensional arrays is still contiguous. In C or C++, unless using a specialized library e.g. boost, multidimensional arrays are always linearized manually, unlike numpy arrays in Python and Fortran.

From the PR #1289 : Addresses several issues for write use-cases:

Avoid buffer duplication if application is memory-bound
Reduce the number of sub-block metadata by pre-allocating larger block memory in adios2 which is exposed to the app as in #1165
Allow for non-contiguous data to populate the contiguous buffer data directly (e.g. MFEM nodes in element) without the need of temporaries

Copy and paste from my WIP docs:

In a nutshell, adios2::Variable<T>::Span is based on C++20 std::span, formerly std::array_view, which is exactly what adios2 is providing: a static/non-resizable, non-owning view into the adios2 provide memory for a variable (e.g. BP buffer for the default Engine). For the user convenience, it's meant to be handled as a typed <T> piece of contiguous memory, which matches the underlying storage of multidimensional arrays. Since adios2 is going to be C++11 library for a long time, using std::span directly is out of the question. Instead, adios2::Variable<T>::Span provides a minimal version of std::span providing compatibility with STL iterators for user convenience when handling the population of their data.

Variable<T>::Spans are generated from an introduced overload to Engine::Put. This call allocates the requested variable dimensions immediately into the buffer, just like a Put in sync mode would do, but without populating the data (buffer memory is zeroed by default). However, spans are collected, (including calculation of min/max), similarly to deferred (default) mode calls to Put: either at PerformPuts, EndStep or Close, whichever comes first in the application's code. Careful must be taken as each Put generating a Span will invalidate any pointer from previous calls to span.data() as it resizes the buffer, thus following similar rules to pointer invalidation by std::vector::resize. Nevertheless, users never have ownership access avoiding extra costs of managing memory.

germasch commented 5 years ago

https://github.com/ornladios/ADIOS2/blob/master/testing/adios2/engine/bp/TestBPWriteReadVariableSpan.cpp has a test for a 2D arrays as well. Underlying memory for multidimensional arrays is still contiguous. In C or C++, unless using a specialized library e.g. boost, multidimensional arrays are always linearized manually, unlike numpy arrays in Python and Fortran.

Sorry, I missed the 2D test before. But what I see there now basically confirms the issue that was talking about, there is only the 1-d span which can't be accessed as a 2-d array. You're just copying the contiguous memory block from the 2-d array into the span, which works, but the idea behind the span is to avoid such a copy, of course. Unfortunately, I don't really see a clear way to solve this, given the fact that there is no one C++ multi-dimensional array class, but many, while (almost?) all of them use underlying contiguous memory, the interfaces vary wildly.

Some array classes will always just allocate their own memory, so nothing can be done in that case. Some have separate "view/ref" classes that will take a pointer to memory instead of owning it, so that'll work fine with with the span thing. Others take an allocator, in the spirit of std::allocator. That was my first thought, that one shouldn't provide a span, but a (pretend) allocator. But I'm not convinced how workable, considering that deallocation is part of that interface, and it's not so clear that this can be handled cleanly. I think in most cases, a span can be made to work, so that's good.

germasch commented 5 years ago

On the other issue, I think the interface should be reworked a bit.

Let me summarize the current Put interface:

Put(var, data, Sync): asks ADIOS2 to take the data and write it (or send it around, or whatever the engine does). After the call returns, data can be overwritten or deallocated. That's just like MPI_Send() or MPI_Put().
Put(var, data, Deferred): asks ADIOS2 to take the data and start writing (if it wants to). After the call returns, data cannot be touched by the application. That's just like MPI_Isend() or MPI_Rput(). For MPI one needs to use MPI_Wait/MPI_Test to know when it's safe to touch the data again. For ADIOS2, it's PerformPuts() or equivalent.

So the semantics of ADIOS2 match MPI (and other asynchronous APIs) very well. In fact, I'd expect that engines which actually use MPI internally to communicate data will exactly do the above, ie, do MPI_Send / MPI_Isend inside the Put call, depending on Sync/Deferred. And they'd do a MPI_Waitall in PerformPuts.

However, the choice of overloading functions to implement span makes this previously clearly defined behavior much more messy:

Put overload with spans: Doesn't actually write/start writing/declare that the data is available. In fact, the data can't be available, because the app obtains the buffer only after Put returns.
PerformPuts when spans are used now includes two things: Start writing, and wait for writing to be finished.

The change in meaning is not good, and even less so because Put is very commonly used in APIs with an expected meaning, and that's the meaning that fits the two current Put variants. With spans, you say Put, when you really mean GetPutBuffer or something like that.

The other issue is even more important: Asynchronous APIs need two components: A "start" and a "wait/test for if done". Since Put now is used for another purpose, you have only one call left (the PerformPuts that includes both parts. In other words, the new API is good in that it can avoid a copy. But at the same time, it turned into a synchronous API. While that may not matter for the current implementation of BP3, it limits you from making that more asynchronous in the future, and it prevents other engines from treating transfers involving preallocated buffers (spans) in an asynchronous fashion.

williamfgc commented 5 years ago

@germasch thanks for the feedback. My take away is that we have a to do a better job explaining, in the context of adios2, two concepts (polymorphism and abstraction) that are not familiar to C++, object-oriented crowds: i) virtual function abstractions, and ii) function overloading.....and for the crowd familiar with file I/O: Put is not POSIX write.

We have to break away with decades of fixed semantics of C APIs to understand/appreciate what adios2 C++11 bindings provide, I must admit your logic will fit very well the C API as function overload is not an option.

The change in meaning is not good,

Change in meaning is the essence of virtual functions in object-oriented frameworks (polymorphism). In adios2 all Put signatures have different contracts between the app and the library, but they all work under the same abstraction: "I'm putting a variable into adios2". Not only that, all Engine functions mean something completely different (Open, BeginStep/EndStep, Put/Get, Close) and you go across engines with the premise that they all follow the same abstraction. We can also argue about compile-time polymorphism, Putting a std::complex is not exactly the same as putting a double or an unsigned int, but they all follow the same abstraction (put numbers).

and even less so because Put is very commonly used in APIs with an expected meaning,

Is Put commonly used with an expected meaning? Please provide examples, MPI_Put (one-sided RDMA) and ostream::put (put a character) mean completely different things, same as each Engine Put in adios2. The nice thing about the word Put: is a powerful and simple abstraction.

With spans, you say Put, when you really mean GetPutBuffer or something like that.

What about using the Put abstraction to say: "I am making a contract with the library, I am putting a variable and I have two options for the memory: if I request a span back I can manipulate the variable contents, otherwise I pass the memory from my app".

The Get in GetPutBuffer is implicit and redundant the moment you have a span placeholder expecting something in return. A Buffer is adios2 private business, to the app's point of view the contract is that I have a memory space that can be populated for this Variable block requested to be Put.

Again, abstractions are a powerful concept.

But at the same time, it turned into a synchronous API.

Move semantics (Put returning a Span) and asynchronous behavior are two concepts that can't be applied easily in the same function. We had internal discussions about how providing a clean API (e.g. using move semantics for returning a span in Put/Get potentially covering write and read modes) is better than chasing performance in areas with little to no gain in adios2: i) asynchronous serialization (unless you have independent buffer-per-thread strategies, but at some point you need to block/synchronize: e.g. thread-safe reallocation) and ii) the amortized cost of memory allocation at the first step. The latter are not our bottlenecks and like you said about the benefit of avoiding the copy: the cost/benefit (cost of move semantics or sync API/avoiding copy) is very much justified.

germasch commented 5 years ago

TLDR; version

That's how you one currently uses deferred Put (looks perfectly good to me).

{
    std::vector<double> data(N);

    calculate_output_data(data);

    writer.Put(var, data, Mode::Deferred); // handing off data to ADIOS

    // do other stuff to overlap data writing / comm -- but can't touch data

    writer.EndStep();
}

With current master, if one wants to avoid a copy / save memory: (no way to do asynchronous; Put call happens before data is calculated)

{
    auto data = writer.Put<double>(var); // Maybe it's just me, but isn't it
                                         // weird to put the data before
                                         // calculating it?
    calculate_output_data(data);

    // could do other stuff, but ADIOS2 can't work with my data yet, so no
    // overlap / async (how would it know it's ready?)

    writer.EndStep();
}

What I think a proper API would look like, which is much more similar to current deferred behavior, still asynchronous:

{
    auto data = writer.GetBuffer<double>(var); // Maybe "GetPutBuffer" would be better?

    calculate_output_data(data);

    writer.Put(var); // handing the data to ADIOS2

    // do other stuff to overlap data writing / comm -- but can't touch data

    writer.EndStep();
}

And here's the long version

@germasch thanks for the feedback. My take away is that we have a to do a better job explaining, in the context of adios2, two concepts (polymorphism and abstraction) that are not familiar to C++, object-oriented crowds: i) virtual function abstractions, and ii) function overloading.....and for the crowd familiar with file I/O: Put is not Write.

@williamfgc I take this to imply that you think that I'm not familiar with C++ or object oriented programming. I don't know where you get this impression from. (Needless to say, I don't think it's accurate). I think these discussions could be much more beneficial if you would consider the possibility that other people actually might have valid points.

Also, while I don't see how that's related to the point, what do you mean by "Put is not Write"? I already said in the above that Put may mean write or send or something like that depending on context. Is there a fundamental difference I missed between fwrite vs fputs? (that is (C) I/O, isn't it)?

We have to break away with decades of fixed semantics of C APIs to understand/appreciate what adios2 C++11 bindings provide, I must admit your logic will fit very well the C API as function overload is not an option.

True, C doesn't have function overloads. so in C you would write

engine_put_sync(engine, var, data). Will take data and write (put?) it to the file
engine_put_deferred(engine, var, data). Same, but asynchronously. Need to wait for completion before touching data

I don't think in C anyone would add a third "overload"

buf = engine_put_buffer(engine, var) which does what your new overload does, ie, return a buffer and not put anything. buf = engine_get_buffer(...) would be a much better choice, since that's what the function does.

What overloads are good for, fundamentally, is to be able to use the name for functions that do very similar things.

In C, you have to write

void print_int(int i);
void print_double(double x);
// ...
{
  print_int(3);
  print_double(4.5);
}

In C++, overloads can make this prettier while there is no danger of getting confused:

void print(int i);
void print(double x);
// ...
{
  print(3);
  print(4.5);
}

(Of course, there are more benefits to overloads, but this is the basic idea.)

In C++, you can also do this:

std::string convert(const std::string& s) { /* transform string to uppercase */ }
double convert(double& x) { x *= 1000.; /* unit conversion */ }

While perfectly valid, I'd hope you agree that overloading convert to do two very different things is not a good idea (and that's even though both actually are conversions.)

The change in meaning is not good,

Change in meaning is the essence of virtual functions in object-oriented frameworks (polymorphism).

No it's not. Sorry for mentioning OOP 101, but you have Shape base class with an area() virtual function. You have derived classes Triangle and Circle. They implement area() to return the area of their respective shape. There is no change in meaning.

Also, my comments were about overloads of Put, not about virtual functions. Yes, the different overloads of Put are (separate) virtual functions,(complicated by the fact that you can't overload templated functions directly). These virtual functions that exist are perfectly following the OOP paradigm, that is the 'BP3' engine will take the data and write it to bp3, the 'hdf5' engine will take the data to hdf5. That's good, exactly because it is not a change in meaning.

In adios2 all Put signatures have different contracts between the app and the library, but they all work under the same abstraction: "I'm putting a variable into adios2".

I think you have to stretch the English language here. "Putting a variable into ADIOS2" to me is what DefineVariable does (maybe). What you mean for Put is "I'm putting a variable's data into ADIOS2" (so that it can write/send/whatever). But you couldn't say that, because then it would be come obvious that the new overload doesn't actually do that, right?

Not only that, all Engine functions mean something completely different (Open, BeginStep/EndStep, Put/Get, Close) and you go across engines with the premise that they all follow the same abstraction.

Right, the other Engine functions mean different things, and that's because they do different things, and are named differently. No argument there -- but I don't see what your point is. I'm not saying that you should overload the "Get" function to give you the internal buffer, if that's what you mean, I'm saying that since getting the internal buffer is a new (different) thing, it should have it's own function, rather than overloading an existing one which doesn't fit.

We can also argue about compile-time polymorphism, Putting a std::complex is not exactly the same as putting a double or an unsigned int, but they all follow the same abstraction (put numbers).

Well, we can also argue about the weather tomorrow, but what's the point? (Yes, I know what compile-time polymorphism is, in case you're wondering, but again, that's not the point. I'm not arguing about the <T> on these functions.)

and even less so because Put is very commonly used in APIs with an expected meaning,

Is Put commonly used with an expected meaning? Please provide examples, MPI_Put (one-sided RDMA) and ostream::put (put a character) mean completely different things, same as each Engine Put in adios2. The nice thing about the word Put: is a powerful and simple abstraction.

I'm sorry, but to me MPI_Put and ostream::put are using the same general meaning for put, that is, take data from the user and do their thing with it, which is communication for one, and putting it into the stream for the other. There is also gets (yeah, that's bad) / puts, or HTTP GET and PUT, etc. What all the puts have in common is that they take input from the caller -- while the get in some fashion or other return something to the caller.

With spans, you say Put, when you really mean GetPutBuffer or something like that.

What about using the Put abstraction to say: "I am making a contract with the library, I am putting a variable and I have two options for the memory: if I request a span back I can manipulate the variable contents, otherwise I pass the memory from my app".

Obviously, you can also call a function "makeCoffee" and then declare that what it does is to make a contract to give you an empty cup so you can make coffee yourself". You can define your API to do whatever you want it to be. I just don't think that helps usability.

The Get in GetPutBuffer is implicit and redundant the moment you have a span placeholder expecting something in return.

Maybe you should ask some programmers what they think this line does:

auto buf = io.Put(variable);

I don't think a lot of them will say "oh sure, he implicitly meant io.GetPutBuffer(). I think most will say: "What? Shouldn't that be buf = io.Get(variable);? (Which isn't at all what the call does).

A Buffer is adios2 private business, to the app's point of view the contract is that I have a memory space that can be populated for this Variable block requested to be Put.

I mostly agree with this sentence, except that I don't see this as an argument for naming the function Put.

Again, abstractions are a powerful concept.

True, but how's that related?

But at the same time, it turned into a synchronous API.

Move semantics (Put returning a Span) and asynchronous behavior are two concepts that can't be applied easily in the same function. We had internal discussions about how providing a clean API (e.g. using move semantics for potentially write and read modes) is better than chasing performance in areas with little to no gain in adios2: i) asynchronous serialization (unless you have independent buffer-per-thread strategies, but at some point you need to block/synchronize: e.g. thread-safe reallocation) and ii) the amortized cost of memory allocation at the first step. The latter are not our bottlenecks and like you said about the benefit of avoiding the copy: the cost/benefit (cost of move semantics or sync API/avoiding copy) is very much justified.

Move semantics in C++11 has a well-defined meaning, and that's got nothing do do with this discussion. There is no std::move, or rvalue ref in this API, is it? What's true is that move semantics are a way to move pointers around to avoid copying, but it's not a concept easily applicable to what you're trying to (and do) achieve with this PR. If you were in fact moving the data into ADIOS2 in the interface, then a proper asynchronous interface might not be needed anymore, because the user doesn't have the data anymore, so no need to notify them when they're free to touch it. But again, not the case here.

Let me (try to) be clear: The discussion about the naming of the function that hands the buffer to the user is just a usability issue, but that as such doesn't affect functionality. What however does affect functionality is that because you use the Put overload to return the buffer, you don't use it anymore after you actually filled the buffer with your data with the meaning of "The data is ready. I'm giving it to you so you can start working on it". The only way to say "The data is ready" is now EndStep or equivalent, but at that point it's too late to, e.g, use MPI_Isend() on the engine side, or any other kind of asynchronous processing.

And that's already happening today, as some quick grepping shows. InsituMPI does as I would have expected use MPI_Isend in Put(..., Deferred) and then MPI_Waitall in EndStep. Once you provide the buffer interface for InsituMPI, you will find that there is no more opportunity for communication overlap (when the buffer interface is used).

pnorbert commented 5 years ago

Kai,

So the semantics of ADIOS2 match MPI (and other asynchronous APIs) very well. In fact, I'd expect that engines which actually use MPI internally to communicate data will exactly do the above, ie, do MPI_Send / MPI_Isend inside the Put call, depending on Sync/Deferred. And they'd do a MPI_Waitall in PerformPuts.

I don't think it matches MPI. There is no way to use MPI send/recv in Put_sync, simply because there are no addressees. sync/deferred simply states one thing: I promise not to modify the content of the memory pointed by the pointer until PerformPuts/EndStep, or, I need/want to modify it (for writing another variable or block with the same temporary buffer I am creating).

There is no promise of asynchronous or synchronous behavior. Reading anything extra into the name of Put is mistaken. This function should be called something like: HeyAdiosHereIsAPointerToAVariableForThisIOStep(). Or my personal favorite: C() in the list of A(), B(), C() and D(). We had way too much heated discussions about naming these functions.

pnorbert commented 5 years ago

Deleted because I misunderstood your comment.

pnorbert commented 5 years ago

On Fri, Mar 15, 2019 at 2:10 AM Kai Germaschewski notifications@github.com wrote:

Let me (try to) be clear: The discussion about the naming of the function that hands the buffer to the user is just a usability issue, but that as such doesn't affect functionality. What however does affect functionality is that because you use the Put overload to return the buffer, you don't use it anymore after you actually filled the buffer with your data with the meaning of "The data is ready. I'm giving it to you so you can start working on it". The only way to say "The data is ready" is now EndStep or equivalent, but at that point it's too late to, e.g, use MPI_Isend() on the engine side, or any other kind of asynchronous processing.

And that's already happening today, as some quick grepping shows. InsituMPI does as I would have expected use MPI_Isend in Put(..., Deferred) and then MPI_Waitall in EndStep. Once you provide the buffer interface for InsituMPI, you will find that there is no more opportunity for communication overlap (when the buffer interface is used).

You are right. It's not entirely accidental that we postponed thinking about this new functionality for two years and are looking at it again now as the need arises. We had too much fight about the API even without assisted buffering and as you see this one new functionality again throws off the (perceived) balance of the API set.

William, I think Kai has a good point here that we need to separate more clearly the intent of the new function. It seems convenient to reuse the Put() name - since it means nothing what the English word may convey to users anyway - but it is an entirely new functionality and it deserves a new name.

Kai, however, this brings up a seemingly naive question. If "GetBuffer" is used separately, should a Put() also be used in the code, or is it optional? Shouldn't we have a separate, optional function for this "The data is ready" signal, like BufferReady() or something?

pnorbert commented 5 years ago

On Thu, Mar 14, 2019 at 10:13 PM Kai Germaschewski notifications@github.com wrote:

Sorry, I missed the 2D test before. But what I see there now basically confirms the issue that was talking about, there is only the 1-d span which can't be accessed as a 2-d array. You're just copying the contiguous memory block from the 2-d array into the span, which works, but the idea behind the span is to avoid such a copy, of course. Unfortunately, I don't really see a clear way to solve this, given the fact that there is no one C++ multi-dimensional array class, but many, while (almost?) all of them use underlying contiguous memory, the interfaces vary wildly.

Some array classes will always just allocate their own memory, so nothing can be done in that case. Some have separate "view/ref" classes that will take a pointer to memory instead of owning it, so that'll work fine with with the span thing. Others take an allocator, in the spirit of std::allocator. That was my first thought, that one shouldn't provide a span, but a (pretend) allocator. But I'm not convinced how workable, considering that deallocation is part of that interface, and it's not so clear that this can be handled cleanly. I think in most cases, a span can be made to work, so that's good.

How about providing extra convenience functions for this like the HeatTransfer C++ example does? I mean, get a pointer to the buffer from the span and then create your own N-d array:

HeatTransfer::HeatTransfer(const Settings &settings) : m_s{settings} { m_T1 = new double [m_s.ndx + 2]; m_T1[0] = new double[(m_s.ndx + 2) (m_s.ndy + 2)]; for (unsigned int i = 1; i < m_s.ndx + 2; i++) { m_T1[i] = m_T1[i - 1] + m_s.ndy + 2; } }

HeatTransfer::~HeatTransfer() { delete[] m_T1[0]; delete[] m_T1; }

In this case m_T1[0] would have the pointer from span, no need to allocate/deallocate.

williamfgc commented 5 years ago

@pnorbert you know where I am coming from. Yeah, I don't want to reopen any discussion, nor introduce more functions that fall under the same abstractions. We have too much to do.

auto buf = engine.Put(var) is 100% clear: return type and no pointer input. The span is the only Put signature in adios2 that doesn't pass a pointer and returns something I can use to populate. The misunderstanding comes from the option to populate the data (yeah, it's optional as it can be called without a return type or simply not touching the returned span), this Put already populates the data by initializing the variable block with zeros (a real improvement would be to allow the user set the initial value). I still have to tests with zero size blocks, combining all kinds of variables, etc. Saying that we are returning the buffer is misleading since that's only part of what we are doing and it is only the default engine doing that. A RDMA engine will have a remote memory space? What about shared memory? (notice how buf is the user interpretation in the above line, doesn't necessarily mean it's a buffer). Are we introducing a new function for every case when the span/Put abstraction covers all of these cases. GetPutSharedMem, GetPutHeapBuff, GetPutSomeRDMA??? I know libraries with hundreds of APIs calls everyone complains about....adios2 doesn't want to be one of those.

Not only that, but the above full signature allows for a second parameter to request a block from a specific buffer (in async, multithreaded strategies). Again, serves as a neat abstraction. auto span = Put(var, bufferID = 0);

We don't have to introduce new functions, new weird names (over time I see it was a good decision to drop PutDeferred and PutSync). Contracts are the exactly the same (Put informs a variable is to be passed to adios2, populate is completely optional and done outside adios2, and PerformPuts/EndStep/Close executes memory consumption).

A function that comes to my mind is plot in Python matplotlib...it has a tons of overloads through **kwargs....do we have to have a separate plot function for every intention? No, they are all falling under the plot abstraction. That's what makes it so widely adopted and powerful.

I agree with @pnorbert in the sense that Put in adios2 doesn't mean the data is consumed or dispatched like in MPI_Put. No, that's PerformPuts, EndStep, Close. The latter are the ones to be thought as async in an async API, that's the whole point of pushing for deferred mode....sending in groups is also better than making a request for each variable.

williamfgc commented 5 years ago

Interestingly enough ostream::put returns a handler to ostream , which is also an abstraction: http://www.cplusplus.com/reference/ostream/ostream/put/ ostream& put (char c);

williamfgc commented 5 years ago

@pnorbert we should explore your idea with the heat transfer example and T* span.data()

germasch commented 5 years ago

Kai, So the semantics of ADIOS2 match MPI (and other asynchronous APIs) very well. In fact, I'd expect that engines which actually use MPI internally to communicate data will exactly do the above, ie, do MPI_Send / MPI_Isend inside the Put call, depending on Sync/Deferred. And they'd do a MPI_Waitall in PerformPuts. I don't think it matches MPI. There is no way to use MPI send/recv in Put_sync, simply because there are no addressees

I guess I don't understand the "no addresses" part, can you elaborate? But it's also not that I meant to say that ADIOS2 has to use MPI. What I meant is that the contract about the buffer you pass to sync/deferred put is exactly the same as as for the blocking/non-blocking MPI. (note that MPI_Send also doesn't actually say anything about whether it actually has sent the data by the time it returns, it might well have just copied the data into an internal buffer to RDMA from, but the actual transfer might not have happened yet). Obviously some engines will not even use MPI at all in handling the data (except metadata, maybe).

I don't think the ADIOS2 says anything about blocking/non-blocking, so if does not all of the following may apply. The insitu MPI Writer does not call MPI_Send on Put(Sync), I think, because it can't be sure there's a receiver and a deadlock could result. But I haven't said that it does. I do think it's imaginable there to be an engine which works together with a dedicated I/O server, which sits on separate processes doing nothing but waiting for data to be received and then written to disk. In that case, one could imagine doing MPI_Send/MP_Isend for Put(Sync/Deferred, in particular in a memory-constrained situation where one wants to avoid additional copies).

sync/deferred simply states one thing: I promise not to modify the content of the memory pointed by the pointer until PerformPuts/EndStep, or, I need/want to modify it (for writing another variable or block with the same temporary buffer I am creating). There is no promise of asynchronous or synchronous behavior.

Right, and that's actually exactly what I meant. I guess I chose my words poorly by saying asynchronous/ synchronous. I don't mean to imply that the behavior has to be that way, just like MPI_Send doesn't promise to actually be synchronous, nor does MPI_Isend promise to be actually asynchronous. But there are obviously good reasons for the existence of both MPI_Isend and Put(Deferred), that is to allow the handling of the data to overlap while returning control to the application, so that's what I meant here. Those reasons for PutDeferred don't necessarily go away when a buffer has been preallocated. (For BP3, in its current implementation, they do go away. But if someone were to decide in the future to use POSIX async io or something, they might. But even today for insitumpi with the current API, all actual communication would be forced to happen within PerformPuts() which I'd call "synchronous". I guess what I really mean is that it would force for all the communication to happen within a single API call, either something like MPI_Send, or something like a bunch of MPI_Isend's and then an MPI_Waitall. But there is no way for the app to do anything useful during the Waitall since control can't be returned to it.

Reading anything extra into the name of Put is mistaken. This function should be called something like: HeyAdiosHereIsAPointerToAVariableForThisIOStep().

But that's actually my point. The meaning of Put is "here's the data for this variable for this step". The new overload of Put doesn't follow that meaning.

williamfgc commented 5 years ago

The meaning of Put is "here's the data for this variable for this step". The new overload of Put doesn't follow that meaning.

Please read my comment above, it actually does with zeroed memory (we could add a value argument to initialize), populating after that Put is purely optional, but extremely helpful to tackle real problems.

pnorbert commented 5 years ago

Kai, yes we agree on the problem. That piece of indicating the readiness of the data is missing with using put() for getting the memory from adios. So we either need a different name for this function, or one more function for the readiness.

I am going dark for a couple of days now so don’t expect more answers from me now. William is also going for a trip to do a tutorial, so let’s continue the discussion later.

On Fri, Mar 15, 2019 at 1:54 PM Kai Germaschewski notifications@github.com wrote:

Kai, So the semantics of ADIOS2 match MPI (and other asynchronous APIs) very well. In fact, I'd expect that engines which actually use MPI internally to communicate data will exactly do the above, ie, do MPI_Send / MPI_Isend inside the Put call, depending on Sync/Deferred. And they'd do a MPI_Waitall in PerformPuts. I don't think it matches MPI. There is no way to use MPI send/recv in Put_sync, simply because there are no addressees

I guess I don't understand the "no addresses" part, can you elaborate? But it's also not that I meant to say that ADIOS2 has to use MPI. What I meant is that the contract about the buffer you pass to sync/deferred put is exactly the same as as for the blocking/non-blocking MPI. (note that MPI_Send also doesn't actually say anything about whether it actually has sent the data by the time it returns, it might well have just copied the data into an internal buffer to RDMA from, but the actual transfer might not have happened yet). Obviously some engines will not even use MPI at all in handling the data (except metadata, maybe).

I don't think the ADIOS2 says anything about blocking/non-blocking, so if does not all of the following may apply. The insitu MPI Writer does not call MPI_Send on Put(Sync), I think, because it can't be sure there's a receiver and a deadlock could result. But I haven't said that it does. I do think it's imaginable there to be an engine which works together with a dedicated I/O server, which sits on separate processes doing nothing but waiting for data to be received and then written to disk. In that case, one could imagine doing MPI_Send/MP_Isend for Put(Sync/Deferred, in particular in a memory-constrained situation where one wants to avoid additional copies).

sync/deferred simply states one thing: I promise not to modify the content of the memory pointed by the pointer until PerformPuts/EndStep, or, I need/want to modify it (for writing another variable or block with the same temporary buffer I am creating). There is no promise of asynchronous or synchronous behavior.

Right, and that's actually exactly what I meant. I guess I chose my words poorly by saying asynchronous/ synchronous. I don't mean to imply that the behavior has to be that way, just like MPI_Send doesn't promise to actually be synchronous, nor does MPI_Isend promise to be actually asynchronous. But there are obviously good reasons for the existence of both MPI_Isend and Put(Deferred), that is to allow the handling of the data to overlap while returning control to the application, so that's what I meant here. Those reasons for PutDeferred don't necessarily go away when a buffer has been preallocated. (For BP3, in its current implementation, they do go away. But if someone were to decide in the future to use POSIX async io or something, they might. But even today for insitumpi with the current API, all actual communication would be forced to happen within PerformPuts() which I'd call "synchronous". I guess what I really mean is that it would force for all the communication to happen within a single API call, either something like MPI_Send, or something like a bunch of MPI_Isend's and then an MPI_Waitall. But there is no way for the app to do anything useful during the Waitall since control can't be returned to it.

Reading anything extra into the name of Put is mistaken. This function should be called something like: HeyAdiosHereIsAPointerToAVariableForThisIOStep().

But that's actually my point. The meaning of Put is "here's the data for this variable for this step". The new overload of Put doesn't follow that meaning.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/ornladios/ADIOS2/issues/1290#issuecomment-473274661, or mute the thread https://github.com/notifications/unsubscribe-auth/ADGMLcGCR0vkYXfl-GMe-U5v8ZSL23Bjks5vW5f9gaJpZM4b1PPi .

williamfgc commented 5 years ago

@germasch my comments are related to a very common thought in the communities when dealing with abstract classes: trying to concretize the meaning of an abstract/interface class. The concretization of Put (about the buffer) while is done by the default engine and others may follow, doesn't mean every engine must provide a "buffer view as a span" hence introducing "buffer" or "Get" in the function name breaks the abstraction. Saying: "I have a car" is not the same as "I have a 2018 electric blue BMW".

We are simply saying, hey we Put zeroed memory for you....here, you can optionally have the memory back as a span to populate it (not manage it nor own it) and we'll collect it at PerformPuts (which indicates readiness).

germasch commented 5 years ago

William, I think Kai has a good point here that we need to separate more clearly the intent of the new function. It seems convenient to reuse the Put() name - since it means nothing what the English word may convey to users anyway - but it is an entirely new functionality and it deserves a new name.

I'm having some trouble parsing this sentence. Do you mean "while it might seem convenient to reuse the name, it's new functionality and should have a new name"? I may not agree on the meaning of the word "Put" in the English language being all that unclear, in particular not as it come to computers. If I say "I put the box on the table" means I take something I have and put it elsewhere. If I say "I get the table from the table", it was somewhere else and I took it and now have it. If it goes together with a variable, "put" goes with (n Fortran) 'intent(in)' for the function, get goes with 'intent(out)'. To me, the meaning of "put" as HeyAdiosHereIsAPointerToAVariableForThisIOStep (okay, more closely, "HeyAdiosImgivingYouTheDataForThisIOStep", doesn't have to be a pointer, and I might want to tell you more about it) is perfectly clear, and I think I've showed many other places where get/put are used to express similar general intent. I believe the choice of "Get/Put" in the Engine API is perfect, actually.

Kai, however, this brings up a seemingly naive question. If "GetBuffer" is used separately, should a Put() also be used in the code, or is it optional? Shouldn't we have a separate, optional function for this "The data is ready" signal, like BufferReady() or something?

I very strongly believe they should be separate for two reasons:

If you look at my example above, you will see that "current deferred" and "proposed buffered (deferred)" follow the very same flow, so in fact the change to use the new functionality is very straightforward (of course, in a 5 line example, everything's pretty straightforward).
who provides the buffer vs choosing sync/deferred are fairly orthogonal issue. in some cases (current bp3), using the provided buffer means that sync/deferred end up doing the same thing. But in others (including insitumpi right now), staying with the the Put call mean that communication can happen between the Put and the PerformPuts(), overlap with computation (e.g., calculation of the next field to be written for this step).

So in my opinion, there is no reason to drop the Put call that indicates to ADIOS2 that the data is ready for it to take it, it's much more consistent with the case where ADIOS2 doesn't provide the buffer. I think there are two points that one can (and should) have a discussion about:

Should the Put call still take the pointer to the data? It's not really required, since ADIOS2 already knows it, but if passed, adios2 could check consistency.
Should the Sync/Deferred flag still be passed? I'm inclined to say yes, because it would still trigger different code paths in, e.g., insituMPI, and maybe the user has a reason to choose. But it's not a clear cut case, one could argue for not passing it.

williamfgc commented 5 years ago

Once again, Put is already expressing that. The population of data after that Put via a span is purely optional (thanks to move semantics BTW) if the user doesn't want zeros or perhaps, an initial value (which should be added).

e.Put(var);
e.EndStep();

Variable has a block with zeros

eisenhauer commented 5 years ago

I'd tend to agree with Kai both on having a name for this like "GetBuffer" and in answering the question of whether or not a Put is clearer and leaves us in a better position for non-naive implementations. (And I think the implications of a default value should be considered.) I think the GetPutBuffer() without a Put() carries implicit assumptions about the underlying implementation. Particularly that what you get is actually a "pointer" into some internal buffer, that that buffer won't be dealt with internally in any way until EndStep(), etc. At least allowing, if not requiring an explicit Put() would allow internal handling of partial buffers by letting ADIOS2 know when the data was actually ready. Maybe no engine currently implements that, but introducing semantics that are incompatible with that functionality seems undesirable.

However, we also have to think about this in the context of being able to write more than one segment of a global variable from a particular rank. Something we've had a problem with before internally (associating data with vars and now allowing multiple items), but we'd have to be careful to avoid messing that up here too. For example in Kai's example, { auto data = writer.GetBuffer(var); // Maybe "GetPutBuffer" would be better? calculate_output_data(data); writer.Put(var); // handing the data to ADIOS2 // do other stuff to overlap data writing / comm -- but can't touch data writer.EndStep(); } There's an implicit understanding that the writer.Put(var) applies to the data value earlier (most recently?) acquired from writer.GetBuffer(). var holds internal state about the selection which tells ADIOS2 about the geometry of what is being written. Generally this is captured at the Put() or PutDeferred(), so if you want to write multiple parts of a single global array from a single write, you set the selection for part A, do a Put() on the var, set the selection for part B, do a Put() on the var, etc. You couldn't really do that with GetBuffer() followed by Put(), unless you require that the GetBuffer/Put pairs for a single var don't overlap, which constrains how calculate_output_data() can work. GetBuffer() without a Put() doesn't have this problem, but it has the problem that only EndStep() tells ADIOS that the data is ready. Adding the Put() lets ADIOS know that the data is available before EndStep and avoids precluding fancier data handling than we have now, but if the only link between the GetBuffer and the Put() is the var, then we've got a problem with multiple-segment handling. The introduction of an opaque handle returned by GetBuffer (from which an actual buffer could be acquired with another function and which could be separately submitted to Put()) would have neither of these problems. But it's a more complex API.

germasch commented 5 years ago

How about providing extra convenience functions for this like the HeatTransfer C++ example does? I mean, get a pointer to the buffer from the span and then create your own N-d array

To be clear, I think all of us agree on span being a good interface to provide the buffer.

As to whether to provide a helper to make a multi-d view using the span as the underlying buffer, I think it might be useful for someone, but overall I'd expected not too much use. I suppose that everyone who's using the C++ interface for a scientific app which works with multi-dimensional arrays already has either their own class for it or is using something which is out there. If one is already using a particular approach, then I think two cases can happen: (1) The approach allows to construct the data structure given preallocated memory (think PETSc VecWithArray, though I think that's now called differently.) Then you're all good. (2) The approach doesn't support an easy way to use separately allocated memory. In that case one would either have to copy the data from one's own data structure into the buffer, or, say, pass the helper adios2 view to the function that actually calculates the data. I don't think people would do the latter because they already settled on a different multi-array class, and the former would mean the data gets copied anyway. So in that situation it's just much simpler not to use the preallocated buffer interface, since one doesn't really gain much.

eisenhauer commented 5 years ago

Oh, on the default value thing. Default values are not terrible ideas, but we shouldn't think that they're always free. Without careful understanding of the semantics, they may require an implementation where the default value must be copied in to the data area before the user is given access to the memory. If you're using this API to avoid a copy on a huge data set, having semantics that required ADIOS to fill that huge data area with zeros prior to the app writing over it with its data is going to eliminate much of the "avoid a copy" gain that this API supposedly helps enable.

williamfgc commented 5 years ago

@eisenhauer I agree with your points (and I will add the default value population BTW). I also take @chuckatkins input in which this interface is clean and flexible enough so at Read we can reuse it: introducing Get at Put will imply adding Put at Get and will make the APIs very complex and confusing as you stated. For a complex engine that has multiple memory areas (e.g. buffer):

span = e.Put<int32_t>(var, bufferID);
span[0] = 0;
span[1] = 1;
span[2] = 2;
e.EndStep();

// view into adios2 memory space (e.g. buffer, RDMA, shared memory, etc.), 
// for files it's not as helpful
const adios2::Variable<T>::Span span = e.Get(var, bufferID); 
//optionally the user can access span  = {0,1,2}
v[0] = span[0]; // 0
v[1] = span[1]; // 1
v[2] = span[2]; // 2
e.EndStep(); //done with span

williamfgc commented 5 years ago

If you're using this API to avoid a copy on a huge data set, having semantics that required ADIOS to fill that huge data area with zeros prior to the app writing over it with its data is going to eliminate much of the "avoid a copy" gain that this API supposedly helps enable.

I think we should measure that in the adios2 context (basically cost of malloc vs. calloc/memset in an I/O library )

germasch commented 5 years ago

auto buf = engine.Put(var) is 100% clear: return type and no pointer input.

So actually @pnorbert made my point here perfectly, if inadvertently. If you go back in your email or to his edited message, you will find that he himself got entirely confused by this use of Put, as currently implemented, saying the example (following the exact same flow as your test) was weird and not how this was ever intended to be used.

The span is the only Put signature in adios2 that doesn't pass a pointer and returns something I can use to populate.

Sure, it's a unique signature, otherwise the compiler would have complained. I'm not saying it's technically broken.

The misunderstanding comes from the option to populate the data (yeah, it's optional as it can be called without a return type or simply not touching the returned span), this Put already populates the data by initializing the variable block with zeros (a real improvement would be to allow the user set the initial value).

I suppose we agree that in the common use case, apps won't just write output consisting of zeros, though, right?

Saying that we are returning the buffer is misleading since that's only part of what we are doing and it is only the default engine doing that.

Obviously, you're doing something else when the function is called, I mean at the very least, allocating memory. What you're not doing is "put"ting anything to adios.

A RDMA engine will have a remote memory space? What about shared memory? (notice how buf is the user interpretation in the above line, doesn't necessarily mean it's a buffer). Are we introducing a new function for every case when the span/Put abstraction covers all of these cases.

Maybe we have different definitions of what a buffer is, a buffer in shared memory is still a buffer to me. I don't really see the point, though. If the buffer was in shared memory, you'd still need to indicate that you're done filling it. You can do that in PerformPuts, but that again eliminates the possibility to overlap anything. Which you may not need in that case, though I can come up with an example where you would benefit.

GetPutSharedMem, GetPutHeapBuff, GetPutSomeRDMA??? I know libraries with hundreds of APIs calls everyone complains about....adios2 doesn't want to be one of those.

GetBuffer (or GetPutBuffer) is exactly one function. Depending on the engine, it would do the right thing. You're not proposing PutSharedMem, PutHeapBuffer, or anything like that either, so why make it up when it's called GetBuffer?

Not only that, but the above full signature allows for a second parameter to request a block from a specific buffer (in async, multithreaded strategies). Again, serves as a neat abstraction. auto span = Put(var, bufferID = 0);

I can't comment on how useful this is, but there's nothing that prevents you from having GetBuffer(var, bufferID = 0).

We don't have to introduce new functions, new weird names (over time I see it was a good decision to drop PutDeferred and PutSync). Contracts are the exactly the same (Put informs a variable is to be passed to adios2, populate is completely optional and done outside adios2, and PerformPuts/EndStep/Close executes memory consumption).

I haven't have no problems with PutDeferred/Sync vs one Put with the Mode enum. You will find people having preferences for one or the other, but that's something reasonable people can agree to disagree on.

A function that comes to my mind is plot in Python matplotlib...it has a tons of overloads through **kwargs....do we have to have a separate plot function for every intention? No, they are all falling under the plot abstraction. That's what makes it so widely adopted and powerful.

I haven't found an overload of plot that returns a data buffer to me to be filled afterwards. Have you?

I agree with @pnorbert in the sense that Put in adios2 doesn't mean the data is consumed or dispatched like in MPI_Put.

So Put(..., Sync) doesn't consume the data? I think I know what people mean when they say "data is consumed", but you appear to be using it differently. I don't know what you mean by "dispatching data".

No, that's PerformPuts, EndStep, Close. The latter are the ones to be thought as async in an async API, that's the whole point of pushing for deferred mode....sending in groups is also better than making a request for each variable.

So as we can see in preceding discussion, we may have some trouble defining async. But if we go with "allowing for overlap" while data is processed, which is what MPI non-blocking or ADIOS2 deferred or async io do, by definition it involves two API functions, it can't be just one (unless the one is designed to be called twice). Another example: Petsc VecScatterBegin/End.

eisenhauer commented 5 years ago

But it's probably not malloc() vs. calloc(), because calloc() can rely on fancier things and it's likely not what's in use here. Presumably the getbuffer() call would extend the existing internal buffer with something like realloc() which would then be required (or not) to have that extended portion of the buffer initialized. (Hopefully it would/could be done with memset() and wouldn't require something type-specific which might not be nearly so efficient, but it would be done in ADIOS code and not the system, so I wouldn't count on it.) Anyhow, signing off of this...

germasch commented 5 years ago

Once again, Put is already expressing that. The population of data after that Put via a span is purely optional (thanks to move semantics BTW) if the user doesn't want zeros or perhaps, an initial value (which should be added).

How do you define move semantics? I'll give you mine: http://thbecker.net/articles/rvalue_references/section_02.html (arbitrarily picked from google, since people generally agree on what it means)

e.Put(var);
e.EndStep();
Variable has a block with zeros

Yeah, that's true.I guess you're making an argument how non-intuitive it is to realize that this is an overload that returns a buffer, because relying on looking at the return type to identify a function doesn't work so well if you're ignoring it.

I hope that the much common case is the one where you're not just writing zeros:

auto buf = e.Put(var);
fill_buffer(buf);
e.EndStep();

And yes, that works. It's just weird to Put before the data has been calculated.

williamfgc commented 5 years ago

@eisenhauer you're right, typical (not general) use is that memory will be allocated at the first step and the cost is amortized (changing the shape in every step breaks this, though), you bring good points.

For the record, the full signature:

adios2::Variable<T>::Span span = e.Put(var, bufferID, fill_value);
//like in any C++ function users can:
e.Put(var, bufferID, fill_value); // ignore the returned span and fill_value
e.Put(var); //fill with zeroes the default value
auto span = e.Put(var,bufferID,fill_value); //fill value (e.g. -1) and populated certain locations
etc.

The fill_value thing is directly from std::vector constructor (not sure how non-intuitive this is): http://www.cplusplus.com/reference/vector/vector/vector/

fill (2) | explicit vector (size_type n);          
            vector (size_type n, const value_type& val, const allocator_type& alloc = allocator_type());

Yeah, signing off, too. This is taken way out of context, especially when I am trying to illustrate how abstractions and polymorphism through function overloads (e.g. plot and **kawrgs) are very powerful tools adios2 should use in the abstract Engine class interface. And yeah, I've seen several cases along the years in which a block is filled with zeros (e.g. zero relative Temperature, adiabatic boundary conditions) and never touched or populated.

germasch commented 5 years ago

I'd tend to agree with Kai both on having a name for this like "GetBuffer" and in answering the question of whether or not a Put is clearer and leaves us in a better position for non-naive implementations. (And I think the implications of a default value should be considered.) I think the GetPutBuffer() without a Put() carries implicit assumptions about the underlying implementation. Particularly that what you get is actually a "pointer" into some internal buffer, that that buffer won't be dealt with internally in any way until EndStep(), etc. At least allowing, if not requiring an explicit Put() would allow internal handling of partial buffers by letting ADIOS2 know when the data was actually ready. Maybe no engine currently implements that, but introducing semantics that are incompatible with that functionality seems undesirable.

Thanks, now this actually brings some value to the discussion (which I hadn't considered, I'll be glad to admit that.) I think this is a very real reason that complicates things. FWIW, insitumpiwriter is a current engine that does take advantage of "partial buffers" by using MPI_Isend in Put(..., Deferred). I think that's there for a good reason.

However, we also have to think about this in the context of being able to write more than one segment of a global variable from a particular rank. Something we've had a problem with before internally (associating data with vars and now allowing multiple items), but we'd have to be careful to avoid messing that up here too. For example in Kai's example, { auto data = writer.GetBuffer(var); // Maybe "GetPutBuffer" would be better? calculate_output_data(data); writer.Put(var); // handing the data to ADIOS2 // do other stuff to overlap data writing / comm -- but can't touch data writer.EndStep(); } There's an implicit understanding that the writer.Put(var) applies to the data value earlier (most recently?) acquired from writer.GetBuffer(). var holds internal state about the selection which tells ADIOS2 about the geometry of what is being written. Generally this is captured at the Put() or PutDeferred(), so if you want to write multiple parts of a single global array from a single write, you set the selection for part A, do a Put() on the var, set the selection for part B, do a Put() on the var, etc. You couldn't really do that with GetBuffer() followed by Put(), unless you require that the GetBuffer/Put pairs for a single var don't overlap, which constrains how calculate_output_data() can work. GetBuffer() without a Put() doesn't have this problem, but it has the problem that only EndStep() tells ADIOS that the data is ready. Adding the Put() lets ADIOS know that the data is available before EndStep and avoids precluding fancier data handling than we have now, but if the only link between the GetBuffer and the Put() is the var, then we've got a problem with multiple-segment handling. The introduction of an opaque handle returned by GetBuffer (from which an actual buffer could be acquired with another function and which could be separately submitted to Put()) would have neither of these problems. But it's a more complex API.

Right, so that's a very real issue, the fact that var is holding state, which conceptually I think should be thought of going with Put(Sync or Deferred), so of the APIs had been Put(var, data, mode, selection) this would be more clear. SetSelection just sets additional info that's consumed by Put. (I'm not suggesting to change this here, I'm just saying it's my way of thinking about it).

I hope this can now turn into a useful discussion: I see overall two options:

Decide that the buffered API will not support what I call async behavior (allowing overlap). I think that's an option that can be reasonably argued for. It's what's currently there (obviously I would still strongly argue that it should be named GetBuffer), it wouldn't have a matching Put call and by means of not supporting Put make it obvious that it's not a two-step process.
Deal with the complexity. I think it may be possible to do it incrementally and fairly user friendly. I'd even volunteer to implement it (famous last words).

Right now a multi-block write looks like (I'm not entirely making this up, one of my codes actually does dynamic load balancing but having a varying number of patches (blocks) per MPI process).

{
    Blocks blocks; // allocates all blocks of data

    for (int i = 0; i < nBlocks; ++i)
    {
    calculate_output_data(blocks[i].data(); // Nice: Deferred allow us to work on calculating the next block of data while I/O is in flight

    var.SetSelection(selection[i]);
    writer.Put(var, blocks[i].data(), Mode::Deferred); // handing off data to ADIOS
    }

    writer.EndStep();
}

Proposed buffer-providing API:

{
    Blocks blocks(FLAG_DONT_ALLOC); // ugly, but you know what I mean
    for (int i = 0; i < nBlocks; ++i)
    {
        var.SetSelection(block.selection);
        blocks[i].place(
            var.GetBuffer()); // uses preallocated memory for my data structure
    }

    for (int i = 0; i < nBlocks; ++i)
    {
    calculate_output_data(blocks[i].data(); // Nice: Deferred allow us to work on calculating the next block of data while I/O is in flight

    writer.Put(var, blocks[i].data(), Mode::Deferred); // handing off data to ADIOS
    }

    writer.EndStep();
}

Not how the actual writing stuff look almost entirely identical. The only difference is that var.SetSelection is done before getting the buffer, not before Put (there would be no harm in doing it before Put, too). That means the blocks would have to know which selection they are associated with. This can be implemented transparently, ie, ADIOS2 keeps track of this association and recovers it by means of the data pointer passed to Put. The user still just gets a span, exactly like now.

germasch commented 5 years ago

Oh, on the default value thing. Default values are not terrible ideas, but we shouldn't think that they're always free. Without careful understanding of the semantics, they may require an implementation where the default value must be copied in to the data area before the user is given access to the memory. If you're using this API to avoid a copy on a huge data set, having semantics that required ADIOS to fill that huge data area with zeros prior to the app writing over it with its data is going to eliminate much of the "avoid a copy" gain that this API supposedly helps enable.

I agree that while there is some justification to zero-initialize the buffer before handing it to the user, it should be give thought. Probably not in this thread, since that's already unmanageable (I take fault in that). Actually filling memory with zeros will touch all pages, paging them in. On the other hand, OSs are smart enough to just repeatedly map one zero-filled page under certain circumstances, and then copy on write map actual RAM if something is actually written to them. Anyway, I'm not really uptodate on how this works these days given the hugepages thing...

eisenhauer commented 5 years ago

I think significant issues have been raised, but as per Norbert's earlier comment, he's going dark for a couple of days. So lets table this until later...

germasch commented 5 years ago

I implemented what I think are good semantics dealing with preallocation and Put. [I don't mean to rush anyone at all, I just want to show what I did. Feel free to leave this untouched until @pnorbert, @williamfgc and whoever are available.]

The code is in PR #1305 (not meant be merged but just show what can be done). Most of the change is trivial renaming of the Put overload to PutPrealloc, which I went with because it still expresses he connection to the Put functionality while making clear it's not Putting anything just yet.

Let me use an example from my test to show how multi-block prealloc'd buffers are used.

First, without prealloc:

    MyData<T> myData(m_Selections);

    for (int b = 0; b < myData.nBlocks(); ++b)
    {
        PopulateBlock(myData, b);
        var.SetSelection(myData.selection(b));
        engine.Put(var, &myData[b][0], adios2::Mode::Sync); // or Deferred
    }
    engine.Close();

This is how it works with preallocate buffers.

   MyDataView<T> myData(m_Selections);
    for (int b = 0; b < myData.nBlocks(); ++b)
    {
        var.SetSelection(myData.selection(b));
        auto span = engine.PutPrealloc(var);
        myData.place(b, span.data());
    }

    for (int b = 0; b < myData.nBlocks(); ++b)
    {
        PopulateBlock(myData, b);
        var.SetSelection(myData.selection(b));   //  can still be used, but is optional
        engine.Put(var, &myData[b][0], adios2::Mode::Sync); // or Deferred
    }
    engine.Close();

The most notable part should be that the part where the data is written is unchanged in both versions. The only difference is in the part that deals with setting up the app data structure. It changed from the memory-owning MyData to the non-owning MyDataView, which is then set up to provide access to the preallocated buffers from ADIOS2.

The 2nd call to SetSelection is optional, since the info has already been associated with the span. The Put call is actually optional as well right now, so the behavior is exactly compatible with what's already there (other than the renaming Put -> PutPrealloc).

The way I think about the Put calls is the same as if they weren't using prealloced buffers, that is, "the data is ready, I'm giving it to adios2". After these calls, the user may not touch those buffers anymore. One obvious difference is that. ADIOS2. There isn't really a user-visible difference between calling those functions with "Sync" vs "Deferred". For regular buffers, that flag determines when the buffer can be reused -- right away, or after PerformPuts. But since prealloced buffers were only temporarily provided to the user the answer is "never" either way. (If one wants another prealloced buffer, one can of course ask for one, using PutPrealloc.)

As was already pointed out earlier in this thread, for the current implementation of BP3, there is essentially no difference in calling Put with Sync vs Deferred vs not at all. However this approach separates the question of "who provides the memory" from "when is the data ready for adios2 to take". A future version of BP3 could start doing actual (maybe async) I/O after the first block has been calculated and Put, instead of waiting all the way until PerformPuts to do it all in one big batch, as the current Prealloc interfaces requires. More realistically, InsituMPI will start an MPI_Isend once each block is ready, allowing the calculation of the next block's data to overlap with communication. That's how it handles non-prealloced buffers. With this change, it'll treat prealloced buffers the same (well, right now nothing other than BP3 supports the prealloc interface at all, but it should be fairly straightforward to implement a default implementation in generic code that will at least make the interface work, even though it won't actually be zero-copy.)

germasch commented 5 years ago

Since @williamfgc was really fast at closing the PR (#1305), let me just continue this here:

@williamfgc wrote:

Adding a new function breaks the abstraction and makes polymorphism completely unnecessary.

As usual, you're putting out buzzwords, but they don't make any sense. How does adding a new function break any existing abstraction, while adding a new overload which does the same thing does not?. What does "makes polymorphism completely unnecessary" even mean? Polymorphism doesn't mean you have to always overload existing functions. I mean, why don't you just a single function PutGet to combine Put and Get into one single function GetPut with about 15 overloads? More overload polymorphism would make the code ever better, or maybe not?

The minute your return a span the preallocation is redundant.

I don't know what you mean. Your Put returns a span. My PutPrealloc returns a span. There is no difference, other than in the (misleading, as I've argued) name. If one is redundant, so is the other.

This was internally decided.

For all I can tell, there are two members of the team who above said "there is a need for further discussion".

Ultimately, you can fork and maintain your version or wrap around the function.

There is no way to do what is needed in an internal wrapper. I could do the renaming of course, and I can introduce a "the data is ready" function, but since there is no way to tell adios2, what should I do in the wrapper, other than ignore it?

If someone is willing to reopen, revisit and back off from its original decision they can as well maintain it.

Well, I've heard this one before.

Not only that, but this breaks re-usability for Get to also return a span in streaming (the buffer is already pre-allocated). Closing this from my end.

How does this change break anything with respect to Get? It doesn't touch Get at all. For Get to return a span might well actually make sense, btw.

Note that I asked a bunch of questions, addressing your points. It'd be nice to get (specific) answers.

williamfgc commented 5 years ago

Since @williamfgc was really fast at closing the PR (#1305), let me just continue this here:

This was an internal decision every single developer agreed upon. I can't go back and forth on things that I spend a fair amount of time designing, implementing, but most importantly adding testing with and integrating with apps, given the limited time. BTW, this issue was opened less than an hour before I added the new functionality, the same statement should then apply.

I mean, why don't you just a single function PutGet to combine Put and Get into one single function GetPut with about 15 overloads?

Polymorphism doesn't mean to put all mutually exclusive name combinations into a function to make it ambiguous, is to make overloads clear to the user. PutGet doesn't make it clear if we are putting or getting data. Not sure what point you're trying to make. We are trying to appeal to C++ applications in the long-term, this is a step backwards as brings back C style naming conventions. If this is PutPrealloc, then why not PutDeferred, PutSync, PutFile? Reality is not many use C in new projects and the feedback we get is that the adios2 API is very simple and intuitive to C++ apps. The fewer API calls, the better.

How does this change break anything with respect to Get? It doesn't touch Get at all. For Get to return a span might well actually make sense, btw.

How would you implement an equivalent Get, when in streaming engine Get already preallocated the buffer. The current signature takes care of all that.

Well, I've heard this one before.

Sure, but ultimately someone has to be responsible.

For all I can tell, there are two members of the team who above said "there is a need for further discussion".

Well, the reason it was initially implemented the way was presented was because everyone agreed to this. Without objections to also keep the APIs simple and minimal (this is exactly the reason why APIs go out of control becoming to complex for a user). This is not a product of personal flavors or decisions, but consensus by all developers who are funded to do the work. Coming up with a working version took me weeks to get this done, BTW. It was a lot of work. I can't speak for them.

Also, please refrain from making unnecessary internal changes to tested code (unless you are fixing a bug) adding wrappers to one line function to C++ only makes code bloat. This is not how successful dev teams work (and I work in many).

williamfgc commented 5 years ago

The way I think about the Put calls is the same as if they weren't using prealloced buffers, that is, "the data is ready, I'm giving it to adios2".

The above is true for EndStep, PerformPuts, Close, and Put in sync mode which we call the special case for that reason. All Put overloads are informing the intention to adios2 to Put a variable in a polymorphic way and each signature has different memory contracts.

Also, I see the confusion between Get and a GetPrealloc in streaming engines (as the buffer is preallocated anyways at BeginStep), but a simple Get that can also return a span is a lot more clear. This is breaking the API symmetry between Put and Get.

germasch commented 5 years ago

Since @williamfgc was really fast at closing the PR (#1305), let me just continue this here:

This was an internal decision every single developer agreed upon. I can't go back and forth on things that I spend a fair amount of time designing, implementing, but most importantly adding testing with and integrating with apps, given the limited time.

Since it's unlikely that anyone will get all details of a design right at the first attempt, iterative improvement is a natural part of the development process, one which I would have hoped would be considered to be beneficial. Of course there are certain considerations with respect to user-visible (API) changes, but since this is brand new, it's not really an issue.

BTW, this issue was opened less than an hour before I added the new functionality, the same statement should then apply.

I really don't think I'm following. I'm pretty sure I only opened the issue after the PR was posted, because otherwise I wouldn't have known what to write. I do think for changes like this, it'd be a good idea to actually wait for some discussion rather than merging the PR right away, but I'm sure that's not what you mean. (I also think that, of course, every project has the right to make their own policies. But if you do want this to be an open, community-involved project, that could be helpful.)

I mean, why don't you just a single function PutGet to combine Put and Get into one single function GetPut with about 15 overloads?

Polymorphism doesn't mean to put all mutually exclusive name combinations into a function to make it ambiguous, is to make overloads clear to the user.

See, we agree on that.

PutGet doesn't make it clear if we are putting or getting data. Not sure what point you're trying to make.

But now you introduced a Put overload that actually passes something (the buffer) the other way. For going that direction, you're otherwise use the word Get, for good reasons. (Note that I'm not suggesting to overload Get to give you the preallocated buffer, either, that would be just as confusing.)

I think it's clear from this discussion that even in C++, overloads are good for certain things, and not good for other things. Having the same "Put" function work on double, int, std::vector is clearly good. On whether having Deferred vs Sync be part of the name vs an extra argument, some people would argue one way or the other (I'm not arguing either way on that.) There's also no argument that overloading too much is bad (see GetPut), because it's confusing. So the only question is really, what's more intuitive: a PutPrealloc function which gives you a preallocated buffer for use in Put, or Put function that doesn't actually Put anything yet, but gives you back a buffer that will implicitly be transferred (be put) to the library later. The fact that a member of your team got confused by the latter is a pretty strong argument that this, indeed, confusing.

We are trying to appeal to C++ applications in the long-term, this is a step backwards as brings back C style naming conventions. If this is PutPrealloc, then why not PutDeferred, PutSync, PutFile? Reality is not many use C in new projects and the feedback we get is that the adios2 API is very simple and intuitive to C++ apps. The fewer API calls, the better.

This is not a C style naming convention. A C style naming convention would have "put_double", "put_double_array", "put_int", etc. No-one is arguing for that. BTW, why do you have PerformPuts? In your opinion, that's a C style naming convention (a.k.a. bad) You could use a new overload Put() for that, why not?

[About Maintenance] Sure, but ultimately someone has to be responsible.

Are you arguing that it's easy to maintain an additional overload vs a function that has a separate name? Why? Because you have to type 8 more characters ("Prealloc")? That's really the only difference. (Actually, there is a difference, it's easier to find all users of the Prealloc interface using ag PutPrealloc -- yeah, I know, old fashioned... If you name everything Put, you better have good IDE.)

For all I can tell, there are two members of the team who above said "there is a need for further discussion".

Well, the reason it was initially implemented the way was presented was because everyone agreed to this. Without objections to also keep the APIs simple and minimal (this is exactly the reason why APIs go out of control becoming to complex for a user). This is not a product of personal flavors or decisions, but consensus by all developers who are funded to do the work. Coming up with a working version took me weeks to get this done, BTW. It was a lot of work. I can't speak for them.

I did not doubt that there was some process behind the feature. I agree that it's useful (in certain cases). I've made two suggestions, one of which is basically entirely trivial, since it's only about naming. The other one makes it more future-proof for engines beyond BP3. The fact that "A" exists isn't an argument against improving it as such. Again, no-one (that I know) ever gets things perfect at the first attempt.

Also, please refrain from making unnecessary internal changes to tested code (unless you are fixing a bug) adding wrappers to one line function to C++ only makes code bloat. This is not how successful dev teams work (and I work in many).

So for once, that change wasn't unnecessary, because if you look later, you will see that the function that I introduced got expanded. Even if not, everywhere else these wrappers are simple one-line forwarding wrappers, and having one single one which has additional functionality beyond just the forwarding is inconsistent and confusing, in particular considering that those wrapper functions are even in a separate source file that one wouldn't usually even bother to check.

If you don't like wrappers, and you don't like C, so I suppose you don't like something archaic like macro. Why don't you do stuff like this, then? [okay, so this really is going off-topic].

@@ -189,30 +189,15 @@ adios2_variable *adios2_inquire_variable(adios2_io *io, const char *name)
             io, "for adios2_io, in call to adios2_inquire_variable");

         adios2::core::IO &ioCpp = *reinterpret_cast<adios2::core::IO *>(io);
-        const auto &dataMap = ioCpp.GetVariablesDataMap();
+        auto &variables = ioCpp.GetVariablesDataMap();

-        auto itVariable = dataMap.find(name);
-        if (itVariable == dataMap.end()) // not found
+        auto itVariable = variables.find(name);
+        if (itVariable == variables.end()) // not found
         {
             return variable;
         }

-        const std::string type(ioCpp.InquireVariableType(name));
-        adios2::core::VariableBase *variableCpp = nullptr;
-
-        if (type == "compound")
-        {
-            // not supported
-        }
-#define declare_template_instantiation(T)                                      \
-    else if (type == adios2::helper::GetType<T>())                             \
-    {                                                                          \
-        variableCpp = ioCpp.InquireVariable<T>(name);                          \
-    }
-        ADIOS2_FOREACH_STDTYPE_1ARG(declare_template_instantiation)
-#undef declare_template_instantiation
-
-        variable = reinterpret_cast<adios2_variable *>(variableCpp);
+        variable = reinterpret_cast<adios2_variable *>(&itVariable->Base());
     }
     catch (...)
     {

germasch commented 5 years ago

The way I think about the Put calls is the same as if they weren't using prealloced buffers, that is, "the data is ready, I'm giving it to adios2".

The above is true for EndStep, PerformPuts, Close, and Put in sync mode which we call the special case for that reason. All Put overloads are informing the intention to adios2 to Put a variable in a polymorphic way and each signature has different memory contracts.

Actually, "the data is ready, I'm giving it to adios2" was very well defined to be exactly one point, that is, the point where you call Put and none of the others. After your latest change, you're right there's now a plethora of functions which mean "the (well, some) data is ready, I'm giving it to adios2" in addition to their other meaning. So thanks for another argument as to why this is not a good idea.

Also, I see the confusion between Get and a GetPrealloc in streaming engines (as the buffer is preallocated anyways at BeginStep), but a simple Get that can also return a span is a lot more clear. This is breaking the API symmetry between Put and Get.

So it's likely true that there is no perfect symmetry between Put and Get. That's already the case, though, one takes a const T * and the other a T *. There are pairs of "Put - PerformPuts" and "Get - PerformGets". That looks identical, maybe, but they have rather different semantics, actually. In one case, the user does their data work before the "Put" -- in the other case, after the "PerformGets" (I'm talking about Deferred Mode, though essentially the same happens for Sync more). These semantics shouldn't change -- They don't change (after my changes) for Put, and I don't think they should change for Get. I don't see why they would. I haven't thought about this much beyond this, other than that it's a very different situation, and a perfectly symmetric API won't work, no matter what. Most clearly, after Get and PerformGets, the user needs to get access to the internal buffer, and then eventually has to relinquish control so that it can be reused. That sounds like maybe even two new API functions, and I'm pretty sure they shouldn't all be named Get no matter what. I don't really think design of a internal buffer access API is needed for the purposes of this thread, but if you do, I'd still encourage you to open a new issue for it, because this one here is obviously well too long already.

williamfgc commented 5 years ago

Actually, "the data is ready, I'm giving it to adios2" was very well defined to be exactly one point, that is, the point where you call Put and none of the others. After your latest change, you're right there's now a plethora of functions which mean "the (well, some) data is ready, I'm giving it to adios2" in addition to their other meaning. So thanks for another argument as to why this is not a good idea.

Put in deferred mode always meant the data must be ready later (there is a difference between data and data pointer, address, readiness). We make the promise not to modify the data pointer, not the data necessarily until EndStep/Close/PerformPuts. New contracts are not introduced by the latest span Put. In fact, the latest Put resuses existing contracts for sync (prealloc and initial population) and deferred behaviors (later population as optional) and data readiness at EndStep in one call as it's a natural fit to the Put abstraction.

This is not a C style naming convention. A C style naming convention would have "put_double", "put_double_array", "put_int", etc. No-one is arguing for that. BTW, why do you have PerformPuts? In your opinion, that's a C style naming convention (a.k.a. bad)

PerformPuts doesn't have overloads. By C naming convention I mean that a new overload must be a new function (I don't mean the "_").

So it's likely true that there is no perfect symmetry between Put and Get. That's already the case, though, one takes a const T and the other a T . There are pairs of "Put - PerformPuts" and "Get - PerformGets".

This is a deal-breaker to me and what worries me the most as we sell ADIOS as having a simple, symmetric API. We tried to keep as much symmetry as possible as that makes it really simple for users, the current Put signatures do keep the symmetry as Get can reuse. Any Getter/Setter API is based on const and non-const signatures, this is not what I mean by symmetry, this is a necessary memory condition, otherwise those functions wouldn't work correctly.

I don't really think design of a internal buffer access API is needed for the purposes of this thread

We do not provide low-level buffer access as in "get_buffer" functions, we provide a span, we can reuse the span at Get to give the user a reference to the data that they don't have to manage. ADIOS sits at a much higher level than a library calling a typical "get_buffer" function (we provide the higher-level "span"), just like we use the abstract, higher-level "Put", not the more concrete "Write".

eisenhauer commented 5 years ago

I disagree completely with the assertion that no contracts are changed in the one-call Put span. In both Put deferred and sync, the data is ready at the point of the call. With sync, the application is requesting that ADIOS take the data right away so that it can potentially overwrite, and with deferred the application is telling ADIOS that while the data is ready, it won't overwrite at least until EndStep(), so ADIOS can have more freedom in when it grabs the data. One-call Put span application data is not ready at the Put call, and ADIOS doesn't know when it might be ready until EndStep(). As a result, the code around these calls looks quite different: two calculate and fill memory before, span is after; two provide data to the call, one gets a place to put data from it; etc. These are significant differences. Not to mention that for this to actually achieve its copy avoidance goals, the span provided will become invalid at EndStep() because behind the scenes it references memory in an internal buffer. That's a weirdness that will need to be heavily documented because nothing like it happens around the two other Put calls.