ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0
269 stars 126 forks source link

API Change request: replace Mode:Read in Open with two new modes #1384

Open pnorbert opened 5 years ago

pnorbert commented 5 years ago

Request: Have two separate and explicit modes for reading step-by-step as a stream and for reading arbitrary step(s) in post-processing. Instead of io.Open(fname, adios2::Mode::Read); have io.Open(fname, adios2::Mode::ReadStepByStep); io.Open(fname, adios2::Mode::ReadAsFile); (names are debatable)

In the high level python API we replace "r" for "rs" and "rf" and an error on "r" will explain the options.

Issue: We want to implement proper streaming through file with the BP4 format. We want to keep in memory the metadata only about a limited number of steps. File-based engines right now must process all metadata and present all steps at Open() time in case the read code uses SetStepSelection() and does not use BeginStep(). Only if the code calls BeginStep(), do we know that it is now doing step-by-step reading.

Reasoning: We always had this conflict to support the special case of reading arbitrary steps which only works in file-based engines. The programmer must decide in the beginning of coding, which way to go. Either read step-by-step, or read arbitrary steps. The source code will be very different and there is no going back and forth. So it seems fine to force the programmer to tell ADIOS, which way the code goes.

The separate modes will allow us to write two separate reader engine classes for the BP4 engine and choose between them in the IO class at Open(). The step-by-step reader then can keep the metadata consumption down.

Options: we thought about different options:

All of these would achieve the same but the first two seems just to be more cumbersome for use plus its an option for the user, while the decision has to be made by the developer. The third is less elegant but very much equivalent: force the developer to tell ADIOS what is going on.

Let us know if you agree or disagree.

germasch commented 5 years ago

I have one side comment: If you're changing the API with regards to those flags, you might as well use the opportunity to separate the sync/deferred ("launch") flags from the "open" flags:

/** OpenMode in IO Open */
enum class Mode
{
    Undefined,
    // open modes
    Write,
    Read,
    Append,
    // launch execution modes
    Sync,
    Deferred
};

For example, have a separate OpenMode::{Write,ReadWhatever,Append}. Right now, it looks like you can pass Mode::Sync to Open(), which doesn't make a lot of sense, though.

germasch commented 5 years ago

On the main issue, let's say the proposed solution sounds kind of "kludgy" to me, though I also don't have a good understanding of what all the different engines do. But it kind of sounds like you ask the user to decide what they're going to access at open time to avoid the complexity of implementing the corresponding logic inside of ADIOS2.

It would help if it was actually documented what the goals with BP4 are. I gather, you want to avoid having to read all the metadata (on all procs?), which certainly seems worthwhile. But I don't necessarily see why you need to know at open time, you could just read metadata lazily, ie, read a piece of info only when it's needed.

The programmer must decide in the beginning of coding, which way to go. Either read step-by-step, or read arbitrary steps. The source code will be very different and there is no going back and forth.

Can you explain why this is? For an application, it doesn't really sounds like there's a difference between "read the first step, then the next step" to "read step 5, then read step 10", so I'm not sure why you want to force that?

A maybe somewhat related point is that there seem to be many different ways of handling steps even in streaming applications, e.g., "read latest step, drop previous ones" vs "read step by step even if it blocks the writer", etc. I could well imagine a scenario where even in the streaming context, more than exactly one step might be accessible on the reader side, too. (That can be useful to calculate deltas during in-situ analysis). The point here is just that going to a model which has exactly two options might well not be future proof, and adding new open modes probably isn't a good way to handle all these different scenarios.

pnorbert commented 5 years ago

Thank you Kai for your observations. The lazy metadata reading got me and @wfgodoy thinking and we figured that there is a way to do this, with more interactions between the IO and Engine class. So we put this change request on hold until we figure which way to go.

We were distracted by thinking of the most complicated cases, where variables and attributes appear at any step, their dimensions may change at any step, and we need to process the entire metadata of all steps in Open to allow the user to call IO.InquireVariable() or Engine.AllStepsBlockInfo(). We can postpone that processing to the actual call that uses it and Open does not need to process any metadata.

On other note, since you asked what the difference is between the two modes: the source code written with in mind that all steps are accessible at all times (i.e. traditional file reading) cannot be used in step-by-step reading. So you cannot use the majority of engines anyway. Look for the Heat Transfer example in adiosvm repo, the fortran analysis code ( heatAnalysisadios2*.F90):

https://github.com/pnorbert/adiosvm/tree/master/Tutorial/heat2d/fortran/analysis

The file version of the code cannot run in streaming mode. And this is s kinda-similar-to-streaming example because it reads the variable data step by step. If you want, rewrite it to read the entire dataset in once (into a 3D array) and loop over its last dimension in memory.

So in contrast to your opinion, the one Read mode approach has always seemed "kludgy" to me, because it is used for two orthogonal coding approach (file based with all steps available vs. step-by-step-only-forward) and the user must decide how they plan to access the data at open time. Once you inquire a variable first, begin step will throw an exception. Once you do begin step, SetStepSelection() will throw an exception. There is no jumping back and forth once you did your first move.

So in the end we go out of our way to implement more complicated logic in the file engines instead of having two of each (for reading) which do one thing simple. But this how it was decided in the beginning of ADIOS2, since we did not like the ADIOS1 approach of having two separate Open functions either.

On Thu, Apr 25, 2019 at 12:50 PM Kai Germaschewski notifications@github.com wrote:

On the main issue, let's say the proposed solution sounds kind of "kludgy" to me, though I also don't have a good understanding of what all the different engines do. But it kind of sounds like you ask the user to decide what they're going to access at open time to avoid the complexity of implementing the corresponding logic inside of ADIOS2.

It would help if it was actually documented what the goals with BP4 are. I gather, you want to avoid having to read all the metadata (on all procs?), which certainly seems worthwhile. But I don't necessarily see why you need to know at open time, you could just read metadata lazily, ie, read a piece of info only when it's needed.

The programmer must decide in the beginning of coding, which way to go. Either read step-by-step, or read arbitrary steps. The source code will be very different and there is no going back and forth.

Can you explain why this is? For an application, it doesn't really sounds like there's a difference between "read the first step, then the next step" to "read step 5, then read step 10", so I'm not sure why you want to force that?

A maybe somewhat related point is that there seem to be many different ways of handling steps even in streaming applications, e.g., "read latest step, drop previous ones" vs "read step by step even if it blocks the writer", etc. I could well imagine a scenario where even in the streaming context, more than exactly one step might be accessible on the reader side, too. (That can be useful to calculate deltas during in-situ analysis). The point here is just that going to a model which has exactly two options might well not be future proof, and adding new open modes probably isn't a good way to handle all these different scenarios.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ornladios/ADIOS2/issues/1384#issuecomment-486751822, or mute the thread https://github.com/notifications/unsubscribe-auth/AAYYYLITY7KUTNZ3WSGILR3PSHOMXANCNFSM4HIOMMVQ .

germasch commented 5 years ago

Thank you Kai for your observations. The lazy metadata reading got me and @wfgodoy thinking and we figured that there is a way to do this, with more interactions between the IO and Engine class. So we put this change request on hold until we figure which way to go. We were distracted by thinking of the most complicated cases, where variables and attributes appear at any step, their dimensions may change at any step, and we need to process the entire metadata of all steps in Open to allow the user to call IO.InquireVariable() or Engine.AllStepsBlockInfo(). We can postpone that processing to the actual call that uses it and Open does not need to process any metadata.

I'd say that generally sounds preferable.

On other note, since you asked what the difference is between the two modes: the source code written with in mind that all steps are accessible at all times (i.e. traditional file reading) cannot be used in step-by-step reading. So you cannot use the majority of engines anyway. Look for the Heat Transfer example in adiosvm repo, the fortran analysis code ( heatAnalysisadios2*.F90): https://github.com/pnorbert/adiosvm/tree/master/Tutorial/heat2d/fortran/analysis The file version of the code cannot run in streaming mode.

I can see that, though I guess I'm not sure I'd say that's desirable. So I guess I have kind of the picture of a music player in my mind. You can say "play", "pause", "next song". If it's a CD player, you can also directly say you want song 17, and it can show you how many songs there are. A streaming music player may have "skip to next song" button, maybe a "skip back" button. For a general music player API, I'd expect it to have an API that let's me do "skip back" or "select song # " functions. And if the underlying engine doesn't allow that, I'd get a "not supported" error back. But I wouldn't expect to have to write my code in a way where I have o know whether it's a CD player or a streaming player if I just wanted to do "open" and "play".

So I think a lot of the complexity of the issue is related to the fact that you have SetStepSelection on a per-variable basis, and the fact that it includes a count. That leads to the situation where in a file, you can inquire a variable, but then you can't directly "Get" from it, because it doesn't exist for the current step. OTOH, if you're streaming, you would have the InquireVariable fail until you actually get to the step that has it. I think these kind of subtle differences between file and streaming based I/O aren't really desirable considering that you're trying to abstract away whether the I/O is done on a file or a stream in ADIOS2. Ie., in theory you can just change the xml file to switch from one to the other, which is nice. But then if the app actually fails anyway, because of such differences which require the app code to be changed, that makes the XML runtime config much less powerful.

Obviously, you can't abstract away all differences between things that are fundamentally different. But I'd say it'd be good to minimize them. So, to stay closer to my music player example, say you had a general "SetStep" function in addition to "BeginStep". Then I think you could make both of these work, on both files and streams, and I'd say that would make life easier.

for (int i = 0; i < 5; i++) {
  io.SetStep(i);
  var.Get(...);
}

or

for (int i = 0; i < 5; i++) {
  io.BeginStep(i);
  var.Get(...);
  io.EndStep();
}

On the other hand, if you have a stream and you say SetStep(5) while you're currently at step 10, you might have to return some kind of "not supported". You can think about what you'd want to do if you say SetStep(10) while you're currently at step 5 -- the options would be to wait until step 10 arrives in the stream, or you could say "not supported". (I think I'd prefer the former, because it mirrors what happens with a file).

Now, this is all much more complicated due to the step selection being per variable. I'm not sure how useful that is -- is there really anyone who needs to simultaneously get the value of one variable at step 5, and another one at step 10? Ie., wouldn't this be clearer, anyway:

io.SetStep(5).
var1.Get(...);
io.SetStep(10)
var2.Get(...);

And this is a kinda-similar-to-streaming example because it reads the variable data step by step. If you want, rewrite it to read the entire dataset in once (into a 3D array) and loop over its last dimension in memory. So in contrast to your opinion, the one Read mode approach has always seemed "kludgy" to me, because it is used for two orthogonal coding approach (file based with all steps available vs. step-by-step-only-forward) and the user must decide how they plan to access the data at open time. Once you inquire a variable first, begin step will throw an exception. Once you do begin step, SetStepSelection() will throw an exception.

Well, I guess I'll agree that what you describe is kludgy. I've seen only one example where SetStepSelection is used with a count != 1. If you want to support that, that's just not really possible with a stream (well, it might be possible, but not easily/efficiently). I'd say I'm not sure SetStepSelection with a count != 1 is all that useful -- at least, the same can be achieved by a loop getting (a slice of) a variable at a time, and I don't think the performance would be much worse in that case. But anyway, you could support that on files only and have it fail when used on a stream. OTOH, reading data step by step is perfectly reasonable to support for both files and streams, transparently to the user.

There is no jumping back and forth once you did your first move.

Right, that may be true, but I'm not sure it needs to be that way. I mean even in the streaming case, why shouldn't you be able to set the selection to the current step? Or, why in the file case prohibit reading it step-by-step with the streaming interface? I guess I agree that if you have two totally different ways of accessing files / streams, different open functions are appropriate, in fact I might even say that they should return different things, a stream reader vs a file reader where one has certain apis and the other doesn't.

But still, since file read is basically like stream read while allowing some additional operations, I'd rather see one unified interface in as far as possible, and an error if you do something on a stream that's only supported on a file.

pnorbert commented 5 years ago

Kai,

We have a proper API for I/O and you can freely switch between engines to read from a stream or a file with the same code. This includes BeginStep/EndStep and Put/Get. You are encouraged to write your code this way and then you can have what you wanted.

We also have an extension that only applies to files where all steps are available at once. The extension allows for reading arbitrary steps or multiple steps of a variable. This is for codes that need random access to the entire dataset. There is nothing here that we intend to support in streams. Codes using this feature will never work in situ.

Hence I was asking for modes like ReadStepByStep and ReadPostProcessing, not ReadStream vs ReadFile.

On Thu, Apr 25, 2019 at 9:33 PM Kai Germaschewski notifications@github.com wrote:

Thank you Kai for your observations. The lazy metadata reading got me and @wfgodoy thinking and we figured that there is a way to do this, with more interactions between the IO and Engine class. So we put this change request on hold until we figure which way to go. We were distracted by thinking of the most complicated cases, where variables and attributes appear at any step, their dimensions may change at any step, and we need to process the entire metadata of all steps in Open to allow the user to call IO.InquireVariable() or Engine.AllStepsBlockInfo(). We can postpone that processing to the actual call that uses it and Open does not need to process any metadata.

I'd say that generally sounds preferable.

On other note, since you asked what the difference is between the two modes: the source code written with in mind that all steps are accessible at all times (i.e. traditional file reading) cannot be used in step-by-step reading. So you cannot use the majority of engines anyway. Look for the Heat Transfer example in adiosvm repo, the fortran analysis code ( heatAnalysisadios2*.F90): https://github.com/pnorbert/adiosvm/tree/master/Tutorial/heat2d/fortran/analysis The file version of the code cannot run in streaming mode.

I can see that, though I guess I'm not sure I'd say that's desirable. So I guess I have kind of the picture of a music player in my mind. You can say "play", "pause", "next song". If it's a CD player, you can also directly say you want song 17, and it can show you how many songs there are. A streaming music player may have "skip to next song" button, maybe a "skip back" button. For a general music player API, I'd expect it to have an API that let's me do "skip back" or "select song # " functions. And if the underlying engine doesn't allow that, I'd get a "not supported" error back. But I wouldn't expect to have to write my code in a way where I have o know whether it's a CD player or a streaming player if I just wanted to do "open" and "play".

So I think a lot of the complexity of the issue is related to the fact that you have SetStepSelection on a per-variable basis, and the fact that it includes a count. That leads to the situation where in a file, you can inquire a variable, but then you can't directly "Get" from it, because it doesn't exist for the current step. OTOH, if you're streaming, you would have the InquireVariable fail until you actually get to the step that has it. I think these kind of subtle differences between file and streaming based I/O aren't really desirable considering that you're trying to abstract away whether the I/O is done on a file or a stream in ADIOS2. Ie., in theory you can just change the xml file to switch from one to the other, which is nice. But then if the app actually fails anyway, because of such differences which require the app code to be changed, that makes the XML runtime config much less powerful.

Obviously, you can't abstract away all differences between things that are fundamentally different. But I'd say it'd be good to minimize them. So, to stay closer to my music player example, say you had a general "SetStep" function in addition to "BeginStep". Then I think you could make both of these work, on both files and streams, and I'd say that would make life easier.

for (int i = 0; i < 5; i++) { io.SetStep(i); var.Get(...); }

or

for (int i = 0; i < 5; i++) { io.BeginStep(i); var.Get(...); io.EndStep(); }

On the other hand, if you have a stream and you say SetStep(5) while you're currently at step 10, you might have to return some kind of "not supported". You can think about what you'd want to do if you say SetStep(10) while you're currently at step 5 -- the options would be to wait until step 10 arrives in the stream, or you could say "not supported". (I think I'd prefer the former, because it mirrors what happens with a file).

Now, this is all much more complicated due to the step selection being per variable. I'm not sure how useful that is -- is there really anyone who needs to simultaneously get the value of one variable at step 5, and another one at step 10? Ie., wouldn't this be clearer, anyway:

io.SetStep(5). var1.Get(...); io.SetStep(10) var2.Get(...);

And this is a kinda-similar-to-streaming example because it reads the variable data step by step. If you want, rewrite it to read the entire dataset in once (into a 3D array) and loop over its last dimension in memory. So in contrast to your opinion, the one Read mode approach has always seemed "kludgy" to me, because it is used for two orthogonal coding approach (file based with all steps available vs. step-by-step-only-forward) and the user must decide how they plan to access the data at open time. Once you inquire a variable first, begin step will throw an exception. Once you do begin step, SetStepSelection() will throw an exception.

Well, I guess I'll agree that what you describe is kludgy. I've seen only one example where SetStepSelection is used with a count != 1. If you want to support that, that's just not really possible with a stream (well, it might be possible, but not easily/efficiently). I'd say I'm not sure SetStepSelection with a count != 1 is all that useful -- at least, the same can be achieved by a loop getting (a slice of) a variable at a time, and I don't think the performance would be much worse in that case. But anyway, you could support that on files only and have it fail when used on a stream. OTOH, reading data step by step is perfectly reasonable to support for both files and streams, transparently to the user.

There is no jumping back and forth once you did your first move.

Right, that may be true, but I'm not sure it needs to be that way. I mean even in the streaming case, why shouldn't you be able to set the selection to the current step? Or, why in the file case prohibit reading it step-by-step with the streaming interface? I guess I agree that if you have two totally different ways of accessing files / streams, different open functions are appropriate, in fact I might even say that they should return different things, a stream reader vs a file reader where one has certain apis and the other doesn't.

But still, since file read is basically like stream read while allowing some additional operations, I'd rather see one unified interface in as far as possible, and an error if you do something on a stream that's only supported on a file.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ornladios/ADIOS2/issues/1384#issuecomment-486893716, or mute the thread https://github.com/notifications/unsubscribe-auth/AAYYYLPRHZ3NLVE6IY7NGOLPSJLU7ANCNFSM4HIOMMVQ .

germasch commented 5 years ago

Okay, so having BeginStep/EndStep work the same for both files and streams is good, and in fact that's what I was arguing for (ie, have the usual case work transparently, no matter what happens underneath). I think I misunderstood your previous message, I thought you were saying once you open as a file, you'd have to use SetStepSelection and couldn't use BeginStep/EndStep.

I do think, though, that there still are subtle differences when using the "streaming" interface depending on the underlying file vs stream, even if random access is not used. In trying to look into this, I hit two other bugs, so I'll file those separately first. [As a side note, I've mainly just used adios2 without step support at all, as I prefer to get separate output files per timestep, which are already unpleasantly large, rather than one huge file, so I don't have much experience with Begin/EndStep or SetStepSelection].

For the particular issue a hand, if you do think the user needs to tell you what they're going to do at open time, I'd prefer to keep "Mode::Read" but change the meaning to "open in streaming mode", and add "Mode::ReadRandomAccess" or something like that which would work only on files but not on streams, and then support SetStepSelection etc. I haven't tried, but I think the vast majority of tests and examples would require no change in that case.

We also have an extension that only applies to files where all steps are available at once. The extension allows for reading arbitrary steps or multiple steps of a variable. This is for codes that need random access to the entire dataset. There is nothing here that we intend to support in streams. Codes using this feature will never work in situ.

Sure, random access isn't compatible with streaming. But there are cases in between simple sequential access and entirely random access. For example, in post processing one might have code which calculates diagnostics from only every 5th step. If one wants to change that processing into in-situ diagnostics, it'd be nice if one could express the same thing directly rather than some kind of for (int i = 0; i < 4; i++) { reader.BeginStep(); reader.EndStep(); } // drop 4 steps in between. There is some analogy with C++ iterators, ie., there are cases in between InputIterator and RandomAccessIterator. I'm not saying anything should be done to that end at this time, but when thinking about APIs, I think one should realize that the future may bring things beyond what's currently supported.

I'll add one more thing: For code coupling or in-situ diagnostics things are likely deterministic (do every step, or every nth step). On the other hand, for a web status interface, the reader side might want to express something like "get me the data from the latest step available". I think that's supportable by a streaming interface, and I can well imagine that being useful in real life.

williamfgc commented 5 years ago

@germasch in principle there is nothing "RandomAccess" can't do that streaming (BeginStep/EndStep) can do. There is no in-between random and streaming modes, we either use one or the other (partial random access is already random access). To me, the biggest advantage or enforcing this separation and being mutually exclusive (either use BeginStep/EndStep or SetStepSelection, but not both) is that we completely separate portable and non-portable code across adios2 engines.

I agree with above that lazy evaluation is the way to go, thanks for bringing it up. This is done already in the Python and C++ high-level APIs, which support both modes BTW, open (or the constructor) doesn't do a physical open until the first write/read (similar to copy-on-write) and parameters can be set in between.

Lazy evaluation of Open will also allow setting a "Step View" into the metadata file for random access, today the default is a "all steps view".

germasch commented 5 years ago

So I guess one thing I can take away from #1387 is that, at least for me, that the semantics of when a variable exists / is valid / can be inquired legally are really rather subtle.

Thinking about this a bit more, I get a feeling that the need for the complicated interface / switching between streaming and random access mode is actually of how the core of adios2 has been designed with Variable being used as one object that serves (at least) two purposes, helping engines keep track of their internal state, ie, which variables exist in the file/stream, as well as underlying the user-facing API.

This is somewhat shown by the fact that if I avoid using cxx11::Variable on the user side, everything works just fine:

    for (int step = 0; step < 3; step++) {
      reader.BeginStep();
      reader.Get("x", x);
      reader.EndStep();
      printf("step %d: %g %g %g\n", step, x[0], x[1], x[2]);
    }

If "x" doesn't exist in a given step, I get an exception, and otherwise things work just fine, which is exactly what I would expect. Using cxx11::Variable, however, things depend on when I call InquireVariable, and what was a valid variable may change into an invalid one when I move on to the next step (well, I guess it should change into an invalid one, I don't think it actually does).

Let me mention some other seemingly unrelated issue which I hit on day 1 of using ADIOS2, and which I still hit occasionally, though by now I recognize it quickly. It's kinda obvious in this snippet:

    auto var = m_Io.DefineVariable<double>("x", {3});
    adios2::Engine reader = m_Io.Open("xxx.bp", adios2::Mode::Read);

This throws C++ exception with description "ERROR: variable x exists in IO object CXX11_API_TestIO, in call to DefineVariable in the line where Open is called. That confused me to no end, since when I called DefineVariable, there was no exception, and when I got the exception, I didn't call DefineVariable. Reading and writing variables is not symmetric, which was surprising (to me, anyway). In my checkpointing use of adios2, I have to work around this, as my code follows the same path for reading and writing, and all I want is to store Variable objects for the members, and I basically have to do a "try InquireVariable, if it fails DefineVariable", even though all I want to do is to say "I need a variable of type x named y".

So let me kick around an idea: Allow for cxx11::Variable to exist even if it doesn't exist in the file. Basically, the user-facing Variable is mostly just a string (and a type). It will be connected to the internal core::Variable if that exists. There's no need to have cxx11::Variable to have different behavior for streaming / random access then, you can inquire it before or after BeginStep (or even before Open). It's just a handle for a variable which may or may not exist. So cxx11::Variable is really just a handle for a "potential" Variable, which may become valid/invalid at different steps, or which might not exist in the file yet at all (as is the case in writing mode, anyway). If you're trying to "Get" the variable, throw an exception if it doesn't exist in the current step. If you're calling "SetStepSelection" on it and it doesn't exist in the engine in random access fashion, throw an exception then.

I haven't done this, and I haven't completely thought it through to the end. But I have a feeling a lot of the subtlety and differences between modes would go away on the user side.

williamfgc commented 5 years ago

Not sure if subtle, but https://adios2.readthedocs.io already covers many of the mentioned topics (Get("x",x) is a wrapper that calls InquireVariable ; Variable<T> var; is already a placeholder as it has an empty constructor, it also has a state between BeginStep/EndStep through the operator bool, APIs are symmetric as much as DefineVariable/InquireVariable, Put/Get, allow for a reversible IO object state at Write and Read, etc.). Feedback is always encouraged if something is not clear or is missing in the docs.

Also, I personally don't see anything complicated with Variable (or ADIOS2 besides the typical learning curve), it's just an object with a state. So far, my observation is that complications arise when: 1) ADIOS2 APIs are applied outside the current intended use-case scope reflected in tests/examples/tutorials/applications, and 2) bugs :)

germasch commented 5 years ago

Thanks for pointing me to the docs. I know what Get("x", x") does, that's what I used it in my example. The point is that when using this version of Get, it actually works as expected, as opposed to my original version with InquireVariable, which did not (#1387).

I seem to have trouble to get my point across, so I put some code where my mouth is. I'll put the code into a PR so you can look at it (though, while it demonstrates that the issue can be fixed quite easily, I did it rather hackily, at the wrong level of abstraction, etc, so it should definitely not be merged).

I first added this test, which assumes that xxx.bp contains the variable "x", but not for the first step.:

    adios2::Engine reader = m_Io.Open("xxx.bp", adios2::Mode::Read);

    std::array<double, 3> x;
    auto var = m_Io.InquireVariable<double>("x");
    for (int step = 0; step < 3; step++)
    {
        reader.BeginStep();
        if (step == 0)
        {
            EXPECT_FALSE(var);
            EXPECT_THROW(reader.Get(var, x.data()), std::invalid_argument);
        }
        else
        {
            EXPECT_TRUE(var);
            reader.Get(var, x.data(), adios2::Mode::Sync);
            EXPECT_EQ(
                x, (std::array<double, 3>{10. + step, 20. + step, 30. + step}));
        }
        reader.EndStep();
    }
    reader.Close();

It's essentially the same as in #1387, and it fails. The actual fix I introduced changes < 20 lines, and the test now passes. In addition, while I haven't actually tested it, I'm fairly sure that the above code will now also work if a streaming engine is used. As a side effect, it fixes some other not particularly critical bug where cxx11::Variable might be left with a dangling pointer to a core::Variable which has gone away. It does change behavior, though, but I think existing (working) code won't break.The tests on the PR will tell. It may well introduce new bugs, and it introduces some inefficiency. However, that's related to the fact that, as I said, this isn't the right fix at the right level -- it's not a fundamental limitation.

germasch commented 5 years ago

FWIW, the proof-of-concept change in #1391 did in fact cause only a little bit of breakage. One test failure comes from some case I didn't think through (now fixed), and the remaining three are related to what I think is existing "surprising" behavior at the least, which I documented in #1392.

I'd like to make the point that I think the behavior I'm proposing makes reading and writing more symmetric:

Existing behavior on (streaming or file) write:

Existing behavior on streaming read:

Existing behavior on file read:

My proposed behavior is to support the following for both file and streaming read (the previous two approaches continue to work, though):

So it's mostly backwards compatible and allows reading code to be written to mirror the writing code. There is one hitch, which is, how do you find out if a variable exists in the current step? It can be addressed in various ways, but I haven't done it.

None of this changes that the PR is just a demonstration that an API can be implemented that doesn't require distinguishing streaming from file mode, and that's largely backwards compable. It's still true that the way I've actually implemented is entirely "wrong", though.