dmitry-ganyushin closed this issue 2 years ago.
BP5 is clearly more restrictive than BP4, particularly with the introduction of ReadRandomAccess mode and the separation of step-based (BeginStep) reading from random-access reading. As a result, some corner cases that worked in BP4 are not expected to work in BP5.
Can I ask what this reader code is supposed to do? This bit:
```c
while (adios2_begin_step(e, adios2_step_mode_read, -1.,
                         &status) == adios2_error_none) {
    if (step == 0 || status == adios2_step_status_end_of_stream) {
        break;
    } else {
        adios2_end_step(e);
        step++;
        continue;
    }
}
```
It seems to walk through all the timesteps until BeginStep returns EndOfStream, i.e., BeginStep has failed and you don't have any data on the current timestep. At this point you're in undefined territory: you've gotten a failure code and are continuing to call inquire variable, Get, etc. We finally throw an exception when you call EndStep, because it doesn't pair with a successful BeginStep. Assuming that data is available after a failed BeginStep is bad practice with any engine. It may kind of "work", but if so, it's by accident, not by our design.
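The pairing rule can be illustrated with a small mock. `mock_engine`, `mock_begin_step`, and `mock_end_step` are hypothetical stand-ins, not ADIOS2 functions. Like the loop above, they assume that end of stream is reported through the status argument while the call itself returns success. The reader touches data only after a step has begun successfully, and it calls EndStep only to close such a step.

```c
#include <assert.h>

/* Hypothetical mock of step-based reading; these are NOT ADIOS2 functions. */
typedef enum { STEP_OK, STEP_END_OF_STREAM } step_status;

typedef struct {
    int nsteps;  /* number of steps in the mock "file" */
    int current; /* index of the next step to open */
    int in_step; /* 1 between a successful begin and its matching end */
} mock_engine;

/* Returns success (0) even at end of stream and reports that condition
   through *status, mirroring how the loop above checks the status. */
static int mock_begin_step(mock_engine *e, step_status *status)
{
    if (e->current >= e->nsteps) {
        *status = STEP_END_OF_STREAM; /* no step is open; no data to read */
        return 0;
    }
    *status = STEP_OK;
    e->in_step = 1;
    return 0;
}

/* Fails (-1) when no step is open, the misuse that makes BP5 throw. */
static int mock_end_step(mock_engine *e)
{
    if (!e->in_step)
        return -1;
    e->in_step = 0;
    e->current++;
    return 0;
}

/* Reads every step, touching data only inside a successfully begun step. */
static int read_all_steps(mock_engine *e)
{
    step_status status;
    int steps_read = 0;
    while (mock_begin_step(e, &status) == 0) {
        if (status == STEP_END_OF_STREAM)
            break; /* do not inquire, Get, or EndStep here */
        /* ... inquire variables and Get data for this step ... */
        steps_read++;
        int rc = mock_end_step(e);
        assert(rc == 0); /* each EndStep pairs with a successful BeginStep */
        (void)rc;
    }
    return steps_read;
}
```

After the loop exits, no step is open. Any further `mock_end_step` call returns -1, which is the toy counterpart of the exception described above.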
Well, the reader code was copy-pasted and adjusted for the test. The simplified version is the following:
```c
#include <stdio.h>  /* printf */
#include <stdlib.h>
#include <mpi.h>    /* MPI_Init, MPI_COMM_WORLD, MPI_Finalize */
#include <adios2_c.h>

/* Reproducer: open the same file twice for step-based reading,
   reusing one IO object ("Read") for both opens. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    adios2_adios *adios = adios2_init(MPI_COMM_WORLD, adios2_debug_mode_off);
    adios2_io *io = adios2_declare_io(adios, "Read");
    adios2_error adiosErr = adios2_set_engine(io, "BP5");
    /* first time */
    {
        adios2_engine *e = adios2_open(io, "localArray.bp", adios2_mode_read);
        adios2_step_status status;
        adios2_error err = adios2_begin_step(e, adios2_step_mode_read, -1.,
                                             &status);
        if (err != adios2_error_none) printf("First time begin step not Ok\n");
        adios2_variable *v0 = adios2_inquire_variable(io, "v0");
        adios2_end_step(e);
        adios2_close(e);
    }
    /* second time: new engine, same IO */
    {
        adios2_engine *e = adios2_open(io, "localArray.bp", adios2_mode_read);
        adios2_step_status status;
        adios2_error err = adios2_begin_step(e, adios2_step_mode_read, -1.,
                                             &status);
        if (err != adios2_error_none) printf("Second time begin step not Ok\n");
        adios2_end_step(e);
        adios2_close(e);
    }
    adios2_finalize(adios); /* release the ADIOS2 object before MPI shuts down */
    MPI_Finalize();
    return 0;
}
```
The second `begin_step` call returns an error.
OK, this time making sure you've pointed me at the right code. Is this the error message?
```
[Thu Jun 30 07:04:24 2022] [ADIOS2 ERROR] : adios2_begin_step: [Thu Jun 30 07:04:24 2022] [ADIOS2 EXCEPTION] : variable v0 already defined in IO Read
Second time begin step not Ok
```
Yes, this is a simplified version of the previous example. It is not possible to call begin_step a second time. That worked with BP4. If it is not supported with BP5, I could try the random access mode.
For future reference, including the error message in a bug report is usually helpful. This one tells us what's going on, but it brings us into another ill-defined area of ADIOS: exactly how the IO and the Engine interact...
The IO and Engine are separate, and we don't assume a one-to-one link between the two. You can create variables in an IO without creating an engine. Essentially, the IO acts like a namespace with the rule that no two variables may have the same name. When writing, you must define variables in the IO that the engine was opened from in order to Put them. On the reader side, for the semantics of various calls to work, BeginStep (or Open) needs to populate the IO with the variables from the file/stream (that is, the engine creates them in the IO). What isn't clear is when those engine-created variables go away. BP3/4 was very lax about this because it mostly loaded all metadata at Open(), which meant that variables never had to go away. But when processing new batches of metadata, it removed all variables in the IO, whether it had created them or not. (This was/is arguably bad, because it can blow away objects that it hadn't created and invalidate pointers held by other engines or the application.) But in your case, it's the thing that makes your code "work".
What's going on here is that in BP4, the first Open() fully populates the IO with all the variables. Those variables persist in the IO through the BeginSteps and EndSteps and remain after the Close(). They finally go away at the second BP4 Open(), which clears the IO of everything (whether engine-created or not). Then, when new variables are created while loading the metadata, everything goes as planned.
BP5 in streaming mode (not random access) doesn't load any variables until BeginStep, and then it loads only one timestep's worth at a time. (Not having to load all the metadata at once is one of the major contributions of BP5.) Like BP4, BP5 clears the IO before it loads new metadata in BeginStep. But instead of removing all variables, it removes only the ones it created in the previous BeginStep. Generally this is less dangerous than BP4's practice of removing everything. In your case, however, you're reusing the IO from the prior read: the last BeginStep of the prior stream populated the IO, and those variables were never removed (similar to BP4). The problem appears when the first BeginStep of the second stream starts. It doesn't remove any variables from the IO (because this stream hasn't created any), and when it tries to create variables corresponding to those in the file, it gets an error because a variable of that name already exists.
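The difference between the two clearing policies can be reproduced with a toy model of an IO's variable table. Everything here (`toy_io`, `bp4_like_open`, `bp5_like_begin_step`, integer engine ids) is hypothetical. It only condenses the behavior described above and is not how either engine is implemented. Close() is not modelled because, as described, it leaves engine-created variables in the IO.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model of an IO's variable table; not ADIOS2 code. */
#define MAX_VARS 8

typedef struct { const char *name; int owner; } toy_var; /* owner = engine id */
typedef struct { toy_var vars[MAX_VARS]; int count; } toy_io;

/* Names must be unique within an IO. */
static int toy_define(toy_io *io, const char *name, int owner)
{
    for (int i = 0; i < io->count; i++)
        if (strcmp(io->vars[i].name, name) == 0)
            return -1; /* "variable ... already defined in IO" */
    if (io->count == MAX_VARS)
        return -1;
    io->vars[io->count].name = name;
    io->vars[io->count].owner = owner;
    io->count++;
    return 0;
}

/* BP4-like Open(): wipe the whole IO, then load all metadata at once. */
static int bp4_like_open(toy_io *io, int engine_id,
                         const char *const *names, int n)
{
    io->count = 0; /* removes variables no matter who created them */
    for (int i = 0; i < n; i++)
        if (toy_define(io, names[i], engine_id) != 0)
            return -1;
    return 0;
}

/* BP5-like BeginStep(): drop only the variables this engine created
   earlier, then define the variables of the new step. */
static int bp5_like_begin_step(toy_io *io, int engine_id,
                               const char *const *names, int n)
{
    int kept = 0;
    for (int i = 0; i < io->count; i++)
        if (io->vars[i].owner != engine_id)
            io->vars[kept++] = io->vars[i];
    io->count = kept;
    for (int i = 0; i < n; i++)
        if (toy_define(io, names[i], engine_id) != 0)
            return -1;
    return 0;
}
```

With the BP4-like rule, a second open on the same IO succeeds because everything is wiped first. With the BP5-like rule, a begin step by a second engine id returns -1, the toy counterpart of "variable v0 already defined in IO Read". The same call on a fresh IO succeeds, which is why using a new IO per stream avoids the problem.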
So, I think the question is how to fix this properly. You can easily solve it in your case by not reusing the IO, which sidesteps the fact that ADIOS variable lifetimes are not well-defined. One might argue that no variable created by an engine should live beyond that engine's Close(). That's probably reasonable, but no engine currently works like that. One might also take the view that variables created in BeginStep shouldn't live beyond EndStep, but that might break existing code (particularly code that does a read followed by a write in another engine sharing the IO). A third possibility is that any variable created in Open/BeginStep should automatically delete any variable of the same name that already exists in the IO. That might also leave dangling pointers somewhere, but probably nothing is perfect. Any of these would define the lifetime of variables implicitly created in IOs by reader engines more precisely than we do now... Perhaps a topic for a future dev meeting. In the meantime, does "delete the old IO and create a new one for the new stream" work for you?
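The three candidate rules can be compared with a similar toy model. `lifetime_rule`, `engine_define`, `read_once`, and the other names are hypothetical and used only for illustration. No ADIOS2 engine is claimed to implement any of these rules as written. `read_once` reads a one-step "file" containing `v0` through a reused IO, ends the step, records whether `v0` is still visible afterward (which a write engine sharing the IO would need), and closes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model of three variable-lifetime rules; not ADIOS2 code. */
#define MAX_VARS 8

typedef enum {
    REMOVE_ON_CLOSE,    /* engine-created variables are removed at Close() */
    REMOVE_ON_END_STEP, /* engine-created variables are removed at EndStep() */
    REPLACE_ON_DEFINE   /* an engine definition replaces a same-named variable */
} lifetime_rule;

typedef struct { const char *name; int owner; } toy_var;
typedef struct { toy_var vars[MAX_VARS]; int count; } toy_io;

static int find_var(const toy_io *io, const char *name)
{
    for (int i = 0; i < io->count; i++)
        if (strcmp(io->vars[i].name, name) == 0)
            return i;
    return -1;
}

static void remove_owned(toy_io *io, int owner)
{
    int kept = 0;
    for (int i = 0; i < io->count; i++)
        if (io->vars[i].owner != owner)
            io->vars[kept++] = io->vars[i];
    io->count = kept;
}

/* Definition performed by a reader engine during Open/BeginStep. */
static int engine_define(toy_io *io, lifetime_rule rule, const char *name,
                         int engine_id)
{
    int i = find_var(io, name);
    if (i >= 0) {
        if (rule != REPLACE_ON_DEFINE)
            return -1; /* "variable ... already defined in IO" */
        /* Any handle to the old variable held elsewhere would now be stale;
           the toy only records the new owner. */
        io->vars[i].owner = engine_id;
        return 0;
    }
    if (io->count == MAX_VARS)
        return -1;
    io->vars[io->count].name = name;
    io->vars[io->count].owner = engine_id;
    io->count++;
    return 0;
}

/* One read of a one-step "file" holding v0. Reports whether v0 is still
   in the IO after EndStep. */
static int read_once(toy_io *io, lifetime_rule rule, int engine_id,
                     int *visible_after_end_step)
{
    if (engine_define(io, rule, "v0", engine_id) != 0) /* BeginStep */
        return -1;
    if (rule == REMOVE_ON_END_STEP)                     /* EndStep */
        remove_owned(io, engine_id);
    *visible_after_end_step = find_var(io, "v0") >= 0;
    if (rule == REMOVE_ON_CLOSE)                        /* Close */
        remove_owned(io, engine_id);
    return 0;
}
```

All three rules let a second read on a reused IO succeed. They differ in the trade-offs raised above. Removing at EndStep leaves nothing for a later engine that shares the IO. Replacing on definition keeps the name visible but silently hands it to the new engine, and in a real library that is where pointers held elsewhere could dangle.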
On a similar note, I remember trying to avoid creating a new I/O object for every invocation of a subroutine in the GTC Fortran code. I ultimately cleared out all variables in the I/O object manually by calling `adios2_remove_all_variables(io, ierr)` so I could reuse the I/O object.
Well, at least for BP5, I think removing all engine-created variables on engine Close may be the best approach. Would that have solved your problem? The difficulty with doing that for BP3/4 is that those engines really just use the IO as their private database, without much thought to the consequences if the IO actually hosts more than one engine. Reverse-engineering that assumption out of those old engines is likely more than we want to do, but BP5 tries to do better. So if we can settle on semantics that make sense, we can likely make it work.
With v2.8.1, the issue with reuse may no longer be applicable. To give you some background (hopefully without deviating from the main topic), the I/O reuse issue was related to copying an already open file during a checkpoint.
In the snippet below, history.bp is a diagnostic file that is open for writing. When a checkpoint is taken, its contents are copied into history_backup.bp.
```fortran
! history.bp is being written to during each timestep
call adios2_open(engine, io_1, 'history.bp', adios2_mode_write, ierr)

! Take a checkpoint: copy history.bp to history_backup.bp
! First, open history.bp for reading using the I/O object io_checkpoint
call adios2_open(engine2, io_checkpoint, 'history.bp', adios2_mode_read, ierr)
! Open history_backup.bp for writing, reusing io_checkpoint
call adios2_open(engine3, io_checkpoint, 'history_backup.bp', adios2_mode_write, ierr)
! Copy data from history into history_backup (copy is an application routine)
call copy()
call adios2_close(engine2, ierr)
call adios2_close(engine3, ierr)
! Clear out io_checkpoint so it can be reused during the next checkpoint
call adios2_remove_all_variables(io_checkpoint, ierr)
```
With v2.8.1 and BP5, where truncating steps is allowed and a BP file with a partially written step can be recovered, copying one BP file into another during a checkpoint is no longer required. So the I/O reuse problem will no longer apply once the ADIOS code in GTC is updated to use v2.8.1 and BP5.
Thanks Kshitij. Yes, I think that BP5 is the best answer for you, but hopefully PRs #3272 and #3270 might help as well... Closing this because I think the issue is addressed.
Describe the bug:
After reopening a file, BP5 does not allow calling begin_step. I am using the corresponding function from the C bindings with the BP5 engine. The error code returned by begin_step is not Ok. This differs from the behavior of the BP4 engine.

To Reproduce:
My writer/reader files are here: https://github.com/dmitry-ganyushin/bp5-begin-step.git

ADIOS2 revision:

Desktop:
- OS/Platform: Ubuntu 20.04
- Build: gcc 9.4.0