ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

Names in API: Feel "More Like XML" #1281

Open ax3l opened 5 years ago

ax3l commented 5 years ago

Hi,

One piece of feedback we gave @pnorbert and @sklasky at last week's workshop was about the "impression/feeling" that developers and users have when interacting with variable and attribute names in ADIOS files, compared to what they know (XML, JSON, filesystem trees or HDF5).

As a matter of personal opinion, HDF5 is still around not because of its performance or its painful 80s-style C API, but because people can quickly open and browse it like a filesystem, and definitely because of h5py. (For the first part, we added ADIOS1 to HDFCompass in the past.) The new ADIOS2 API looks amazing and is super clean, yet, as in ADIOS1, one thing is a bit undervalued: hierarchies hidden in "names" (paths).

I am aware that "directory hierarchies" in ADIOS1 and 2 are internally just flat maps of names, which is fine, yet a user doesn't need to know that in the frontend. What we should add to make .bp file interaction feel like browsing a filesystem for developers is:

For example, h5py has a dedicated Group class that serves as an intermediate object for accessing deeper Groups, Attributes and Variables without specifying the full path. Such a group basically just changes what is viewed as the root "/" and would otherwise be similar to adios2::IO. It would be important to allow both recursive and non-recursive "listings" on such a view, relative to the changed "present working directory" (pwd) within the ADIOS name.

Such an intermediate object would also be really nice in the ADIOS2 APIs. It's purely cosmetic and purely optional, yet it makes a world of difference for developers who expect some kind of "filesystem/XML hierarchy" when writing or reading complex data. With many variables and attributes, we like to imagine putting them in little boxes (directories), and we want to create and annotate such directories with attributes even before we put variables in them.

If I understood the new APIs correctly, we probably just need an adios2::Group object that is a filtered "view" of an adios2::IO object (i.e., it prefixes all names with the new pwd).
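A minimal sketch of such a filtered view, assuming only that listings come back as a `std::map` keyed by full name (the shape of the `AvailableVariables()` result quoted later in this thread). The `Group` class, its `Sub`/`FullName` members and the `/` default separator are hypothetical illustrations, not ADIOS2 API. The view stores a reference, so the underlying map must outlive it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Same shape as the AvailableVariables() result: name -> (key -> value) info.
using Params = std::map<std::string, std::string>;
using NameMap = std::map<std::string, Params>;

// Hypothetical "Group": a re-rooted, filtered view of a flat name map.
// Not part of ADIOS2; it only holds a reference, so `names` must outlive it.
class Group {
public:
    Group(const NameMap &names, std::string prefix, std::string sep = "/")
        : names_(names), prefix_(std::move(prefix)), sep_(std::move(sep)) {
        assert(!sep_.empty() && "separator must not be empty");
        // Normalize the prefix so it always ends with the separator.
        const bool endsWithSep =
            prefix_.size() >= sep_.size() &&
            prefix_.compare(prefix_.size() - sep_.size(), sep_.size(), sep_) == 0;
        if (!endsWithSep) prefix_ += sep_;
    }

    // Full name as the underlying IO sees it; usable for reads and for defines.
    std::string FullName(const std::string &relative) const { return prefix_ + relative; }

    // "cd" deeper: returns a narrower view of the same map.
    Group Sub(const std::string &relative) const {
        return Group(names_, FullName(relative), sep_);
    }

    // Recursive listing ("ls -R") with names relative to this view's root.
    NameMap AvailableVariables() const {
        NameMap result;
        for (auto it = names_.lower_bound(prefix_);
             it != names_.end() && it->first.compare(0, prefix_.size(), prefix_) == 0; ++it)
            result.emplace(it->first.substr(prefix_.size()), it->second);
        return result;
    }

private:
    const NameMap &names_;
    std::string prefix_;
    std::string sep_;
};
```

Because the view only prepends its prefix, the same `FullName` call would serve the write side (defining variables and attributes under the current "pwd") as well as reads.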

Of course, the ADIOS2 "name" of a variable or attribute is more general, yet when names use sane "/" separators we should also expose objects that allow fine-grained access and listing (with a changed pwd, if you will).

cc @C0nsultant, who wrote a Python wrapper for ADIOS1 that makes it look like h5py in terms of group handling (can you link it?), and @franzpoeschel, who is working around this missing abstraction

sklasky commented 5 years ago

Hi Axel, one of the things about ADIOS is that it's open source, and it would be great to have someone in Dresden who could implement this in ADIOS 2 so the community would have it. Do you see this as much work? We could provide some assistance to whoever would do this. I like what you are saying, but since we have some critical deadlines associated with different ECP apps, we need more help with these valuable requests.

Thanks

Scott


From: Axel Huebl notifications@github.com Date: March 12, 2019 at 9:02:32 AM GMT+1 To: ornladios/ADIOS2 ADIOS2@noreply.github.com Cc: Klasky, Scott A. klasky@ornl.gov, Mention mention@noreply.github.com Subject: [ornladios/ADIOS2] Feel "More Like XML" (#1281)

ax3l commented 5 years ago

@sklasky thank you for the feedback!

Yes, please treat the issues I open as documentation of ideas, just so they do not get lost: I am currently busy writing my thesis and might otherwise forget them.

If you agree that the concept of "name groups/paths" fits conceptually into ADIOS2's API as described above, this would definitely be something we can work on together. For more intuitive interaction with hierarchies in names, it's probably just a helper object derived from adios2::IO with a filtered scope. For writing "empty groups", it's likely a metadata addition.

ax3l commented 5 years ago

@williamfgc do you think it's possible to derive from an adios2::IO object in the way described above to create an adios2::Group object with a narrowed scope? I would love your opinion on this.

williamfgc commented 5 years ago

@ax3l are you asking for something like this:

```cpp
std::map<std::string, std::map<std::string, std::string>> process_vars_info =
    io.AvailableVariables("root::process::*");
```

?

Notice how I use :: as in C++ namespaces. We don't want to enforce a single / symbol for hierarchy.
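A minimal sketch of what a `*` wildcard over names could do, independent of which separator (`::` or `/`) the names use. `WildcardMatch` and `FilterByWildcard` are hypothetical helpers operating on a map shaped like the `AvailableVariables()` result above; only `*` is supported, and it also matches separators, so `root::process::*` includes nested names.

```cpp
#include <cassert>
#include <map>
#include <string>

using Params = std::map<std::string, std::string>;

// Minimal glob match: '*' matches any run of characters (separators included),
// every other character matches itself. Iterative, with one backtrack point.
bool WildcardMatch(const std::string &pattern, const std::string &text) {
    std::size_t p = 0, t = 0;
    std::size_t starP = std::string::npos, starT = 0;
    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            starP = p++;  // remember the star; first try matching it to nothing
            starT = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p;
            ++t;
        } else if (starP != std::string::npos) {
            p = starP + 1;  // let the last star absorb one more character
            t = ++starT;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match empty
    return p == pattern.size();
}

// Hypothetical helper: keep the entries whose full name matches the pattern.
std::map<std::string, Params> FilterByWildcard(const std::map<std::string, Params> &all,
                                               const std::string &pattern) {
    std::map<std::string, Params> result;
    for (const auto &entry : all)
        if (WildcardMatch(pattern, entry.first)) result.insert(entry);
    return result;
}
```

Since the matcher treats the separator like any other character, the library would not need to know whether a user writes `::` or `/`.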

ax3l commented 5 years ago

Similar, yes. Actually, my issue with many variables and attributes in files is the following:

io.AvailableVariables() (and attributes) is similar to ls -R / on a filesystem.

In most cases when reading ADIOS files, I just want to cd /home/axel and then ls ./* instead of ls -R. Wildcards as in your example would actually be wonderful as well.

Of course we could ask the user for an arbitrary separator.
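Assuming the separator is passed in by the user, the `cd /home/axel` then `ls ./*` style of listing can be sketched on top of the flat map: one level below a prefix, split into direct entries and sub-groups. `ListHere` is a hypothetical helper, and the prefix is expected to already end with the separator.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using Params = std::map<std::string, std::string>;

// Hypothetical helper: one level of a flat name map below `prefix`,
// like `cd prefix; ls` instead of `ls -R`. The separator is user-chosen.
// Returns {direct children, names of sub-groups}.
std::pair<std::set<std::string>, std::set<std::string>>
ListHere(const std::map<std::string, Params> &all, const std::string &prefix,
         const std::string &sep) {
    assert(!sep.empty() && "separator must not be empty");
    std::set<std::string> here, groups;
    // Names sharing a prefix are contiguous in a sorted map.
    for (auto it = all.lower_bound(prefix);
         it != all.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
        const std::string rest = it->first.substr(prefix.size());
        const auto cut = rest.find(sep);
        if (cut == std::string::npos)
            here.insert(rest);                   // e.g. "a", "b"
        else
            groups.insert(rest.substr(0, cut));  // e.g. "c" from "c/d/e"
    }
    return {here, groups};
}
```

With the names from the proposal further down (`/some/prefix/a`, `/some/prefix/b`, `/some/prefix/c/d/e`) and the prefix `/some/prefix/`, this yields `{a, b}` and `{c}`.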

The added benefit of a lightweight adios2::Group object would be that I can use the same cd-like syntax for writing. (We write groups of variables with similar prefixes and quite a bunch of attributes per variable.)

williamfgc commented 5 years ago

> Of course we could ask the user for an arbitrary separator.

My personal experience with data hierarchies is that every group/customer has their own ideal of what their data representation should be. ADIOS2 is the library under these higher-level constructs. We can provide basic functionality, but we don't want to restrict users to the Unix filename style '/'. We are not trying to be a hierarchical database, but the engine underneath any style.

That being said, my concern with an adios2::Group object (or any lightweight object that wraps a C++ container, which is different from a typename/using alias) is that it adds to the learning curve of the ADIOS2 API when a C++ container would do the job just fine. Keep in mind that AvailableVariables returns a std::map, with all of its benefits (sorted keys, log(N) search, and you can manipulate it as your workflow requires with erase, iterators, etc.). Ultimately, we are already giving you a standard balanced search tree; adios2::Group would just be a thin layer on top.
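To illustrate the point about `std::map`: because keys are sorted, all names sharing a prefix form one contiguous range, so narrowing a listing needs no extra object. A minimal sketch with a hypothetical `KeepPrefix` helper that trims the returned map in place using `lower_bound` and range `erase`.

```cpp
#include <cassert>
#include <map>
#include <string>

using Params = std::map<std::string, std::string>;

// Keep only the entries whose key starts with `prefix`, in place.
// lower_bound(prefix) finds the start of the contiguous range in O(log N);
// the scan to its end is linear in the number of matching names.
void KeepPrefix(std::map<std::string, Params> &vars, const std::string &prefix) {
    auto first = vars.lower_bound(prefix);
    auto last = first;
    while (last != vars.end() && last->first.compare(0, prefix.size(), prefix) == 0)
        ++last;
    // Erase the front part first: `last` is not in [begin, first), so it stays valid.
    vars.erase(vars.begin(), first);
    vars.erase(last, vars.end());
}
```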

I wouldn't mind adding the wildcard or regex style to AvailableVariables, you've shown it's worth it.
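A sketch of the regex variant on the user side, along the lines of the Stack Overflow approach linked later in this thread, applied to a map shaped like the `AvailableVariables()` result. `FilterByRegex` is a hypothetical helper; it uses `std::regex_match`, so the pattern must describe the whole name.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

using Params = std::map<std::string, std::string>;

// Hypothetical helper: regex-filtered copy of an AvailableVariables()-style map.
// Whole-name match (std::regex_match), ECMAScript grammar by default.
std::map<std::string, Params> FilterByRegex(const std::map<std::string, Params> &all,
                                            const std::string &pattern) {
    const std::regex re(pattern);  // throws std::regex_error on a malformed pattern
    std::map<std::string, Params> result;
    for (const auto &entry : all)
        if (std::regex_match(entry.first, re)) result.insert(entry);
    return result;
}
```

A pattern such as `root::process::[^:]+` then selects only the direct children of `root::process`, which is the non-recursive listing asked for above.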

williamfgc commented 5 years ago

Similarly, the above applies to Python dictionaries, BTW.

ax3l commented 5 years ago

Fully agree, we should make this flexible.

The question is whether ADIOS/BP is a hierarchical file format or not. If it shall be (or appear as) one, let us mainline some fundamental helpers that support the user's intuition of a hierarchy. Since it's fully optional, it will not add to the initial learning curve: it's just a new valid return type in case an incomplete "path" is passed.

```cpp
// Proposed API sketch: io(prefix, separator), AvailableGroups() and
// adios2::here are hypothetical, not existing ADIOS2 calls.
// adios2::Engine reader
// adios2::IO io

io.AvailableVariables();
// /usr/bin/bash /lib/libadios /opt/cuda-10/bin/nvcc /home/axel/mails
// /some/prefix/a /some/prefix/b /some/prefix/c/d/e

auto newScope = io("/some/prefix", "/" /* separator */);
newScope.AvailableVariables();
// a b c/d/e
newScope.AvailableVariables(adios2::here, "/" /* separator */);
// a b

newScope.AvailableGroups(adios2::here, "/" /* separator */);
// c

auto oneMore = newScope("c", "/" /* separator */);
oneMore.AvailableGroups();
// d/e

// allocate and define: data, start, count
adios2::Variable<double> variable = oneMore.InquireVariable<double>("d/e");
variable.SetSelection({start, count});
reader.Get(variable, data);
```

williamfgc commented 5 years ago

> The question is if ADIOS/BP is a hierarchical file format or not

That's for the user/application/consumer of adios2 to decide. We provide a flat hierarchy that can be interpreted in an infinite number of ways (as it is in science today anyway). The separator was a later addition; my personal opinion is that those things belong to the application. Should we make it easier for applications to enable their data hierarchy? Absolutely.

If you look at your code, those are things that can easily be done outside the adios2 library. It's a matter of how much external workflow code should go into the library (such code will only serve some users, not all, and we don't want to alienate users by picking a single style).

williamfgc commented 5 years ago

This is why I am bullish on the regex/wildcard approach: it's of general use. The other features are too Unix-centric, and not everyone follows this hierarchy (C++ namespaces don't, for example).

ax3l commented 5 years ago

I think flexible wildcards would help a lot building arbitrary representations and I appreciate the flexibility in naming.

Nevertheless, I think stacked tree hierarchies are so common - everyone does it every day on their personal home directory - that we should provide such an optional "group" class in the mainline. As shown above, it can be fully ignored, just like we also have a simple API. I think this makes for a great gateway for users that switch from other formats and is super helpful in reads.

williamfgc commented 5 years ago

Tree hierarchies, data frames, SQL style, node graphs, mesh formats... they can all be built on top of the library, and we see them heavily used as app usage diversifies. I wouldn't mind adding features that serve all of the above, but a dedicated one for any particular style can give the wrong impression and would add a tremendous cost to the library's learning curve.

sklasky commented 5 years ago

I know we support several applications, and one of them is WDMApp. We might see a movement to use openPMD, and Axel and Michael are two of its creators. This comes from an email exchange with Jean Luc and Amitava. So I need to understand whether openPMD will need this. I should also find out from Chuck and Berk whether the schema for viz will require this functionality as well.

If they do, then I need a work estimate so we can figure out who will implement this and when.

Thanks

Scott


From: William F Godoy notifications@github.com Date: March 12, 2019 at 4:19:03 PM GMT+1 To: ornladios/ADIOS2 ADIOS2@noreply.github.com Cc: Klasky, Scott A. klasky@ornl.gov, Mention mention@noreply.github.com Subject: Re: [ornladios/ADIOS2] Names in API: Feel "More Like XML" (#1281)

ax3l commented 5 years ago

In openPMD we use a simple tree-like structure.

What do you think about this: I like @williamfgc's approach via general wildcard-based queries. Could we maybe "store" the result of such a query in a general helper object inside the ADIOS mainline, which in turn can be queried again?

So instead of my suggestion above, we could combine it with William's wildcards:

```cpp
// Proposed API sketch: io(pattern) and AvailableGroups() are hypothetical,
// not existing ADIOS2 calls.
// adios2::Engine reader
// adios2::IO io

io.AvailableVariables();
// /usr/bin/bash /lib/libadios /opt/cuda-10/bin/nvcc /home/axel/mails
// /some/prefix/a /some/prefix/b /some/prefix/c/d/e

auto newScope = io("/some/prefix/*");
newScope.AvailableVariables();
// a b c/d/e

auto newLocalScope = io("/some/prefix/*!(/*)");
newLocalScope.AvailableVariables();
// a b

// newLocalScope.AvailableGroups();
// I can now emulate this via a union of available variables and attributes,
// narrowed down to `*/*` and cropped at the first `/`
// c

auto oneMore = newScope("c/*");
oneMore.AvailableGroups();
// d/e

// allocate and define: data, start, count
adios2::Variable<double> variable = oneMore.InquireVariable<double>("d/e");
variable.SetSelection({start, count});
reader.Get(variable, data);
```

With that, building all the described structures on the user side will be quite efficient.
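The `AvailableGroups()` emulation described in the comment inside the proposal can be sketched as follows, assuming both the variable and the attribute listings are maps keyed by full name. Including attributes means a "group" that only carries attributes (no variables yet) still shows up. `EmulateAvailableGroups` is a hypothetical helper, not an ADIOS2 call.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <set>
#include <string>

using Params = std::map<std::string, std::string>;

// Hypothetical helper emulating the proposed AvailableGroups():
// union of variable and attribute names below `prefix`, keeping only names
// with at least one more separator (the "*/*" case), cropped at that separator.
std::set<std::string> EmulateAvailableGroups(const std::map<std::string, Params> &variables,
                                             const std::map<std::string, Params> &attributes,
                                             const std::string &prefix,
                                             const std::string &sep = "/") {
    assert(!sep.empty() && "separator must not be empty");
    std::set<std::string> groups;
    for (const auto *names : {&variables, &attributes}) {
        for (auto it = names->lower_bound(prefix);
             it != names->end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
            const auto cut = it->first.find(sep, prefix.size());
            if (cut != std::string::npos)
                groups.insert(it->first.substr(prefix.size(), cut - prefix.size()));
        }
    }
    return groups;
}
```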

I admit that intuitive integration into exploratory tools such as HDFCompass is easier if we can add some kind of optional schema about separators, e.g. an attribute that reflects this. Otherwise we will add this downstream; I see your wish for flexibility.

sklasky commented 5 years ago

Hi Axel, I think one critical thing for us is to make sure that we can fully support openPMD, since this is going to be part of an ECP activity. One thing I would like the guys at Kitware (Chuck, Berk) to do is make sure that everything we need for the Viz Schema can be fully supported as well.

I would like to have a few motivating examples for this work.

I would also like Jason to chime in on what radio astronomy with CASACORE will need for this.

Thanks,

Scott

From: Axel Huebl notifications@github.com Sent: Wednesday, March 13, 2019 4:37 AM To: ornladios/ADIOS2 ADIOS2@noreply.github.com Cc: Klasky, Scott A. klasky@ornl.gov; Mention mention@noreply.github.com Subject: Re: [ornladios/ADIOS2] Names in API: Feel "More Like XML" (#1281)

ax3l commented 5 years ago

Thanks, this sounds great!

I will get back to writing now and can provide more examples in a bit more than a month.

For now, just imagine you get a .bp file with 10'000 variables and 50'000 attributes and you want to explore its content and post-process only a specific part of variables in it, depending on what you find. How can we make that experience joyful for all users that do not work with their computers by parsing ls -R / when looking for the latest presentation in their $HOME? How can we naturally describe and iterate user-defined hierarchies?

It's all just "soft skills" but it will help with the adoption if we provide basic functionality for such fundamental workflows, even if it's just in an adios2::helper:: scope.
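One way to iterate such user-defined hierarchies without new library objects: fold the sorted flat names into an indented tree, splitting on a user-chosen separator. `Split` and `RenderTree` are hypothetical helpers; names are assumed not to start with the separator (a leading `/` would show up as an empty top-level entry).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Split "a/b/c" on a user-chosen separator (may be multi-character, e.g. "::").
std::vector<std::string> Split(const std::string &name, const std::string &sep) {
    assert(!sep.empty() && "separator must not be empty");
    std::vector<std::string> parts;
    std::size_t start = 0, cut;
    while ((cut = name.find(sep, start)) != std::string::npos) {
        parts.push_back(name.substr(start, cut - start));
        start = cut + sep.size();
    }
    parts.push_back(name.substr(start));
    return parts;
}

// Hypothetical helper: render sorted flat names as an indented tree, like `tree`.
// In a sorted map all names below "group + separator" are contiguous, so each
// level is printed only where it differs from the previous name's components.
template <typename Value>
std::string RenderTree(const std::map<std::string, Value> &names, const std::string &sep = "/") {
    std::string out;
    std::vector<std::string> previous;
    for (const auto &entry : names) {
        const auto parts = Split(entry.first, sep);
        std::size_t common = 0;  // leading components shared with the previous name
        while (common < parts.size() && common < previous.size() &&
               parts[common] == previous[common])
            ++common;
        for (std::size_t level = common; level < parts.size(); ++level)
            out += std::string(2 * level, ' ') + parts[level] + '\n';
        previous = parts;
    }
    return out;
}
```

For `some/prefix/a`, `some/prefix/b`, `some/prefix/c/d/e` this prints `some` once, `prefix` once, and then `a`, `b` and the nested `c`, `d`, `e` with increasing indentation.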

sklasky commented 5 years ago

Hi Axel, I agree you should get back to writing! I am actually talking a lot with Lipeng about metadata management, with sizes along the lines you mention…

Scott

From: Axel Huebl notifications@github.com Sent: Wednesday, March 13, 2019 5:06 AM To: ornladios/ADIOS2 ADIOS2@noreply.github.com Cc: Klasky, Scott A. klasky@ornl.gov; Mention mention@noreply.github.com Subject: Re: [ornladios/ADIOS2] Names in API: Feel "More Like XML" (#1281)

williamfgc commented 5 years ago

@ax3l let's not create extra objects when C++ is already doing the work for us... we already have too many in the official API, and each one adds to our maintenance, tutorials, example material, learning curve, etc. Ultimately, this is what we are talking about: https://stackoverflow.com/questions/17253690/finding-in-a-std-map-using-regex
@sklasky I'd be happy to work with @ax3l on this wildcard/regex support.

In fact, to answer @ax3l's question: bpls already has pattern support (https://stackoverflow.com/questions/17253690/finding-in-a-std-map-using-regex), and bpls is a utility built on top of the library.