
Binding tool reloaded (usage of apis.json in any form -> look here) #431

Open martinruefenacht opened 3 years ago

martinruefenacht commented 3 years ago

Problem

The binding tool was written purely as a mechanism for the embiggening, but it turns out it has plenty of other uses. The tool is currently not as well designed as it could be, in terms of both usability and extensibility.

Others are already using the intermediate apis.json to generate code for implementations (@wesbland, @jsquyres, @raffenet, @hzhou).

Proposal

Ideally we can come to an agreement about what everyone wants to be able to do with the tool, i.e., the requirements.

This is a list of ideas that I would like to achieve; others, please chime in as well.

I will be editing the above list as the discussion continues.

Changes to the Text

None. And hopefully none to the Python bindings.

Impact on Implementations

None.

Impact on Users

None.

References

None.

hzhou commented 3 years ago

Off the top of my head, here is what is missing from the current tool:

besnardjb commented 3 years ago

To feed the discussion, here is a raw snapshot of a few scripts we have been playing with internally, consuming apis.json for various purposes: https://github.com/besnardjb/mpi_meta. These were made to prototype this kind of API.

Note that I've not included apis.json (is it even licensed???).

I second the observation on the kind map, which forced me to use (steal) some code from the original tool (https://github.com/besnardjb/mpi_meta/blob/9dfc152819e9ff47f8d3e816dea236379a8a5dfb/mpiiface.py#L8) -- takedown notices accepted :wink:.
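
For anyone who wants to poke at apis.json the same way, here is a minimal sketch of the kind of consumption we do. It assumes procedure names are top-level keys and each entry carries a "parameters" list with "kind" and "name" fields; the kind map entries are placeholders for the table borrowed from the original tool:

import json

# Placeholder kind map; the real table is the one borrowed from the
# original tool (mpiiface.py linked above).
KIND_TO_C = {
    "BUFFER": "void *",
    "COMMUNICATOR": "MPI_Comm",
    "DATATYPE": "MPI_Datatype",
}

with open("apis.json") as fh:
    apis = json.load(fh)

# Render a rough C parameter list for one procedure.
for param in apis["mpi_send"]["parameters"]:
    ctype = KIND_TO_C.get(param["kind"], "/* unmapped kind */")
    print(f"{ctype} {param['name']}")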

Up to now, we have made:

So, I'd say it is VERY useful. Thank you for the great work :+1:

jsquyres commented 3 years ago

FYI @ggouaillardet.

martinruefenacht commented 3 years ago

FYI @ggouaillardet.

Thanks! I couldn't find the username...

@besnardjb By prepass.dat do you mean the apis.json which is output by binding_prepass.py?

besnardjb commented 3 years ago

Of course, I did not know its name, so I named it after the script. My bad. (Note: I renamed it in my previous message for clarity.)

hzhou commented 3 years ago

Since we are considering use cases outside the standard text, would it be better to make the Python scripts a submodule of mpi-standard, so that their development and discussion don't have to take place in the same space as mpi-standard? It would also make it easier for downstream projects to directly import the module, and potentially appeal to a wider set of use cases.

martinruefenacht commented 3 years ago

There are multiple parts: the LaTeX still contains the Python bindings (which are the official truth and are not removable; the kinds as well), and then there is the extraction of those into the apis.json (the prepass). From there, the binding_emitter reads the apis.json when it is invoked during the build and uses the generators (lis, c, f08, f90) to encode the bindings in a LaTeX-friendly form in the various languages. The binding_emitter also does some additional transformation (indexes, underscore replacement).
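
Schematically, the emitter side of that flow looks something like this (a sketch only; apart from apis.json and the generator names, the function shapes are placeholders, not the real module layout):

import json

# Stand-ins for the real lis/c/f08/f90 generators.
def emit_c(name):
    return f"% C binding block for {name}"

def emit_f08(name):
    return f"% f08 binding block for {name}"

GENERATORS = {"c": emit_c, "f08": emit_f08}

def binding_emitter(path="apis.json", language="c"):
    # Read the extracted bindings and encode them in a
    # LaTeX-friendly form, one procedure at a time.
    with open(path) as fh:
        apis = json.load(fh)
    return [GENERATORS[language](name) for name in apis]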

I could write a Python API which uses the apis.json from the document to give whatever output we all want, but it will have strong links back to the Standard, especially once even more aspects of the Standard are pythonized (constants, semantics, type system).

The binding_emitter would then also use this submodule to do its job. Internal to the Standard, the binding tool will always be a separate Python package which really can live anywhere (at least I think that's a good idea). I don't know whether formally making it a separate module of some form is useful. The binding tool would never be in charge of doing anything for the use cases except providing the information contained in the Standard in as simple and flexible a manner as possible.

hzhou commented 3 years ago

Making it a submodule does not necessarily make its tie to the standard any weaker, but it separates the tool aspect from the standard-text aspect. Anyway, it is just a suggestion.

martinruefenacht commented 3 years ago

Making it a submodule does not necessarily make its tie to the standard any weaker, but it separates the tool aspect from the standard-text aspect. Anyway, it is just a suggestion.

I think I don't understand what you mean. In what sense is it not already a submodule now (pending some small cleaning)?

hzhou commented 3 years ago

Making it a submodule does not necessarily make its tie to the standard any weaker, but it separates the tool aspect from the standard-text aspect. Anyway, it is just a suggestion.

I think I don't understand what you mean. In what sense is it not already a submodule now (pending some small cleaning)?

For example, if my project wants to directly use the Python library, do I have to import the mpi-standard repository?

martinruefenacht commented 3 years ago

In the current model of thinking, yes.

In the future, I can't really see how you could avoid it, since the Standard needs to contain the truths, not my Python code. Unless we get the MPI Forum to accept the apis.json as a valid thing to write into the LaTeX at various markers (which I think will never happen; it also puts a ton of responsibility on me).

There is the option of caching the apis.json in the submodule, but that has issues with updates to the Standard. And as you said, you don't want to have to carry the apis.json around since it is large (we could make it a smaller diff from the defaults, but still).

I understand what you would like (me too) and I think it would be elegant, but I don't know how to make it happen without additional leg work somewhere.

hzhou commented 3 years ago

A different question, maybe: do you think discussing the coding problems in the Python scripts is appropriate for the Forum's general community? Or do we want every commit to the Python code to go through the same process of Forum voting?

hzhou commented 3 years ago

I don't know how to make it happen without additional leg work somewhere.

I don't see why making the python code a submodule would require any leg work. Everything works exactly the same except a git submodule update --init, right?

dholmes-epcc-ed-ac-uk commented 3 years ago

My understanding is that the Python code as written is useless without the tex files that it processes as input. The Python code cannot live on its own; without a suitable input, it cannot produce a useful output.

OTOH, most changes to the Python code affect the tex output that is used by the mpi-standard build process to generate the PDF of the MPI Standard itself. The Python code must be under the control of, and subject to the voting procedures of, the MPI Forum because of this critical-path role.

Thus, they are intimately tied to one another and a separation in the manner @hzhou suggests, although technically feasible and trivial, serves no useful purpose.

dholmes-epcc-ed-ac-uk commented 3 years ago

Perhaps the correct response to the comment from @hzhou regarding discussion of Python code changes in the wide-band channels for the general community is to form a "chapter committee" for the Python code. Such a body could operate with a level of autonomy commensurate with all the other chapter committees: it could decide whether a proposed change is editorial or requiring of a voting procedure, authorise editorial changes to be carried out by its members or their agents, and provide a consultancy service for all and sundry regarding the content/layout/strategy/etc of the Python code.

In that way, only conversations of particular noteworthiness would leak out into a more public setting.

hzhou commented 3 years ago

OTOH, most changes to the Python code affect the tex output that is used by the mpi-standard build process to generate the PDF of the MPI Standard itself. The Python code must be under the control of, and subject to the voting procedures of, the MPI Forum because of this critical-path role.

Git maintains submodules by commit hash, so changes to the submodule (the Python code) won't affect the standard text without an explicit submodule-update commit, which can and probably should be subject to the usual voting procedures. The voting would then just be on the resulting text rendering, right? Do we really want to vote on a whitespace change or a spelling correction in the Python code that doesn't affect the text rendering? Furthermore, we are discussing here extending the Python tools for usage outside the text rendering. That part of the functionality will have no effect on the text rendering. Do we want to vote on that part of the code here as well?

dholmes-epcc-ed-ac-uk commented 3 years ago

The usage for a submodule as described here seems to have a very strong overlap with the WG git repo forks and the PR system. I prefer PRs because of the support for reviews and tracking in the GitHub GUI.

martinruefenacht commented 3 years ago

@VictorEijkhout This is the issue just for reference.

VictorEijkhout commented 3 years ago

For my own MPI (slash OpenMP/PETSc/SYCL) textbook I parse the apis.json file for compact C/F API descriptions.

If I had my druthers.... the parameter list would have more structure to it. For instance, if there were an indication that there is a "logical" parameter "buffer", which just happens to be realized as pointer/count/datatype, then I could easily generate the Python API description where this buffer is a single numpy object.

martinruefenacht commented 3 years ago

@VictorEijkhout I think approaching it from "buffer" in the source LaTeX -> "pointer/count/datatype" in the PDF would be difficult to get past the Forum, since the written document is the truth. However, doing the reverse should be possible, since we always use the same pattern. Internal to a future tool, we could "detect" these structures and emit them in an API-friendly way.
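
As a rough illustration, such detection could look like the sketch below; the kind names are assumptions for illustration, not the actual apis.json schema:

# Collapse each pointer/count/datatype run into one logical buffer.
# The kind names here are assumed, not taken from the real schema.
TRIPLE = ("BUFFER", "COUNT", "DATATYPE")

def collapse_buffers(parameters):
    out, i = [], 0
    while i < len(parameters):
        kinds = tuple(p["kind"] for p in parameters[i:i + 3])
        if kinds == TRIPLE:
            out.append({"kind": "LOGICAL_BUFFER", "name": parameters[i]["name"]})
            i += 3
        else:
            out.append(parameters[i])
            i += 1
    return out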

VictorEijkhout commented 3 years ago

[...] the written document is the truth.

Understood. In the meantime this particular desire has gone away, as the mpi4py maintainer is starting to incorporate Python types in his definitions, which makes them independently parseable for me.

I still need the json file for parsing the C/F APIs.

wrwilliams commented 3 years ago

The tools interfaces (PMPI/future QMPI) provide another set of use cases for the binding tool. Things that we've come up with so far in various discussions:

This suggests the following requirements for use in wrapper generation:

My rationale for why the functions and their pieces should be individual objects is that it makes it easier to vary the behavior of a wrapper generator based on properties of the function. They can probably be very simple objects that are views into the One True Binding Representation(tm), but it's really quite important to be able to say "point-to-point functions use this template, collective functions use that template" for this use case.
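
A minimal sketch of what that dispatch could look like (the "collective" attribute and the function-dict shape are assumptions, not the actual representation):

# Pick a wrapper template based on a property of the function object.
PTP_TEMPLATE = "int {name}_wrapper(...) /* point-to-point template */"
COLL_TEMPLATE = "int {name}_wrapper(...) /* collective template */"

def generate_wrapper(func):
    attrs = func.get("attributes", {})
    template = COLL_TEMPLATE if attrs.get("collective") else PTP_TEMPLATE
    return template.format(name=func["name"])

print(generate_wrapper({"name": "MPI_Bcast", "attributes": {"collective": True}}))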

martinruefenacht commented 3 years ago

Thank you all who have responded so far. I am starting to work on this and will be posting updates (with links) here.

martinruefenacht commented 3 years ago

@hzhou In the linked PR, the YAML export and size issue are addressed. The compression ratio with bz2 is 88x (1.8MB -> 24KB), so I don't think the diff against the defaults is worthwhile for the amount of effort it would take.
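
(For anyone who wants to reproduce the measurement, something like:)

import bz2

# Quick check of the compression ratio quoted above.
with open("apis.json", "rb") as fh:
    raw = fh.read()
packed = bz2.compress(raw)
print(f"{len(raw)} -> {len(packed)} bytes, {len(raw) / len(packed):.0f}x")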

With respect to exporting the kind mapping: the apis.json should not be touched after the API is done. The API will provide access to everything in a Pythonic fashion.

jsquyres commented 3 years ago

I 👍 one of @hzhou's requests:

  • a small api interface, such as --
    • api = load_api("apis.json")
    • api.get_c_prototype("mpi_send")
    • etc.

Meaning: it would be great if the users of this tool don't have to duplicate all the existing C / Fortran logic to go from the JSON to the final rendered language binding -- I think we all assume that rendering C / Fortran will be a common use case for this tool. I also assume that if users want to make bindings for a different language (e.g., Python, C++, Rust, ...), they would need to write their own logic that utilizes the specification data in the JSON.

One more suggestion for the "small API interface" would be access to iterators for all the MPI symbols, functions, and typedefs (so that you can just foreach ... over all the functions without having to query for each one). Extra bonus points will be awarded for ensuring that the iterators produce stable-ordered output for reproducible results.
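
Something along these lines, say (load_api is hypothetical, and the layout is the current top-level-function-names JSON, not an existing interface):

import json

def load_api(path):
    with open(path) as fh:
        return json.load(fh)

def functions(api):
    # Sorting the names makes the iteration order reproducible.
    for name in sorted(api):
        yield name, api[name]

for name, spec in functions(load_api("apis.json")):
    print(name)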

Additionally, I think the output JSON should also include some metadata about the standard. At a bare minimum, it should include the metadata version number and the MPI standard version number. Perhaps even the git repo, branch, and git hash from which it was emitted. Other metadata might be useful, too (open to suggestions here...).

The current JSON looks like this:

{
    "mpi_abort": {
        "attributes": {
            "c_expressible": true,
            "callback": false,
            "capitalized": false,
...etc.

I suggest that the MPI functions should be enclosed under a top-level attribute so that you can have other top-level attributes such as version information, MPI types, MPI callback functions, other MPI typedefs, other MPI symbols, C and Fortran type maps, ...? I realize that not all of these entities are Pythonized in the LaTeX (yet!), so they can be topics for the future; my main point here is to have top-level attributes that can be extended/expanded over time and not lock us into top-level attributes that are the MPI function names.

Here's an example off the top of my head -- perhaps it can be a decent starting point for the discussion:

{
    "metadata_version" : "1.0",
    "mpi_version" : {
        "standard" : "4.0",
        "git_repo" : "https://github.com/mpi-forum/mpi-standard/",
        "git_branch" : "mpi-4.x",
        "git_hash" : "123abc"
    },
    "mpi_functions" : [
        "mpi_abort": {
            "attributes": {
                "c_expressible": true,
                "callback": false,
                "capitalized": false,
...etc.

RolfRabenseifner commented 3 years ago

Questions:

martinruefenacht commented 3 years ago

@RolfRabenseifner We are overloading the terms API and Python many times here. The topic of this issue is splitting the current binding tool into two parts: 1) the information we have encoded in the mpi-binding blocks in the LaTeX needs to be accessible to tools/generators/verifiers (this is what is referred to as the Python API to the Standard); 2) the binding tool (the LaTeX generator) as it is at the moment will be rewritten to make use of the new Python API (both for demonstration and for cleaner separation).

For assurance: the mpi-binding blocks currently in the Standard will remain the same (the user-level domain-specific language), but the actual information contained in the mpi-binding blocks will hopefully be much more approachable and usable.

An official specification of the Python procedures for a Python MPI implementation is also something I am interested in, but that is further down the road. I don't know how much it should look like mpi4py, because that is fairly C-like. A discussion for another day.

martinruefenacht commented 3 years ago

To all, I wanted to get some early feedback on the Python API. I wrote a short tutorial here.

Please give me your thoughts on it so it can be modified before we start a more general roll-out. The interface theme is fairly stable, but the underlying implementation is still slightly mutating to make it more self-consistent.

I have rewritten the binding_tool on top of the Python API (as it is at the moment). Both the API source and the binding_tool source can be found here. I have also added a bunch of testing so we can be sure that it is emitting the exact same LaTeX; pytest --rundiff in the root directory, with the appropriate environment variable exports, can be used to test.

This does not yet include everything, for example versioning, or the suggestion to separate the source code from the MPI Standard repo. We will get to those soon. If anyone has anything against the way I have named things, please bring that up as well, especially if it is not consistent with the Standard. I am working on a proper API reference (HTML/PDF) so it is easier to get an overview. In addition, I am still working on fulfilling all expressions for the Fortran procedures (_CPTR, PMPI_, _FTS, _f08, _f08ts).

martinruefenacht commented 2 years ago

Given yesterday's discussion on this in the MPI Forum, I will be separating the Python API that I have so far come up with into its own Python module hosted on PyPI (eventually this might live in the mpi-forum repos?). The question of how we give access to the actual metadata (apis, ...) is still somewhat unclear. Do we want to package it with the pympistandard module and allow external loading with a fallback to the packaged one? Or do we require the user to get access to an appropriate metadata file in some other manner, hosted on the MPI Forum website as an artefact?

I will be aiming to get what I have working currently onto PyPI as soon as possible, so everyone here can experiment with it, as a v0.1. Once that is done I will work towards all the verification that is required to push this into mainline MPI Standard usage as a v1.0. Beyond that, further versions will contain all manner of additions to the API to support the inspection of the MPI Standard information.

I will also be pushing documentation to readthedocs so that we all have a good place to look this stuff up. I will do this as soon as possible as well, so I can modify the API at the v0.1 version instead of needing to use API versioning from the beginning and causing unneeded fragmentation. Additions to the API will not cause the API version to change, but removals will. So we are safe when adding constants, for example, but changing the way we access information entirely will change the API version.

Also, since this has already caused some confusion: by API version I mean the version of the actual Python API, not the MPI Standard version (that will be a field in the information, queryable in the future).

wesbland commented 2 years ago

Given yesterday's discussion on this in the MPI Forum, I will be separating the Python API that I have so far come up with into its own Python module hosted on PyPI (eventually this might live in the mpi-forum repos?).

I'd definitely advocate for putting this in https://github.com/mpi-forum/python-api (or some better name).

The question of how we give access to the actual metadata (apis, ...) is still somewhat unclear. Do we want to package it with the pympistandard module and allow external loading with a fallback to the packaged one? Or do we require the user to get access to an appropriate metadata file in some other manner, hosted on the MPI Forum website as an artefact?

My personal preference would be to package up the official versions of the API from the MPI Forum and bundle them with the package itself. When users of the API load up the module, they can say that they want MPI 4.0 (or 4.1, or whatever) and/or any other files if they want a non-standard version. IIRC, your API will already handle being able to filter out certain functions, so implementations could leave out functions they don't implement yet (though they should probably still have the symbols but return MPI_ERR_NOT_IMPLEMENTED or something). If implementations have their own non-standard functions, they should be able to provide those in addition to the pre-packaged versions (or create a draft version for functions that will be included in future standards).

I will be aiming to get what I have working currently onto PyPI as soon as possible, so everyone here can experiment with it, as a v0.1. Once that is done I will work towards all the verification that is required to push this into mainline MPI Standard usage as a v1.0. Beyond that, further versions will contain all manner of additions to the API to support the inspection of the MPI Standard information.

Again, I don't know how PyPI works, but I think it would be best to have the code in the canonical place before a "release" instead of having to move it later.

hzhou commented 2 years ago

I will be aiming to get what I have working currently onto PyPI as soon as possible, so everyone here can experiment with it, as a v0.1.

Will there be a GitHub repo? If there is, then we can import it as a submodule. Some users/developers may be reluctant to install an arbitrary package.

martinruefenacht commented 2 years ago

@wesbland If you make an mpi-forum/pympistandard repo I am happy to push there. Currently I have separated it from mpi-standard into its own personal (private) repo. Would the mpi-forum/pympistandard repo be public?

I am also happy to first do this internally on the mpi-forum org and then distribute via PyPI when we are happy with it. I do think it needs to be open source so people can trust it, though. What license should I put on it? That is a requirement for PyPI.

With respect to packaging the metadata with it: either works; PyPI allows packaging data with the actual module. Slightly more complex loading code, but not a big problem. Currently we just provide everything that exists in MPI 4.0. We don't have APIs annotated with information about when they were introduced or deprecated ("removed" makes it even more difficult). But the idea is that eventually you would be able to select which MPI version you want to be looking at. This would probably be invoked like the current version: pympistandard.use(api_version=1, mpi_version="3.1", path="custom_api_path.json", additional=["additional_experimental_apis.json"])
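
Used roughly like this (only the use() call above is settled; the lookup and rendering lines are assumptions about the eventual interface):

import pympistandard

pympistandard.use(api_version=1, mpi_version="3.1")
send = pympistandard.PROCEDURES["mpi_send"]   # assumed lookup table
print(send.express.iso_c)                     # assumed C rendering hook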

@hzhou Yes, either my current one or the mpi-forum/pympistandard one.

wesbland commented 2 years ago

@wesbland If you make an mpi-forum/pympistandard repo I am happy to push there. Currently I have separated it from mpi-standard into its own personal (private) repo. Would the mpi-forum/pympistandard repo be public?

I am also happy to first do this internally on the mpi-forum org and then distribute via PyPI when we are happy with it. I do think it needs to be open source so people can trust it, though.

https://github.com/mpi-forum/py-mpi-standard

I've made it public because I prefer things here to be public by default. If someone disagrees, we can discuss and change it. To seed the pot, I've made @martinruefenacht an admin on the GitHub team that controls write access to that repository.

What license should I put on it? That is a requirement for PyPI.

I don't have a strong opinion here. Happy to hear thoughts from others. It might be worth having a quick discussion on this in today's forum meeting (or a future virtual meeting).

With respect to packaging the metadata with it: either works; PyPI allows packaging data with the actual module. Slightly more complex loading code, but not a big problem. Currently we just provide everything that exists in MPI 4.0. We don't have APIs annotated with information about when they were introduced or deprecated ("removed" makes it even more difficult). But the idea is that eventually you would be able to select which MPI version you want to be looking at. This would probably be invoked like the current version: pympistandard.use(api_version=1, mpi_version="3.1", path="custom_api_path.json", additional=["additional_experimental_apis.json"])

Just to be clear, I don't necessarily think you need to backdate all of the APIs (at least not right now). I just think that moving forward we should have a canonical version of the MPI 4.0 metadata that does not change after it's produced (and the same for all future versions).