Open malinkallen opened 5 years ago
That appears to be exactly what the Java code is doing (returning the number of elements):
Which Javadoc are you referring to?
The comment right above the function says:
/**
* Returns the total size of a datatype - the number of buffer
* elements it represents.
* <p>Java binding of the MPI operation {@code MPI_TYPE_SIZE}.
* @return datatype size
* @throws MPIException Signals that an MPI exception of some sort has occurred.
*/
...which seems to contradict itself:
I'm inclined to go with the latter (i.e., it's supposed to be the Java binding for MPI_TYPE_SIZE), but I wanted to check if you were sourcing some other documentation.
Hmm.... Open MPI Java binding is based on mpiJava. mpiJava 1.2.5 already has the comment/code.
/**
* Returns the total size of a datatype - the number of buffer
* elements it represents.
* <p>
* <table>
* <tr><td><em> returns: </em></td><td> datatype size </tr>
* </table>
* <p>
* Java binding of the MPI operation <tt>MPI_TYPE_SIZE</tt>.
*/
public int Size() throws MPIException {
if(baseType == OBJECT || baseType == UNDEFINED)
return displacements.length;
else
return size() / baseSize ;
}
According to http://www.hpjava.org/reports/mpiJava-spec/mpiJava-spec/node32.html , it counts elements (in the MPI sense) by design. I don't know the intent.
int Datatype.Size()
returns: datatype size
Returns the total size of the type. Java binding of the MPI operation
MPI_TYPE_SIZE
. Size is defined as the total number of buffer elements incorporated by the data type, or equivalently as the length of the displacement sequence. Unlike other language bindings, the size is not measured in bytes.
Are we missing Datatype.Size()
, then? (i.e., that returns bytes, not number of elements)
Oh, I forgot to mention. Datatype.Size()
in mpiJava is renamed to Datatype.getSize()
in Open MPI Java bindings in e4e3e411fc52b7427f7b892fa40c2605224311d9. mpiJava and Open MPI Java bindings never has a method to return the number of bytes.
So what do you think we should do here? It would seem like a bad idea to change the behavior of an existing binding. It feels like we should
How does that sound?
I'll look other methods to confirm it is the solution consistent with the entire API set.
FWIW, Siegmar Gross reported a related issue at https://www.mail-archive.com/users@lists.open-mpi.org/msg32525.html
Bottom line, I do not know the rationale for this, and I am not comfortable with it.
For example, if we MPI_Type_create_resize()
, the size will be in bytes and no more in elements. createResized()
expects bytes and not elements, and as pointed by @kawashima-fj, there is no way to get the number of bytes of a given type ...
Java bindings are not part of the standard, and as far as I am concerned, I would not mind changing the current behavior from Open MPI 5 (it might break some codes, but it could also fix some other codes ...)
I downloaded openmpi-3.1.3.tar.gz from https://www.open-mpi.org/software/ompi/v3.1 Januari 23. After compiling the code, I found some documentation that I accessed via path-to-openmpi-installation/share/doc/openmpi/javadoc-openmpi/index.html. That's the documentation I'm referring to.
A binding to MPI_TYPE_GET_SIZE would be great, but I don't whether it's best to change the existing method or create a new one (and change the documentation of the existing one).
I assume the number of Java OMPI User’s is small, but I’m still quite hesitant about breaking existing apps - even if we think the behavior of the existing getData is wrong.
I looked at the Datatype
class. The history is the following.
char
, int
, float
, ... in Java). For example, the array_of_displacements
parameter of the Hindexed
method is measured in base types, not bytes. Therefore Size
and Extent
methods count base types, not bytes. These are explicitly explained in its API Specification, and the design is consistent.array_of_displacements
parameter of the createHindexed
method was changed to count bytes. But getSize
, getExtent
, getLb
, ... methods were not changed to count bytes. I think it is inconsistent.@jsquyres The commit e4e3e411fc52b7427f7b892fa40c2605224311d9 is yours. Do you remember why you changed createHvector
, createHindexed
, createStruct
, and createResized
but didn't change getSize
, getExtent
, getLb
, ...?
If we want to keep compatibility, I agree with @jsquyres's suggestion. However getSize
is not the only problematic one. We need to add new methods corresponding to the following existing methods.
getLb
getExtent
getTrueLb
getTrueExtent
getSize
@kawashima-fj thanks for digging into this ! Now I understant the rationale of the initial design :-)
In C or Fortran bindings of MPI, derived datatypes have two roles. One is to allow messages to contain mixed types (for example they allow an integer count followed by a sequence of real numbers to be passed in a single message). The other is to allow noncontiguous data to be transmitted. In mpiJava the first role is abandoned. Any derived type can only include elements of a single basic type.
My interpretation is we do not follow the initial design any more.
The introduction of the createStruct()
method evidences my claim since we now allow a Java derived datatype to mix several datatypes.
/**
* The most general type constructor.
* <p>Java binding of the MPI operation {@code MPI_TYPE_CREATE_STRUCT}.
* <p>The number of blocks is taken to be size of the {@code blockLengths}
* argument. The second and third arguments, {@code displacements},
* and {@code types}, should be the same size.
* @param blockLengths number of elements in each block
* @param displacements byte displacement of each block
* @param types type of elements in each block
* @return new datatype
* @throws MPIException Signals that an MPI exception of some sort has occurred.
*/
public static Datatype createStruct(int[] blockLengths,
int[] displacements, Datatype[] types)
throws MPIException
I can see two ways of moving forward (both have the potential to break current codes) with a consistent design
createStruct()
and other methods that could cause mixing different datatypes, and make sure we count in elements (and not in bytes) everywhere.createStruct()
and friends, and count in bytes (and not elements) everywhere.@siegmargross your feedback is more than welcome !
@kawashima-fj The e4e3e411fc52b7427f7b892fa40c2605224311d9 commit reflects the Java redesign from Oscar Vega-Gisbert and Jose Roman -- I was just the committer (this was back in SVN days, and was before we gave Oscar direct SVN commit privileges). Meaning: this was the entire 2nd generation of the Java bindings, and reflected significant redesign work by Oscar and Jose.
If we have a handful of functions that were missed in this redesign, it seems like we have an inconsistent interface. My $0.02 is that we should probably document these functions as inconsistent (e.g., remove confusing text about "this is the Java binding for MPI_TYPE_SIZE" (because it isn't), and make new methods that are consistent with what we need.
@jsquyres Thanks for your explanation. I totally agree.
More information:
I found a paper on Java bindings in Open MPI. It seems to say getSize()
and getExtent()
count bytes. If so, it's a bug of the implementation.
From "3.7. Derived datatypes":
When working with derived datatypes, it is often handy to query about the size of the datatype (number of bytes of the data in a message that would be created with this datatype), with
getSize()
, and also about extent information, withgetExtent()
(andgetLb()
for the lower bound).
As @siegmargross said, getExtent
and getLb
methods count in elements but lb
and extent
parameters of createResized
method counts in bytes. It's very strange.
Comm.allToAllw()
and Comm.iAllToAllw()
were introduced in 7548e342dd208dd0f9e4522e431d6a6dbe660f53 and their sDisps
and rDispls
parameters counts bytes.
I read the documents by Oscar and noted
MPJ
, the Open MPI Java bindings support Java struct
(with some restrictions/tweaks) as described in chapter 3.7
getExtent()
should return the extent in bytesgetExtent()
return the extent in bytes (and not in elements)Win
constructor expects size
in elements (and not in bytes) which now looks odd to meAll that being said, my conclusion is the intent of the Open MPI Java bindings is to "map" the C bindings, and we should hence always count in bytes and not in elements (a direct consequence is that the snippet in Figure 17 is wrong).
I understand that would be a major change compared to the current implementation, and it will break existing applications.
An other approach, which is more friendly to existing apps could be to "hack" the design, and states we always count in elements. If a derived datatype is made of more than one predefined datatypes, then the element is MPI_BYTE
. We would have to change the definition of getExtent()
and states it returns a number in elements (and not in bytes) - this is what the current implementation is doing by the way - and fix the example in Figure 10. We could introduce the new getBaseSize()
method that returns the number of bytes of the element used to build the datatype (I suggest this name because it maps directly to what is implemented under the hood, and there is no equivalent C binding). We would also have to fix createResized()
so the lb
and extent
parameters are in elements and not in bytes.
The other approach is IMHO wrong, but could be preferred until Open MPI 5 since it is less likely to break any existing app. From Open MPI 5, we could start doing the (IMHO) right thing and count in bytes everywhere the C bindings count in bytes - and that is equivalent to an ABI change that will break existing apps.
I'm not sure I fully understand this discussion, so I hope you don't mind a possibly stupid question. Let's assume that I define a data type in terms of MPI.DOUBLE:s
DataType type1 = Datatype.createHVector(count1, length1, stride1, MPI.DOUBLE);
and then define a new type in terms of type1:
DataType type2 = Datatype.createHVector(count2, length2, stride2, type1);
Right now, stride2 is the number of bytes between the start of each block, but what happens if you change it to the number of elements? Will stride2 then be the number of MPI.DOUBLEs between the start of each block, or the number of type1:s? I need to be able to specify the stride in terms of either bytes or MPI.DOUBLE:s because the stride of type2 is not always a multiple of the size of type1.
@malinkallen currently, stride
is in bytes.
If we decide to use elements instead, then MPI.DOUBLE
, type1
and type2
will have the same element which is MPI.DOUBLE
.
So if we decide to use elements, stride2
will be in MPI.DOUBLE
.
Another option would be just go to "all in" on both approaches:
Have both APIs have common prefixes -- e.g., getSizeBytes()
and getSizeElements()
, or somesuch.
I.e., they'd both be "all new" sets of APIs, but:
@jsquyres my understanding is we should always count in bytes (and we should have implemented that a long time ago when Java-like struct
were introduced by Oscar and Jose) .
I understand your concern about not breaking current apps - or at least the ones that do not hardcode predefined datatype sizes because the current implementation does not provide any (straightforward) way to retrieve this info and some subroutines such as the Java binding for MPI_Type_create_resized()
expects bytes and not elements.
That's why I'd rather temporarily add two new countInBytes()
and countInElements()
subroutines (default is to count in elements, and issue a warning the first time a binding that requires such a count is invoked), and then count in bytes from Open MPI 5 and remove at least countInElements()
from Open MPI 6.
I'm confused by your proposal.
Why are you talking about count
? The issue at hand has to do with size
, i.e., getSize()
.
Are you proposing something like:
getSize()
(and friends) to return bytesgetSizeInBytes()
and getSizeInElements()
methods (and similar for other affected methods, as identified by @kawashima-fj)getSizeInElements()
method (and similar)getSizeInBytes()
method (and similar)What I proposed is to keep the existing APIs but have their behavior determined by a global flag. For example, getSize()
returns size in bytes or elements depending on the global flag. Here is a possible roadmap
getSize()
returns elements by default (e.g. no change) unless the user explicitly called countInBytes()
getSize()
returns bytes by default (e.g. ABI change so to speak) unless the user explictly called countInElements()
getSize()
always returns bytes, and we remove countInElements()
(since end users had enough time to update their apps in Open MPI 5. Also remove countInBytes()
since it became a no-op.A global flag is not thread safe. Also, I think overloading the word "count" is dangerous.
What if we change getSize()
(and friends) to take a boolean (e.g., returnValueInBytes
), and it has a default value.
getSize()
(with no arguments) to continue to workFair enough, so let me elaborate/complement a bit more on my initial idea.
A first step could be to simply add a getBaseSize()
method to the Datatype
class (so end users can pass arguments in bytes to subroutines that require it such as createResized()
.
I see that as a band-aid since the current implementation/design is flawed (we should have moved to using bytes everywhere when struct
like were introduced long time ago), and fixing that will break existing apps.
To help user transition (read keep their currently working apps work again unmodified with Open MPI 5), we could have a global flag.
Such global flag can be set by a MCA parameter, or via an API call that can be invoked only once (before MPI_Init()
? ). The API could be setMetric(value)
and value
can either be ELEMENTS
or BYTES
. Or it could be setJavaBindingsSemantic(value)
and value
can either be MPJ
or ompi
Per the discussion, I am now thinking a MCA parameter and no API is likely the best path to move forward.
Adding an extra argument would require some extra efforts to both developers and users now and later since the extra argument is already planned to be removed, that is why I think a new getBaseSize()
method + a new MCA parameter is a better way to move forward.
This is my opinion and it is not a strong one.
@ggouaillardet @jsquyres Is this much work to try to get fixed for v5.0.0?
I'm using Open MPI 3.1.3 (installed from a source code tarball).
According to the Javadoc of
Datatype
,getSize
is the Java binding ofMPI_TYPE_SIZE
. However,getSize
returns number of elements buffered, whileMPI_TYPE_SIZE
returns the number of bytes occupied by the data type.