How to set Enumerations

modelica / fmi-standard

Specification of the Functional Mock-up Interface (FMI)

https://fmi-standard.org/

How to set Enumerations #1288

Closed andreas-junghanns closed 3 years ago

andreas-junghanns commented 3 years ago

In FMI 2.0, we used fmi2SetInteger to set variables of type <Enumeration>. We now have to define which of the new SetIntXX functions is to be used.

In order to be most general, we propose: fmi3SetInt64 and fmi3GetInt64.

andreas-junghanns commented 3 years ago

Moreover: We need to change the XML element attribute types to hold SInt64 values (long): min, max, start, value

pmai commented 3 years ago

I'm a bit confused here; at the time we introduced the new integer types, this issue was discussed. One alternative would have been to allow enumerations for all base integer types (the most flexible option, mirroring C), the other to stick with the old definition. If I remember correctly, we decided against the full-fledged variant and decided to align the enumerations with fmi3Int32, more or less identical to FMI 2.0. This is what is currently implemented in the schema.

We can change this again of course, but rather than switching to 64bit, if we were to re-touch this part of the standard, I'd rather argue to go for the full flexibility variant. But on the whole I'd just suggest we stick to what we decided.

andreas-junghanns commented 3 years ago

Restricting enumerations to 32 bits rules out the growing use of 64-bit bit encodings, which these days appear even in enumerations. All we implemented in #1272 (please check) is to allow 64-bit values while keeping everything else functionally the same.

More importantly and independently from the bitness: We need to specify which of the Integer Set/Get functions to use. We now have in 2.2.6.2: "For variables of type Enumeration, fmi3SetInt64 and fmi3GetInt64 must be used".

pmai commented 3 years ago

All I'm saying is that this was decided at the time (at the design meeting where the new integer types were introduced): use Int32 and hence fmi3SetInt32. It was not an open issue. I have no problem with this change; it just seems to me that this is a change of an earlier decision. And if we want to support more native-like encodings, then we should really make enums map to any of the integer types.

andreas-junghanns commented 3 years ago

I have tried to find out what C enum types are and they seem to be int (or unsigned int if no enum value is negative). The int type is rarely 64 bit, as far as I know. Even on 64 bit machines, most compilers still keep it at 32 bit.

Do you agree? If so, then I agree to revert to the previous decision.

The 64-bit use-cases I have in mind are those where bits of bit-fields (not enums) are attributed to labels in embedded code. A2Ls show them as labels, and translating them to enums would be nice. But once they pass 32 bits (a limit which, like any reasonable limit, will be broken at some point), such a translation from A2L VTABs to FMI enums will not be possible anymore.

HansOlsson commented 2 years ago

I have tried to find out what C enum types are and they seem to be int (or unsigned int if no enum value is negative).

To avoid any doubt: The C standard requires that enum literals are int (6.4.4.3 Enumeration constants).

The C standard does not require that values of an enum type are int (6.7.2.2 Enumeration specifiers). It requires instead that the type be a char or a signed or unsigned integer type capable of representing all of the enum literals for that enum (which, as we just saw, are all treated as int).

So different enum-types may use different integer types.

christoff-buerger commented 2 years ago

This is not a good decision. You are fighting against the C Standard by forcing enumerations to map to 64bit getters and setters. The C Standard is crystal clear on this:

6.4.4.3 Enumeration constants [..] An identifier declared as an enumeration constant has type int.

6.7.2.2 Enumeration specifiers [..] The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int. [..] The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted.

Annex I (informative) Common warnings An implementation may generate warnings in many situations, none of which are specified as part of this International Standard. The following are a few of the more common situations. [..]

A value is given to an object of an enumerated type other than by assignment of an enumeration constant that is a member of that type, or an enumeration object that has the same type, or the value of a function that returns the same enumerated type (6.7.2.2).

Annex J (informative) Portability issues J.3.9 Structures, unions, enumerations, and bit-fields [..]

Whether a “plain” int bit-field is treated as a signed int bit-field or as an unsigned int bit-field (6.7.2, 6.7.2.1).

Allowable bit-field types other than _Bool, signed int, and unsigned int (6.7.2.1).

The integer type compatible with each enumerated type (6.7.2.2).

Hence, you are forcing tools to generate C code that hits undefined C behavior (it likely works with most compilers, but there are no guarantees), which invites tooling incompatibilities (for example, model exchange with sources compiled by an importing tool in strict mode results in a compilation error, which is not so unlikely if the importer values quality).

christoff-buerger commented 2 years ago

Above quotes are from the C17 Standard.

pmai commented 2 years ago

Reluctantly putting on my old and worn (pre-dating ISO C) language lawyer hat:

The quotes from C17 are beside the point (beyond referencing a standard that FMI makes no reference to): The C standard neither normatively defines the underlying integer type of enumerated types (only how enumeration literals/constants are typed), nor the size of int in the first place (64bit implementations with 64bit-wide int - ILP64 - have been known to exist, e.g. Cray).

More importantly: Assigning values to enum variables that are not enumeration constants or variables of the same enumeration type (which Setters from the PoV of an FMU will always have to do) already leads to undefined behavior; so worrying about any more undefined behavior that occurs as part of this operation is unfounded.

FMUs should avoid enum types internally anyway, if they want to be fully conforming and fully portable. And if they do not, then they will just rely on implementation-defined and undefined behavior if they are implemented in C, like all other C software out there. People who value quality will not be using C.

Making enums typed int32_t or uint32_t in no way improves the situation (it also does not make it worse), since the standard guarantees neither of them to be compatible with int or with enums. And picking int is out of the question because that makes the ABI compiler/platform-specific in undesirable ways.

In other words: Ignore the C standard language lawyering, pick a decision for FMI that suits FMI, and be done with it.

Personally I would have just made all integer types optionally enumerable (and more optionally even bit-combinable), obviating this discussion, but I was voted down at the time (and I am not complaining about this, because this would have added more complexity, so I can see the downsides).

Picking a 64bit type is consistent for FMI, given that we mandate 64bit integer types. Picking a 32bit type to help some common languages with broken FFIs and to be slightly more conservative is also fine by me, although a bit surprising since we switched to 64bits over a year ago.

Again the cleanest approach in my eyes would be to do away with a separate enum type, and just make it possible to declare enum constants for all integer types, and to flag whether only those constants can be used, any combinations thereof, or arbitrary integer values are fine.

Would also disabuse people of the notion that FMI enums and C enums are somehow related... But this is rather late in the game, so I do not think warranted at this point in time.

christoff-buerger commented 2 years ago

Reluctantly putting on my old and worn (pre-dating ISO C) language lawyer hat

Your hat sits lopsided :) The C standard, since C99 at least, states clearly:

An identifier declared as an enumeration constant has type int.

Type int; not is of integer type.

The C standard neither normatively defines the underlying integer type of enumerated types

Sure. But the underlying type is not the point here. The point is, that writing something like enum foo { e = [[ some value that needs 64bit ]] } is invalid C99/11/17 if int doesn't happen to be 64 bit. Hence, as soon as somebody starts to use the cool 64-bit support for enumerations of FMI, model exchange is very likely to break.

nor the size of int in the first place (64bit implementations with 64bit-wide int - ILP64 - have been known to exist, e.g. Cray).

Ok, good, you found the exception/exotic hardware. Let's talk about actually most likely used hardware like x86 64-bit.

More importantly: Assigning values to enum variables that are not enumeration constants or variables of the same enumeration type (which Setters from the PoV of an FMU will always have to do) already leads to undefined behavior; so worrying about any more undefined behavior that occurs as part of this operation is unfounded.

Adding more and more undefined behavior is not decreasing chances to hit issues. The point with this 64 bit getter and setter for enumerations proposal at hand is, that it breaks on the most common setups as soon as you start using values that can not be represented with int. As simple as that. This is not about something exotic, but something extremely likely to not work. At least all currently used desktop machines will break when you come with your 64 bit requiring enum constant in model exchange.

I am aware that the C standard's integer type tower is not a full definition, but in combination with the expected most applied hardware it gives a clear idea of "not in general defined, but sometimes/often worth to support" and "exotic thing you should really avoid". And this proposal falls into the latter category.

pmai commented 2 years ago

Your hat sits lopsided :) The C standard, since C99 at least, states clearly:

An identifier declared as an enumeration constant has type int.

This is section 6.4.4.3 Enumeration constants, and that applies to the enumeration constants, not to the enumeration type, as you already quoted. So it does not contradict anything I (or you) said, but rather repeats it.

The C standard neither normatively defines the underlying integer type of enumerated types

Sure. But the underlying type is not the point here. The point is, that writing something like enum foo { e = [[ some value that needs 64bit ]] } is invalid C99/11/17 if int doesn't happen to be 64 bit. Hence, as soon as somebody starts to use the cool 64-bit support for enumerations of FMI, model exchange is very likely to break.

So don't do that then. Nobody forces anyone to do that: The enumeration values that an FMU exposes are completely under the control of the FMU/FMU generator. If an FMU generator wants to expose internally defined C enums via FMI enumerations, they can safely do so, because it is a subset of the value space (they'll have to live with the non-conforming cast (or a conforming but tedious switch statement), but as you point out that actually works in practice).

FMUs that are not so constrained can use the full FMI enum range and map that to whatever they want.

An FMU importer will never map FMI enums to C enumeration types (it would be difficult to do so without on-the-fly code generation, and even then there is no good reason to do so). So again the range of enumeration constants play no role.

nor the size of int in the first place (64bit implementations with 64bit-wide int - ILP64 - have been known to exist, e.g. Cray).

Ok, good, you found the exception/exotic hardware. Let's talk about actually most likely used hardware like x86 64-bit.

You can't have it both ways: First you argue about conformance to the C standard, disregarding actual practice and necessary non-conformance, then you argue actual implementation practice, disregarding conformance. Just stop doing this, is what I'm saying: Concentrate on the actual practical problem you have in mind, and discuss that. All the appeals to authority are a red herring.

The point with this 64 bit getter and setter for enumerations proposal at hand is, that it breaks on the most common setups as soon as you start using values that can not be represented with int. As simple as that.

Aside: Why not state your objection like the above directly, without all the red herrings about conformance?

To the point: I have yet to see a non-stupid implementation that would run into this problem: FMU generators are completely free from the problem, since they are fully in control. FMU importers that magically map FMU enums to C enums are the only ones that I would see affected by this, in terms of C. I think those are few and far between, and it is unclear whether they would actually map FMI enums to C enums, rather than something else.

I actually have more sympathy for other non-C importers, which might map to dynamic languages where on the fly code generation is more likely. E.g. for Modelica based importers this might be more problematic, and actually more common to auto-wrap FMUs into Modelica wrappers that use Modelica enumerations (which are actually arbitrary-size, but in practice many implementations seem to be 32bit-based).

christoff-buerger commented 2 years ago

What FMI is doing here is camouflaging portability issues of a standardized C API (which is what FMI boils down to) by wrapping enumeration types in XML definitions that are designed close to C but do not fit it, while claiming they have nothing to do with C. Eventually they have to be mapped one way or the other to a C implementation (you can read "C implementation" as "target-architecture-specific representation", because that is what C is: the von Neumann architecture). But sure, if that mapping fails, those are tool issues and have nothing to do with the FMI standard.

If the proposed FMI design is so independent of C and well-defined etc. (i.e., the C standard is a red herring), then why do you want to get and set enumeration values at all with bit-size-based integer getters and setters? Because that is very much C design; and for that reason I don't think referencing the C standard is a red herring, despite "All the appeals to authority are a red herring.", because the rationale of the C standard applies to FMI as well when you pick its design.

In fact, you are happily ignoring the big compatibility warnings the C standard gives for such design:

Annex I (informative) Common warnings An implementation may generate warnings in many situations, none of which are specified as part of this International Standard. The following are a few of the more common situations. [..]

A value is given to an object of an enumerated type other than by assignment of an enumeration constant that is a member of that type, or an enumeration object that has the same type, or the value of a function that returns the same enumerated type (6.7.2.2).

Annex J (informative) Portability issues J.3.9 Structures, unions, enumerations, and bit-fields [..]

The integer type compatible with each enumerated type (6.7.2.2).

And these problems hit the proposed FMI design, because you want to get and set enumeration values via integer getters and setters (hence, you tangle integer values and enumeration values like C does, and you therefore suffer from the same issues). To call all of this a red herring is saying: the C standardization committee are morons, they could have easily achieved a well-defined portable language for this enumeration <-> integer tangled design but just failed; but we, we can make this work.

No, FMI won't fix what the C standard is warning about. You just shuffle the trouble to the tools. And on top of that you make it worse, by picking a design that deliberately differs from the C standard's requirement that enum values can be represented as int, and thus from most standard-conforming compilers/platforms and portable C code guidelines (picking 64 bit for enumeration value representation is exotic, absolutely exotic).

If you want a C independent design, you have to drop the idea of tangling enumeration and integer types, like getting and setting enumerations via integer getters and setters, and provide some orthogonal design. As it is right now, a 64 bit requiring enumeration value won't work, because the underlying C API will break as soon as such a value is represented as a C enum on any non-exotic ISA (e.g., x86 64 bit ISA), forcing all kinds of complications on tool vendors to figure out how to make all of this work. Or they just never use C enums and invent their own enum representations, hoping that they never encounter model exchange source code written for a 64 bit int platform that they are now building on a 32 bit int platform with unfitting enum values.

May I ask what the motivation for 64 bit enums is at all? Did we encounter models with more than 2^32-1 enumeration constants or where we need an enumeration constant with a value that doesn't fit in 32 bit int (and here I already like to argue that the FMI design group has to make up their mind what they want to use enums for: logic values/control-flow-decisions or numeric values/mathematical-computations)? Why this push for 64 bit enumeration values, which increases chances of portability problems?

pmai commented 2 years ago

Nowhere in the FMI standard is an FMI enumeration mapped to a C enum type, nor is this suggested anywhere (if anything the FMI enumerations were inspired by Modelica, and in FMI 1.0 could not even express all C enum values). In FMI 2.0 enumerations used the fmi2Set/GetInteger setter, hence fmi2Integer, which might or might not have been defined as int, long, long long, etc.: FMI 2.0 left this to the platform definition:

These are the basic data types used in the interfaces of the C functions. More data types might be included in future versions of the interface. In order to keep flexibility, especially for embedded systems or for high performance computers, the exact data types or the word length of a number are not standardized. Instead, the precise definition (in other words, the header file “fmi2TypesPlatform.h”) is provided by the environment where the FMU shall be used.

Since this was obviously mostly unworkable, especially for binary FMUs, everyone kept to the default implementation, which restricted both fmi2Integer and the enumerations to int, and hence at most 32bits on most platforms.

FMI 3.0 uses C99 7.18.1.1 Exact-width integer types to make the API/ABI fixed and independent from the compiler-/platform-specific sizes.

Hence the standard now also provides Int64/UInt64, besides the 32, 16, and 8bit variants. The choice to make enumerations match the biggest of those, as was previously also the case, naturally follows. Again I would have actually made that configurable (i.e. enumerations are mapped to a specified type), but that was not the decision taken, and absent that, making it the largest of the types that are mandatorily part of FMI 3.0 is a natural choice with the fewest restrictions.

None of this is at all related to C enumerations. Hence portability concerns for C source code using C enumerations do not play a role here, and hence nearly all the C language lawyering is beside the point: FMI enumerations were always pure integer types, and mapped as such in the C API.

So this leaves the only real question: Does the current choice pose a problem for C-implemented FMUs or importers. As we have already established, unless an implementation wants to map FMI enumerations to C enums, rather than the obvious integer types that the enumerations are defined as, there is no problem.

This leaves the case of mapping FMI enumerations to C enums: For FMU implementations there is no problem, since the FMU implementation controls the value space for the enumerations: It can cast its enums from/to the necessary integer type regardless of its size.

So we get to FMU importers that want to map FMI enumerations to C enums: Unless those implementations are restricted to known FMUs, this necessitates on-the-fly C code generation, whereas the common implementation, which does not map to C enums, needs none. Speaking as a member of a company that produces tools that do on-the-fly code generation for simulators, we still do not map to C enums, since there is no benefit in doing so.

So I am hard-pressed to find a reasonable case where the current definitions pose actual problems for C-based implementations. I am not saying that they can't exist, but nothing has been presented that would lead me to believe otherwise.

Now for other languages, that more practically rely on code-generation or reflection-based object generation, that have a need to map FMI enumerations to internal enumerations that are somehow constrained, I can more readily see a problem. Modelica still comes to mind here, maybe others (e.g. Javascript might have problems regardless, due to their broken numeric tower).

But if that is the argument to be made, make it. It might convince people to change the definition to something more restrictive. This however has nothing to do with the C standard, language lawyering or the infinite wisdom that bequeathed us the ISO C standard family that keeps research compiler optimizers and zero-day exploit writers employed.

christoff-buerger commented 2 years ago

Nowhere in the FMI standard is an FMI enumeration mapped to a C enum type

FMI 3.0 uses C99 7.18.1.1 Exact-width integer types to make the API/ABI fixed and independent from the compiler-/platform-specific sizes.

FMI enumerations were always pure integer types, and mapped as such in the C API.

These three points are just a contradiction in my opinion!

All my complaints -- and the reason I link to the C standard -- are because of the universal language-design consequences of linking enumerations to integer values. Universal means, that every language with such linking suffers from the issues I quoted from the C standard. It is not just C, I can quote you the warnings and statements of other standards -- that support to give enumeration constants specific integer values -- as well.

But when I read...

This however has nothing to do with the C standard, language lawyering or the infinite wisdom that bequeathed us the ISO C standard family that keeps research compiler optimizers and zero-day exploit writers employed. People who value quality will not be using C.

...I get the impression this is a waste of time.

Hence, let's discuss the whole issue from the user perspective. I like to ask again: To what end is such design -- enumerations are 64 bit integers -- needed in FMI? As I said:

May I ask what the motivation for 64 bit enums is at all? Did we encounter models with more than 2^32-1 enumeration constants? Did we ever have any model/simulation with a need for enumeration constant with a value that doesn't fit in 32 bit int?

Why do FMI enumerations have to have a specific integer value? Could the FMI group please make up their mind whether they want to use enumerations for logic values used in boolean conditions -- which are then platform independent and safe to use but do not have a specific integer value -- or for enumerations with specific numeric values that can be used in arithmetic -- which are then not platform independent, because these values must somehow be represented in a type suited for arithmetic, which always depends on the ALU / ISA supported by the target platform?

And when we start to consider tool vendors that have to implement all of this as users of the FMI standard as well

If you really want the latter -- enums with specific integer values -- can you please reduce the risk of incompatibilities, tool vendor challenges etc by using 32 bit enumeration values as common on most target platforms for FMU simulation?

You are on the right track with

Now for other languages, that more practically rely on code-generation or reflection-based object generation, that have a need to map FMI enumerations to internal enumerations that are somehow constrained, I can more readily see a problem. Modelica still comes to mind here, maybe others (e.g. Javascript might have problems regardless, due to their broken numeric tower).

A Modelica tool that maps the Modelica Integer type to C int -- as nearly all do, or more precisely, all to my knowledge -- is completely screwed with this design decision. The only way to handle the 64 bit enumeration value range is to represent the FMI enumeration values as Modelica String or some other super awkward design like Integer[2] vectors (Integer is too small, and Real doesn't work either, because a 64 bit floating point value represents integers exactly only up to 2^53: 32 bit integers can be mapped bidirectionally, but 64 bit integers do not fit double). Are you aware what such a setup means for the solver and numerics tooling of a simulation tool, how much effort is needed to support such a crappy setup that users will hate because it is a performance hazard?

Just a question of understanding: Do you think, that Modelica tools are exotic to that end? Or do you think that above setup is likely in the simulation world? Which simulation tools come to your mind that support -- and use -- 64 bit integers? In the continuous physics simulation world, integers are anyway very seldom used because physics is seldom discrete.

Now, the above assumes that a tool as importer knows that it has to do something. There is the even more worrying case that an FMU was generated for an exotic platform where C int is 64 bit. As an importer on a 32 bit int platform you now just see the sources. You have to figure out that you cannot just compile the C source code, but must patch it first if it has an enumeration with a value that does not fit in a 32 bit int but is represented by int in the source. Do you really consider it reasonable that FMU importers patch the C code they import?

Anyway, I believe that 64 bit enumeration values are an absolutely rare case. To that end I have high sympathy if tool vendors just ignore the FMI standard's 64 bit decision and let tooling simply break -- instead of going through the hassle of checking if there is a need to patch code, supporting such values in their numeric solvers etc. -- in the very rare case of an enumeration value that does not fit in 32 bit.

So feel free to decide whatever you want, but don't expect developers to spend huge amounts of manpower to support it if it doesn't look like anything needed.