jni-rs / jni-sys

Apache License 2.0
55 stars 20 forks source link

Considering redefining `jboolean` as either an `bool` or our own `#[repr(u8)]` enum #19

Closed rib closed 1 year ago

rib commented 1 year ago

While looking at jni-rs/jni-rs#400 it was noted that there's a risk that code can easily cause undefined behaviour for the JVM by setting boolean array elements to values other than 0 or 1.

This isn't something that's specific to array elements though; anywhere that Rust returns a jboolean to Java it needs to ensure the byte value is either 0 or 1.

Since jboolean is currently simply an alias for u8 (pub type jboolean = u8;) then it's very easy to return arbitrary invalid boolean values to the JVM.

My initial thought was that we could represent a jboolean as:

#[repr(u8)]
enum jboolean {
    False=0,
    True=1
}

and it was then also noted by @argv-minus-one that this is essentially how the Rust Reference defines bool: https://doc.rust-lang.org/reference/types/boolean.html

My initial reservation with simply defining jboolean as an alias for bool is that Rust doesn't generally guarantee much in terms of stable ABI and I wouldn't like to potentially depending on an implementation detail. e.g. the Rust Reference says "this book is not normative. It may include details that are specific to rustc itself, and should not be taken as a specification for the Rust language."

It looks like historically bool was assumed to be defined in terms of the C _Bool type (and several FFI projects have made that assumption) which wouldn't technically guarantee that bool were one byte on all platforms, ref: https://internals.rust-lang.org/t/rusts-stability-story-should-apply-to-bool-in-ffi/6305/8

Considering the follow up discussion here: https://github.com/rust-lang/rust/pull/46176 it looks like the consensus was that bool is defined to be the same as C's _Bool but it was also noted that that means it's one byte one all platforms supported by Rust.

A bool is also explicitly documented as being one byte here: https://doc.rust-lang.org/std/mem/fn.size_of.html

I think it's fair to conclude that Rust's bool type is basically guaranteed to always be 1 byte and to also use the values 1 and 0 for true and false - which is what we want for jboolean

argv-minus-one commented 1 year ago

JNI jboolean is also defined as being exactly 8 bits in the JNI spec. So, the sizes are definitely the same.

C99 specPDF sections 6.2.5 and 6.3.1.2 imply that 0 and 1 are the only valid bit patterns for a C _Bool. Since Rust bool's set of valid bit patterns is the same as C _Bool, then they must also be 0 and 1. Therefore, it should be safe to reinterpret the bits of a Rust bool as a JNI jboolean.

The definition of jboolean in the JNI spec is accompanied by two constants, JNI_FALSE and JNI_TRUE, with the values 0 and 1 respectively. This implies that those are the only two valid bit patterns for a jboolean, but the JNI spec doesn't actually say that. I doubt a JVM actually will use a value other than 1 to mean “true” in a jboolean, though, because if it did, then C code containing a statement like if (x == JNI_TRUE) (which at least one programmer does) would behave incorrectly. Therefore, it's probably safe to reinterpret the bits of a JNI jboolean as a Rust bool, probably, maybe :sweat_smile:.

rib commented 1 year ago

I doubt a JVM actually will use a value other than 1 to mean “true”

hehe, I'm definitely not so sure about that :) I'd be willing to bet the opposite I think - that most will interpret anything except zero as true.

cpu instructions are often pretty good at telling you when a result was zero or not zero so I'd imagine it could be common to take advantage of that and implement true as "not 0" which should be perfectly valid for any JVM or a Rust compiler I expect. (because they are allowed to assume that booleans are only 0 or 1 and 1 also happens to be "not 0" which can be simpler to determine.)

So it seems like it would be a perfect opportunity for undefined behaviour in the sense that maybe some implementations really do only consider 1 as true but others consider "not 0" as true and then if you have code that breaks the rules and stores a 4 in a boolean it might behave differently for different implementations.

rib commented 1 year ago

For reference here, the jni-sys repo has now been moved to the jni-rs organization and we are now in a better position to make this change to the jni-sys crate. (See: https://github.com/jni-rs/jni-sys/issues/17)

rib commented 1 year ago

Before really looking to switch jboolean to be an alias for bool I have tried to double check in case there is some guarantee that a JNI/JVM implementation will treat any non-zero value as TRUE - or more importantly that there would be roundtrip guarantees that would mean other u8 values need to be preserved.

Some useful links:

I certainly haven't seen any hint of there being any kind of roundtrip guarantees for other values, more the opposite in fact since it looks like different JVM implementations will do their own 0/1 normalization of booleans at different times - which don't appear to be well specified.

Overall there is no strong consensus on whether values besides 0 and 1 can be stored in a jboolean and so my interpretation of that is simply that it is "undefined" behaviour to store values other than 0 or 1.

In practice, because JVMs have tried to account for how C treats non-zero values as truthy then other values are quite likely to be considered truthy and/or get normalized to 1 but that's not guaranteed.

It does still look to me like bool would be a preferable alias for jboolean instead of u8