kaitai-io / kaitai_struct

Kaitai Struct: declarative language to generate binary data parsers in C++ / C# / Go / Java / JavaScript / Lua / Nim / Perl / PHP / Python / Ruby
https://kaitai.io

C# Compiler: Nullable primitives for optional values #142

Closed LogicAndTrick closed 7 years ago

LogicAndTrick commented 7 years ago

See kaitai-io/kaitai_struct_webide#16

There's a small difference between nullable primitives in Java compared to C# that will need to be catered for:

1: int? is syntax sugar for Nullable<int>. The C# compiler converts int to int? implicitly, but it has no automatic unboxing in the other direction, from int? to int:

int? value = null;
value.HasValue; // false
value.Value; // InvalidOperationException: Nullable object must have a value.

Action<int> fn = x => { };
fn(value); // CS1503 Argument 1: cannot convert from 'int?' to 'int'

If you do an explicit cast then it behaves the same as Java would:

int? value1 = 1;
int v1 = (int) value1; // = 1

int? value2 = null;
int v2 = (int) value2; // Runtime exception: InvalidOperationException: Nullable object must have a value.

Action<int> fn = x => { };
fn((int) value1); // Ok
fn((int) value2); // Runtime exception

2: Nullables can be used in operations, but they coerce all their operands to nullables. Again, doing an explicit cast will change it to generate a runtime exception instead.

int? v = null;
var x = v * 2 + 1; // = (int?) null
var x2 = ((int) v) * 2 + 1; // Runtime exception: InvalidOperationException: Nullable object must have a value.

3: Only structs can be used as nullables - this is basically the same as the Java compiler needing to know the difference between boxed and unboxed types.

int? i; // Ok
bool? b; // Ok

string? s; // CS0453 The type 'string' must be a non-nullable value type in order to use it as parameter 'T' in the generic type or method 'Nullable<T>'
object? o; // CS0453 
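
For anyone not fluent in C#, here's a rough Python model of the two behaviours from point 2. It's only a sketch: `lifted_mul_add` and `unwrap` are made-up names for illustration, and `None` stands in for a C# null.

```python
from typing import Optional


def unwrap(value: Optional[int]) -> int:
    """Model of a C# `(int) value` cast: fail loudly when there is no value."""
    if value is None:
        raise ValueError("Nullable object must have a value.")
    return value


def lifted_mul_add(v: Optional[int]) -> Optional[int]:
    """Model of `v * 2 + 1` on an `int?`: null in, null out."""
    if v is None:
        return None
    return v * 2 + 1


print(lifted_mul_add(None))  # None
print(lifted_mul_add(3))     # 7
```

Calling `unwrap(None) * 2 + 1` raises instead, which mirrors the explicit-cast version.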

There are a few different ways to implement it. For example:

seq:
  - id: intval
    type: s4
  - id: nulval
    type: s4
    if: intval > 1
instances:
  instval:
    value: intval + nulval

# Question: is `instval` nullable or not?

Option 1: Same behaviour as C# compiler (difficult)

int Intval { get; set; }
int? Nulval { get; set; }

// Automatically determine the type of instval based on the expression operators
// I don't think this behaviour would work in most other languages though
int? Instval {
    get {
        return Intval + Nulval;
    }
}

Option 2: Explicit casting only for nullable types

int Intval { get; set; }
int? Nulval { get; set; }

// Automatically cast nullables
int Instval {
    get {
        // Expression translator will need to know if something is nullable so it can cast it
        return Intval + (int) Nulval;
    }
}

Option 3: Just cast everything and don't worry about it (easy)

int Intval { get; set; }
int? Nulval { get; set; }

// Cast everything
int Instval {
    get {
        // Even though intval doesn't need to be cast, just do it anyway because we're lazy
        return ((int) Intval) + ((int) Nulval);
    }
}

Any thoughts? The easiest implementation is option 3, but it will result in slightly messier generated code. There's no performance difference in the compiled result because the compiler will just ignore any unnecessary casts.
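
To show what the translator would need for options 2 and 3, here's a minimal Python sketch. It is not the actual Kaitai Struct compiler code: the `FIELDS` table, `translate_operand` and `translate_sum` are all hypothetical names, and it only handles a single `a + b` expression.

```python
# Hypothetical field table: KS id -> (C# property name, is_nullable).
FIELDS = {
    "intval": ("Intval", False),
    "nulval": ("Nulval", True),
}


def translate_operand(name: str, cast_everything: bool = False) -> str:
    """Render one operand as C# source text."""
    prop, nullable = FIELDS[name]
    if cast_everything:
        # Option 3: no nullability knowledge needed, cast unconditionally.
        return f"((int) {prop})"
    if nullable:
        # Option 2: cast only what is known to be nullable.
        return f"(int) {prop}"
    return prop


def translate_sum(left: str, right: str, cast_everything: bool = False) -> str:
    """Render `left + right` as C# source text."""
    lhs = translate_operand(left, cast_everything)
    rhs = translate_operand(right, cast_everything)
    return f"{lhs} + {rhs}"


print(translate_sum("intval", "nulval"))
# Intval + (int) Nulval
print(translate_sum("intval", "nulval", cast_everything=True))
# ((int) Intval) + ((int) Nulval)
```

The trade-off is visible in `translate_operand`: option 2 depends on the `is_nullable` flag being tracked correctly for every identifier, while option 3 never looks at it.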

GreyCat commented 7 years ago

The main problem here is that this conversion varies a lot in different languages. For example, in C#:

int? a = null;
int b = 42;
var c = a + b; // => null

In other languages, behavior varies. In some, this yields a runtime exception. For example, in Python:

a = None
b = 42
a + b # TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
b + a # TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

In Ruby:

a = nil
b = 42
a + b # NoMethodError: undefined method `+' for nil:NilClass
b + a # TypeError: nil can't be coerced into Fixnum

In JavaScript, though, null is silently converted to 0:

a = null
b = 42
a + b // => 42
b + a // => 42

And in Perl as well:

$a = undef;
$b = 42;
$a + $b # => 42
$b + $a # => 42

Given all that, I believe the simplest thing we can do here is just say that it is "undefined behavior" in the KS expression language, so if you're doing value: intval + nulval without any checks beforehand, that's generally your problem.
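
For what it's worth, a "check beforehand" could look something like this in Python. This is only a sketch of one possible policy (propagating the absence as `None`); the `instval` function is a hypothetical stand-in for a generated instance getter, not something the compiler emits.

```python
from typing import Optional


def instval(intval: int, nulval: Optional[int]) -> Optional[int]:
    """Compute intval + nulval, guarding against a field that was never parsed."""
    # nulval is only read when `intval > 1`, so it may legitimately be None.
    if nulval is None:
        # Assumption: propagate the absence instead of raising TypeError.
        return None
    return intval + nulval


print(instval(1, None))  # None
print(instval(2, 40))    # 42
```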

But even if we do that, it doesn't solve the problem with the lack of autoboxing in C#, as you've mentioned. If we just go with the "undefined behavior" approach, there's actually no problem with simple binary ops, but there would be some problems with method signatures, etc. Could you give any examples where this would be relevant for KS?

The only one I could come up with is the .to_i(x) method, i.e. something like this:

seq:
  - id: a
    type: strz
  - id: b
    type: u1
    if: a.length > 1
instances:
  c:
    value: a.to_i(b)

Currently, it is compiled into:

_a = System.Text.Encoding.GetEncoding("UTF-8").GetString(m_io.ReadBytesTerm(0, false, true, true));
if (A.Length > 1) {
    _b = m_io.ReadU1();
}
// ...
private string _a;
private byte _b;
private int _c;

And the c instance is calculated like this:

_c = (int) (Convert.ToInt64(A, B));

This will probably break if we declare _b as byte? instead of byte. However, we can probably just modify such function calls to add an explicit typecast, as you've proposed in option (3)?

GreyCat commented 7 years ago

The majority of nullable problems in C# seem to be solved by recent commits. The if_instances, if_struct, and if_values tests now seem to pass properly. CSharpCompiler now happily generates int?, byte?, etc., when needed. Results should probably be available on CI soon, and, if that suits us, I'd think of closing this task.

GreyCat commented 7 years ago

Well, given that almost a month has passed and nobody has complained, I guess it's completed.