y-lohse / inkjs

A javascript port of inkle's ink scripting language.
http://www.inklestudios.com/ink/
MIT License
490 stars 101 forks source link

[Compiler] Limitations regarding JavaScript's number type #934

Open ephread opened 2 years ago

ephread commented 2 years ago

In #931 @smwhr introduced a new mechanism to fix a long standing issue that inkjs has with number divisions. I think it's a great idea that can be further refined and I want to discuss the options we have.

Background

Unlike C#, Javascript only have one floating-point number type that is used to represent both floats and integers. This means that 1.0 and 1 are the same thing and JavaScript cannot perform integer divisions.

The same rules apply to JSON, 1.0 can be serialised as 1 in order to optimise the file size, but the format makes no distinction between integers and floating-point numbers. The reference implementation circumvents this limitation by implementing its own JSON (de)serialiser. Integers are serialised without a fractional part (1) while floats use at least one significant digit (1.0).

Since inkjs uses the built-in JSON parser, it cannot import 1.0 as a floating-point value (because Number.isInteger(1.0) == true) and that causes some discrepancies in the runtime. For instance both { 7 / 3 } and { 7 / 3.0 } evaluate to 2, because all numbers are interpreted as integers by the runtimes. In the original implementation { 7 / 3 } would evaluate to 2 while { 7 / 3.0 } would evaluate to 2.33333333.

Since version 1.11.0, inkjs uses a custom JSON writer, which is a thin wrapper over the built-in serialiser (see #363 and #493 for more information) but we've been historically reluctant to implement our own JSON (de)serialisers. After all, JSON stands for JavaScript Object Notation. 🤷🏻‍♂️

Possible Solutions

Custom (de)serialiser

Implement a custom (de)serialiser that reads / writes integers and floats as different types.

This would require creating temporary structures to hold the values and retain the type information before they can be finally converted to IntValue or FloatValue.

This solution isn't satisfactory for the reasons stated above.

Float conversion

Wrap floating-point numbers in a structure that retains the type information before parsing the JSON content.

This is the approach used by @smwhr; floating-point numbers are detected before parsing the container hierarchy and converted to a string value of the form: "x.xf", with x being either a number or a digit. During the container creation, the value is converted to a FloatValue.

For more information, see here and there.

This solution has two minor pitfalls:

  1. It doesn't discrimate between actual floating-point numbers and what might appear to be. Quoting @smwhr:

    when reading the raw json, it looks for strings of the form ,123.0 so if you have that as plain text, it'll transformed in ,"123.0f" in the final code. I know I sometimes use text that looks like function calls left to be interpreted by the player, but I actually never used float in there.

  2. It requires tweaking the regular expression to match all patterns where a floating-point number might appear, such as evaluation streams or variable declarations.

As result, a feature toggle is required. Users need the ability to disable the float recognition routine if it ends up conflicting with the content of their story.

Bytecode changes

Tweak the bytecode format to encode floats and integers in a different, unambiguous format.

The compiler has access to the full context and could ensure that only valid numbers are encoded. Like the above solution, number types would be converted during the creation of the story.

Since the native JavaScript numbers are floating-point numbers, I would be inclined to encode integers instead of floats. The lines of text are already prefixed by a caret (to differentiate them from control commands) and integers could be prefixed with another symbol.

This approach has a major downside. It requires changing the bytecode and the save format in a way that makes them incompatible with upstream. To mitigate potential issues, we could introduce two compilation modes:

  1. a standard mode that would produce a JSON file matching what upstream generates (including the BOM at the beginning of the file);
  2. a javascript mode that would produce a JSON file only compatible with inkjs. The bytecode format would be identified by a custom property (same for the save states).

On the upside, however, it doesn't require setting a flag on the runtime. The bytecode itself defines the runtime mode.

The changes to the compiler and the runtime engine would be minimal and I if well documented, I don't believe they would introduce a huge maintenance burden.


Let me know what y'all think!

y-lohse commented 2 years ago

Thank you for the write-up! Initial thoughts, might totally change my mind later 😁

I'm afraid of the float conversion change. It would be alright behind a feature flag but I would say it should be off by default..? I don't know what's more likely, floating point errors or accidental casts. But I ca see people wanting both and being stuck :/

The bytecode change suggestion is very interesting as well. The bytecode generation mode is a little dangerous for the ink ecosystem I think, but probably acceptable. I would lean towards this solution. I do wonder however — is this maybe a change that could be integrated upstream @joethephish ? Hypothetically we could prototype the solution in inkjs before porting it upstream.

joethephish commented 2 years ago

I’d definitely be up for changing the “byte code” JSON format so that we can keep both perfectly aligned. It would be a breaking change of course, but it’s probably worth it in the long term.

I don’t really mind if we have a different format for integers or floats. I guess the options are ”42i” for explicit integers or ”42.0” for floats. The latter seems a bit more elegant for the format, though I guess the former makes more sense for deserialisation since integers are the ones that need to be handled explicitly in JS. So I’d probably do that!

smwhr commented 2 years ago

What I suggest is to only transform misleading (f%1==0) float in "12.0f" notation and make a small PR on the C# codebase to read and write them accordingly.
Indeed, non-rounded floats are already written and read correctly both in inkjs and inkc#

This would only break new generated code running on old engine if round float occur in the source code.
We obviously can't patch the old engines in the wild but I expect that people who create new ink files will use the new engine.

joethephish commented 2 years ago

Oh I see, that’s pretty elegant, I like that.

ephread commented 2 years ago

What I suggest is to only transform misleading (f%1==0) float in "12.0f" notation and make a small PR on the C# codebase to read and write them accordingly.

Wouldn't using using two different encodings at the same time, for the same type ("12.0f" vs. 12.3) become confusing?

smwhr commented 2 years ago

Confusing for whom ? People reading the raw bytecode ? That'd be just me 🤣

The argument in favour of switching to string notation for all floats is that it would break consistantly the old engine instead of possibly breaking it seemingly randomly after weeks when the author starts tweaking values and suddenly use a rounded float.

ephread commented 2 years ago

Confusing for whom ? People reading the raw bytecode ? That'd be just me 🤣

Haha, yeah, that wouldn't change anything for the end-user for sure! I'm only bringing this up because those sort of decisions aren't small and they can have unforeseen consequences in the future.

The argument in favour of switching to string notation for all floats is that it would break consistantly the old engine instead of possibly breaking it seemingly randomly after weeks when the author starts tweaking values and suddenly use a rounded float.

That's a good point. I can't remember, but wouldn't old engines complain anyway if we were to bump up the format version? I'm a bit out of my depth when it comes to the JSON bytecode, to be honest.

I'm glad @joethephish is up for changing the format in upstream, though, because that means we can figure out a solution that's suitable for inkle as well.

joethephish commented 2 years ago

Yeah we’d definitely have to bump the version anyway. So it probably makes sense to make the format the best version of itself, compromise-free as much as possible. Though it’s hardly a format that’s ever really meant to be read, except when debugging TBH, so it’s not a huge deal whichever one we choose. It’s always been ugly, always will be 😁

lambdadog commented 4 months ago

I'm looking at trying out Ink for a web-based project and noticed it's been some time since any work or discussion has occurred on this issue. Is there a current "best practice" for avoiding this problem, other than the string replacement trick noted above? And is there any potential timeline on fixes? I would be willing to contribute, but I've never worked with C# so I'd only easily be able to make contributions to inkjs specifically.

smwhr commented 4 months ago

@lambdadog Unless you have very specific casting needs and rely on very strict typing of numbers, you should be safe to use the current version.
No discussion has occurred mainly because no significant problem occurred in the past 2 years !

In case anything should happen to you in that regard, come here and we'll help for sure !