Add BigNumber support for hex/oct/bin literals and functions

ovk commented 4 years ago

This is related to #1968 which went into 7.3.0.

@clnhlzmn added hex/bin/oct functions and literals for the regular number type (integers only), which is nice. However, if MathJS is configured to use the BigNumber type by default, unfortunately none of this works. On the CL Calc website (which is still on 7.2.0) I implemented this as an extension on top of MathJS, so it works with BigNumbers, including non-integers (see an example here). I think this is how it should be implemented in MathJS for the BigNumber type.

clnhlzmn commented 4 years ago

Extending hex/oct/bin for use with BigNumber should be easy enough. I intentionally limited the functionality to 32-bit signed integers in order to be able to represent the bit patterns of negative numbers as two's complement integers. That is, after #1968 hex(-1) returns '0xffffffff', whereas your extension returns '-0x1'. Similarly, after #1968 0xffffffff is parsed as -1, whereas in yours it's 4294967295. I assumed the two's complement representation is more useful. Its downside is that we need different parse/format functions for different word sizes, which is why it's limited to 32-bit integers for the time being. I'm not sure how that would work with non-integer numbers.
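The difference between the two behaviours can be illustrated with a minimal TypeScript sketch for integers stored in a plain JS number. The helper names hex32, parseHex32, hexPlain and parseHexPlain are hypothetical and are not the mathjs API; the 32-bit pair assumes two's complement wrap-around, while the plain pair keeps the sign and ignores word size.

```typescript
// Hypothetical helpers, not the mathjs API. All of them work on JS numbers.
const MIN_INT32 = -0x80000000; // -2^31
const MAX_UINT32 = 0xffffffff; // 2^32 - 1

// Two's complement: format the low 32 bits as an unsigned bit pattern.
// Accepts any integer in [-2^31, 2^32 - 1].
function hex32(value: number): string {
  if (!Number.isInteger(value) || value < MIN_INT32 || value > MAX_UINT32) {
    throw new RangeError(`${value} does not fit in a 32-bit word`);
  }
  return "0x" + (value >>> 0).toString(16); // >>> 0 reinterprets as unsigned
}

// Two's complement: read a 32-bit pattern back as a signed integer.
function parseHex32(literal: string): number {
  if (!/^0x[0-9a-f]{1,8}$/i.test(literal)) {
    throw new SyntaxError(`${literal} is not a 32-bit hex literal`);
  }
  return parseInt(literal.slice(2), 16) | 0; // | 0 reinterprets as signed
}

// Plain base conversion: keep the sign, no word size involved.
function hexPlain(value: number): string {
  return (value < 0 ? "-0x" : "0x") + Math.abs(value).toString(16);
}

function parseHexPlain(literal: string): number {
  const negative = literal.charAt(0) === "-";
  const magnitude = parseInt(literal.replace(/^-?0x/i, ""), 16);
  return negative ? -magnitude : magnitude;
}

console.log(hex32(-1), hexPlain(-1)); // 0xffffffff -0x1
console.log(parseHex32("0xffffffff"), parseHexPlain("0xffffffff")); // -1 4294967295
```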

ovk commented 4 years ago

I only meant that it should work with non-integer BigNumbers. Your reasoning for signed 32-bit wrap-around semantics for the Number type makes sense (although I personally would prefer unsigned 32-bit semantics rather than signed, but only because I'm so used to C and C++, where signed integer overflow results in undefined behaviour).

In my opinion it is fine that, for example, the hex(Number) function only works on integers and has two's complement 32-bit semantics (i.e. hex(-1) = 0xffffffff if MathJS is configured so that -1 is of type Number). What I suggest is that hex(BigNumber) should just convert whatever number is passed to it (regardless of its size, or whether it's an integer) to a string. I can't think of any other reasonable behaviour for BigNumbers.
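Converting an arbitrary real number to another base is usually done digit by digit: the integer part is converted directly, and the fractional part is repeatedly multiplied by the radix, with each integer carry becoming the next digit. The sketch below assumes JS numbers so the loop stays short (for numbers, the built-in toString(radix) already does this); a BigNumber version would presumably run the same loop with BigNumber arithmetic. The function name toRadixString and the maxFracDigits cutoff are hypothetical.

```typescript
type Radix = 2 | 8 | 16;
const PREFIX: Record<Radix, string> = { 2: "0b", 8: "0o", 16: "0x" };

// Hypothetical formatter (not the mathjs API): plain base conversion of any
// finite number, keeping the sign and emitting at most maxFracDigits digits
// after the point.
function toRadixString(value: number, radix: Radix, maxFracDigits = 20): string {
  if (!Number.isFinite(value)) throw new RangeError(`${value} is not finite`);
  const sign = value < 0 ? "-" : "";
  const abs = Math.abs(value);
  const intPart = Math.floor(abs);
  let frac = abs - intPart;
  let digits = intPart.toString(radix);
  if (frac > 0) {
    digits += ".";
    for (let i = 0; i < maxFracDigits && frac > 0; i++) {
      frac *= radix;                  // shift the next digit left of the point
      const digit = Math.floor(frac); // that integer carry is the next digit
      digits += digit.toString(radix);
      frac -= digit;
    }
  }
  return sign + PREFIX[radix] + digits;
}

console.log(toRadixString(171.75, 16)); // 0xab.c
console.log(toRadixString(-5.5, 2));    // -0b101.1
```

The cutoff is needed because most fractions, such as 0.1, have no finite expansion in base 2, 8 or 16.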

As for the literals, I think a similar approach makes sense. I.e. if MathJS is configured so that number literals are of type Number, then it makes sense that 0xffffffff would be interpreted as -1. But if it uses BigNumbers, then the same 0xffffffff would be of type BigNumber, and its having a value of -1 doesn't quite make sense. So in that case I'd expect it to be interpreted as 4294967295, and I'd also allow a fractional part, e.g. 0xab.c to represent 171.75 (0xab = 171, plus 0xc/16 = 0.75).
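Parsing such a literal is the inverse of formatting: 0xab.c = (10·16^2 + 11·16 + 12) / 16 = 2748 / 16 = 171.75. A minimal sketch, assuming JS numbers and a hypothetical function parseRadixLiteral (this is not mathjs syntax or API):

```typescript
// Hypothetical parser (not the mathjs API): plain base conversion of literals
// such as "0xab.c", "0o17" or "-0b101.1", with an optional fractional part.
function parseRadixLiteral(literal: string): number {
  const match = /^(-?)0([xob])([0-9a-f]+)(?:\.([0-9a-f]+))?$/i.exec(literal);
  if (match === null) throw new SyntaxError(`Invalid literal: ${literal}`);
  const negative = match[1] === "-";
  const radix = { x: 16, o: 8, b: 2 }[match[2].toLowerCase() as "x" | "o" | "b"];
  const intDigits = match[3];
  const fracDigits = match[4] || "";

  // Accumulate all digits as one integer, then scale by radix^-(fraction length).
  // Exact while the accumulated integer stays below 2^53, because 2, 8 and 16
  // are powers of two; a BigNumber version would avoid that limit.
  let value = 0;
  for (const ch of intDigits + fracDigits) {
    const digit = parseInt(ch, radix);
    if (Number.isNaN(digit)) {
      throw new SyntaxError(`Digit ${ch} is not valid in base ${radix}`);
    }
    value = value * radix + digit;
  }
  value /= Math.pow(radix, fracDigits.length);
  return negative ? -value : value;
}

console.log(parseRadixLiteral("0xab.c"));   // 171.75
console.log(parseRadixLiteral("-0b101.1")); // -5.5
```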

clnhlzmn commented 4 years ago

My opinion here is that if you're working with binary/octal/hex representations of numbers then you probably care about word size too. In that sense the behavior of parsing/formatting these representations shouldn't change if you're using BigNumber or number.

> I.e. if MathJS is configured so that number literals are of type Number, then it makes sense that 0xffffffff would be interpreted as -1. But if it uses BigNumbers, then the same 0xffffffff would be of type BigNumber, and its having a value of -1 doesn't quite make sense.

I don't agree. The type number in mathjs isn't an integer any more than the type BigNumber is, so I don't think the behavior should change for that reason.

I do agree though, that the user should be able to choose the word size (up to 'unlimited') and whether they're signed or unsigned when working with these representations. I don't think that the behavior should have anything to do with BigNumber vs number other than if you want your word size to be larger than 53 bits you need to use BigNumber because number can only represent integers exactly up to 53 bits.
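A sketch of what a configurable word size and signedness could look like, assuming TypeScript with BigInt (which is what makes sizes above 53 bits exact; a JS number cannot represent every integer beyond 2^53). The names toHexWord, fromHexWord and wordRange are hypothetical; BigInt.asUintN and BigInt.asIntN are standard JavaScript and perform the wrap-around to a given bit width.

```typescript
// Hypothetical helpers (not the mathjs API). BigInt is used so that word
// sizes above 53 bits stay exact.
const ONE = BigInt(1);

// Smallest and largest value of a word with the given width and signedness.
function wordRange(bits: number, signed: boolean): [bigint, bigint] {
  if (!Number.isInteger(bits) || bits < 1) {
    throw new RangeError(`Invalid word size: ${bits}`);
  }
  return signed
    ? [-(ONE << BigInt(bits - 1)), (ONE << BigInt(bits - 1)) - ONE]
    : [BigInt(0), (ONE << BigInt(bits)) - ONE];
}

// Format `value` as the bit pattern of a `bits`-wide word.
function toHexWord(value: bigint, bits: number, signed: boolean): string {
  const [min, max] = wordRange(bits, signed);
  if (value < min || value > max) {
    throw new RangeError(`${value} does not fit in ${signed ? "i" : "u"}${bits}`);
  }
  return "0x" + BigInt.asUintN(bits, value).toString(16);
}

// Read a hex bit pattern back as a `bits`-wide word.
function fromHexWord(literal: string, bits: number, signed: boolean): bigint {
  wordRange(bits, signed); // validates `bits`
  if (!/^0x[0-9a-f]+$/i.test(literal)) {
    throw new SyntaxError(`Invalid hex literal: ${literal}`);
  }
  const pattern = BigInt(literal); // BigInt() accepts a "0x" prefix
  if (pattern >> BigInt(bits) !== BigInt(0)) {
    throw new RangeError(`${literal} has more than ${bits} bits`);
  }
  return signed ? BigInt.asIntN(bits, pattern) : pattern;
}

console.log(toHexWord(BigInt(-1), 8, true));  // 0xff
console.log(toHexWord(BigInt(-1), 64, true)); // 0xffffffffffffffff
console.log(fromHexWord("0xff", 8, true));    // -1n
console.log(Math.pow(2, 53) + 1 === Math.pow(2, 53)); // true: number loses exactness
```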

ovk commented 4 years ago

> My opinion here is that if you're working with binary/octal/hex representations of numbers then you probably care about word size

Not necessarily. MathJS is a math library (as its description says), and concepts like signedness, word size and negative-number representation are mostly relevant in software development, not in math. Hence I'd expect it to just convert any real number to a different base (2, 8, 16) as-is.

I agree that types (Number vs BigNumber) are orthogonal to base conversion semantics. Maybe the ideal way would be to give users some flexibility via something like an extra parameter for the conversion functions (e.g. hex(123, u32), where u32 means unsigned 32-bit integer) and an extra suffix for literals (e.g. 0xffffffffi32, where i32 means signed 32-bit integer). Then everything without a suffix could be treated just like mathematical base conversion. But I suspect this is quite a bit of work to implement.
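A rough sketch of the suffix idea, assuming Rust-style suffixes i<bits> and u<bits> on hex literals and a hypothetical parseHexLiteral function (neither is existing mathjs syntax or API). Without a suffix the literal is treated as plain base conversion; with a suffix the digits are taken as a bit pattern of that width.

```typescript
// Hypothetical literal parser (not mathjs syntax): an optional suffix
// i<bits> or u<bits> selects a fixed word; without a suffix the literal is
// plain base conversion with no word size.
function parseHexLiteral(literal: string): bigint {
  const match = /^0x([0-9a-f]+)(?:([iu])([0-9]+))?$/i.exec(literal);
  if (match === null) throw new SyntaxError(`Invalid hex literal: ${literal}`);
  const pattern = BigInt("0x" + match[1]);
  if (!match[2]) return pattern; // no suffix: the mathematical value

  const signed = match[2].toLowerCase() === "i";
  const bits = Number(match[3]);
  if (bits < 1) throw new RangeError(`Invalid word size in ${literal}`);
  // The digits must fit in the requested width...
  if (pattern >> BigInt(bits) !== BigInt(0)) {
    throw new RangeError(`${literal} does not fit in ${bits} bits`);
  }
  // ...and for signed words the top bit is the two's complement sign bit.
  return signed ? BigInt.asIntN(bits, pattern) : pattern;
}

console.log(parseHexLiteral("0xffffffff"));    // 4294967295n
console.log(parseHexLiteral("0xffffffffi32")); // -1n
console.log(parseHexLiteral("0xffffffffu32")); // 4294967295n
```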

Besides, in the BigNumber library (decimal.js) precision is specified in terms of significant digits, so mapping it to a word size might not be trivial (for example, if I configured MathJS to use BigNumbers with 20 significant digits, what would the word size be?).
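One possible way to answer the 20-digit question, under the assumption that a word size "fits" when every value of that width is exactly representable: an unsigned b-bit word with d significant decimal digits requires

```latex
2^{b} - 1 \le 10^{d} - 1
\;\iff\; b \le d \log_2 10
\;\implies\; b_{\max} = \lfloor d \log_2 10 \rfloor ,
\qquad
d = 20:\quad b_{\max} = \lfloor 20 \times 3.3219\ldots \rfloor = \lfloor 66.44\ldots \rfloor = 66 .
```

So 20 significant digits would cover unsigned words up to 66 bits (and signed words up to 67 bits, since their magnitudes only reach 2^(b-1)), neither of which is a conventional word size.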

So my suggestion to keep the existing semantics for Numbers, and to treat BigNumbers like real numbers and just perform base conversion as-is, was made mostly from a practical standpoint, as it should not be too difficult to implement. I still think it makes more sense than trying to apply integer restrictions and wrap-around semantics to BigNumbers. But I agree, this suggestion suffers from some inconsistency between Numbers and BigNumbers.

I wonder what @josdejong thinks about this.

clnhlzmn commented 4 years ago

> MathJS is a math library (as its description says), and concepts like signedness, word size and negative-number representation are mostly relevant in software development, not in math. Hence I'd expect it to just convert any real number to a different base (2, 8, 16) as-is.

Makes sense.

> Maybe the ideal way would be to give users some flexibility via something like an extra parameter for the conversion functions (e.g. hex(123, u32), where u32 means unsigned 32-bit integer) and an extra suffix for literals (e.g. 0xffffffffi32, where i32 means signed 32-bit integer). Then everything without a suffix could be treated just like mathematical base conversion.

I think that is a reasonable solution.

> But I suspect this is quite a bit of work to implement.

I don't think it would be too bad. I think it would certainly be easier than the alternative of adding types Int8, Uint16, etc, and extending all the functions to work on those types appropriately.

josdejong commented 4 years ago

Good discussion 👍

Some thoughts:

1. Support for decimals is a nice idea and could be a nice extension, though it's not the highest priority for me.
2. I think it is important to make sure number and BigNumber have the same behavior. In mathjs, users can mix those data types, so it would be really confusing if hex(-1) returned something totally different after switching to BigNumber.
3. Because we use the two's complement representation, we'll have to deal with configuring a number of bits, no matter whether we're using number or BigNumber. For number this must be at most 32; for BigNumber it can be larger.
4. Being able to specify something like 0xffffffffi32 is a nice idea. I'm afraid, though, that this notation is too "alien" for most users. How about creating a helper function like fromHex('0xffffffff', { signed: true, bits: 32}) maybe? Similarly, we could pass these options to functions like hex(123, { signed: false, bits: 32 }).
5. How do other math applications and calculators solve this issue? I guess we're not the first...

I think we could introduce two configuration options to make everything fully customizable: bits (or wordSize or something) and signed (what would be a good name for this?). These could be configured globally or passed as options to functions like hex. It will be good to think through what the best defaults for those options would be; for me { bits: 32, signed: true } makes sense because this works for both number and BigNumber.
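The options-object variant could look roughly like this. It is a sketch under assumptions: fromHex follows the name proposed above, while toHex, WordOptions and DEFAULT_WORD are hypothetical; options are resolved per call against { bits: 32, signed: true } defaults, and BigInt is used internally so that bits may exceed 32.

```typescript
// Hypothetical API shape, not the mathjs implementation.
interface WordOptions {
  bits: number;
  signed: boolean;
}

const DEFAULT_WORD: WordOptions = { bits: 32, signed: true };

// Merge per-call options over the defaults and validate the word size.
function resolveWord(options: Partial<WordOptions>): WordOptions {
  const word: WordOptions = { ...DEFAULT_WORD, ...options };
  if (!Number.isInteger(word.bits) || word.bits < 1) {
    throw new RangeError(`Invalid word size: ${word.bits}`);
  }
  return word;
}

// Format an integer (number or bigint) as the hex bit pattern of the word.
function toHex(value: number | bigint, options: Partial<WordOptions> = {}): string {
  const { bits, signed } = resolveWord(options);
  const v = BigInt(value); // throws RangeError for non-integer numbers
  const one = BigInt(1);
  const min = signed ? -(one << BigInt(bits - 1)) : BigInt(0);
  const max = (one << BigInt(signed ? bits - 1 : bits)) - one;
  if (v < min || v > max) throw new RangeError(`${value} does not fit in the word`);
  return "0x" + BigInt.asUintN(bits, v).toString(16);
}

// Parse a hex bit pattern according to the word options.
function fromHex(text: string, options: Partial<WordOptions> = {}): bigint {
  const { bits, signed } = resolveWord(options);
  if (!/^0x[0-9a-f]+$/i.test(text)) throw new SyntaxError(`Invalid hex string: ${text}`);
  const pattern = BigInt(text);
  if (pattern >> BigInt(bits) !== BigInt(0)) {
    throw new RangeError(`${text} has more than ${bits} bits`);
  }
  return signed ? BigInt.asIntN(bits, pattern) : pattern;
}

console.log(toHex(-1));                                          // 0xffffffff
console.log(toHex(123, { signed: false, bits: 32 }));            // 0x7b
console.log(fromHex("0xffffffff", { signed: true, bits: 32 }));  // -1n
```

Passing options per call keeps different word sizes usable side by side; a global setting would then only change what an omitted option falls back to.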

ovk commented 4 years ago

As for point 4, it's actually a fairly common notation. Languages like C and C++ have had literal suffixes for a long time, and Rust, for example, uses exactly the proposed syntax (https://doc.rust-lang.org/rust-by-example/primitives/literals.html). Suffixes could of course be accompanied by a fromHex function.

clnhlzmn commented 4 years ago

> How about creating a helper function like fromHex('0xffffffff', { signed: true, bits: 32}) maybe? Similarly, we could pass these options to functions like hex(123, { signed: false, bits: 32 }).

I think that is the best option for handling word size. I don't think word size should be a global config because I would like to be able to work with different sizes in the same environment.

josdejong commented 4 years ago

> As for point 4, it's actually a fairly common notation. Languages like C and C++ have had literal suffixes for a long time, and Rust, for example, uses exactly the proposed syntax (https://doc.rust-lang.org/rust-by-example/primitives/literals.html). Suffixes could of course be accompanied by a fromHex function.

Ah, I didn't know that. Thinking about it, it can't do harm, and it's basically a more compact syntax for a function like fromHex which would allow passing those parameters too.

> I think that is the best option for handling word size. I don't think word size should be a global config because I would like to be able to work with different sizes in the same environment.

Interesting idea to see if we can prevent the need for global config at all, that would make this totally unambiguous! So that would mean that when I enter say 0xffffffff, it would be interpreted with the default that we select (like 32 bits, signed), and if I would like to interpret it differently, I must add a suffix like 0xffffffffu64.

We would lose some user convenience, but it could be a great starting point. If it turns out to be too cumbersome, we could always reconsider making the default word settings configurable, that would not be a breaking change.

clnhlzmn commented 4 years ago

I have started working on this. See #1996.

josdejong commented 4 years ago

Nice!! Will review it hopefully this weekend.

clnhlzmn commented 3 years ago

I think this can be closed. See #1996.

josdejong commented 3 years ago

Ah, you're right 👍

josdejong / mathjs

Add BigNumber support for hex/oct/bin literals and functions #1982