Closed ovk closed 3 years ago
Extending hex/oct/bin for use with BigNumber should be easy enough. I intentionally limited functionality to 32 bit signed integers in order to be able to represent the bit patterns of negative numbers as 2s complement integers. I.e. after #1968 hex(-1) returns '0xffffffff' where in your extension it returns '-0x1'. Similarly after #1968 0xffffffff is parsed as -1 where in yours it's 4294967295. I assumed the 2s complement representation is more useful. It has the downside that we have to have different parse/format functions for different word sizes and for that reason it's limited to 32 bit integers for the time being. I'm not sure how that would work with non-integer numbers.
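To make the two behaviours concrete, here is a minimal sketch in plain JavaScript (no mathjs involved). The function names are hypothetical and are not taken from #1968 or from the extension; they only illustrate the difference between a 32-bit two's complement rendering and a sign-and-magnitude rendering.

```javascript
// Two's complement, 32-bit: reinterpret the signed value as unsigned first.
function hexTwosComplement32(n) {
  if (!Number.isInteger(n) || n < -(2 ** 31) || n > 2 ** 31 - 1) {
    throw new RangeError('expected a 32-bit signed integer');
  }
  // `>>> 0` converts to an unsigned 32-bit value, so -1 becomes 4294967295.
  return '0x' + (n >>> 0).toString(16);
}

// Sign-and-magnitude: format the absolute value and prepend the sign.
function hexSignMagnitude(n) {
  return (n < 0 ? '-' : '') + '0x' + Math.abs(n).toString(16);
}

// Parsing direction: `| 0` wraps an unsigned 32-bit pattern to a signed value.
function parseHexTwosComplement32(text) {
  return parseInt(text, 16) | 0;
}

console.log(hexTwosComplement32(-1));                // 0xffffffff
console.log(hexSignMagnitude(-1));                   // -0x1
console.log(parseHexTwosComplement32('0xffffffff')); // -1
console.log(parseInt('0xffffffff', 16));             // 4294967295
```

The same bit pattern therefore maps to two different values depending on which convention the parser assumes, which is the crux of the disagreement here.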
I only meant that it should work with non-integer BigNumbers. Your reasoning for signed 32 bit wrap-around semantics for the Number type makes sense (although I personally would prefer unsigned 32 bit semantics rather than signed, but this is only because I'm too used to C and C++, where signed integer overflow results in undefined behaviour).
In my opinion it is fine that, for example, the hex(Number) function only works on integers and has 2s complement 32 bit semantics (i.e. hex(-1) = 0xffffffff if MathJS is configured so that -1 is of type Number). What I suggest is that hex(BigNumber) should just convert whatever number was passed to it (regardless of size, or whether it's an integer or not) to a string. I can't think of another reasonable behaviour for BigNumbers.
As for the literals, I think a similar approach makes sense. I.e. if MathJS is configured so that number literals are of type Number then it makes sense that 0xffffffff would be interpreted as -1. But if it uses BigNumbers then the same 0xffffffff would be of type BigNumber, and it having a value of -1 doesn't quite make sense. So in such a case I'd expect it to be interpreted as 4294967295, and then also to allow a fractional part, e.g. 0xab.c to represent 171.75.
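As a rough illustration of what a fractional hex literal would mean, the sketch below parses one positionally: each digit after the point contributes digit / 16^k. The helper parseHexReal is hypothetical, and it uses plain JavaScript numbers rather than BigNumbers, so it only demonstrates the arithmetic, not arbitrary precision.

```javascript
// Hypothetical helper: parse a hex literal with an optional fractional part.
function parseHexReal(text) {
  const match = /^(-?)0x([0-9a-f]+)(?:\.([0-9a-f]+))?$/i.exec(text);
  if (!match) throw new SyntaxError('not a hex literal: ' + text);
  const [, sign, intDigits, fracDigits = ''] = match;
  let value = parseInt(intDigits, 16);
  // The k-th digit after the point (1-based) contributes digit / 16^k.
  for (let k = 0; k < fracDigits.length; k++) {
    value += parseInt(fracDigits[k], 16) / 16 ** (k + 1);
  }
  return sign === '-' ? -value : value;
}

console.log(parseHexReal('0xab.c'));     // 171.75  (171 + 12/16)
console.log(parseHexReal('0xffffffff')); // 4294967295
console.log((171.75).toString(16));      // ab.c
```

The last line shows that JavaScript's built-in Number.prototype.toString already formats non-integers in other bases this way, as a plain base conversion with no word size involved.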
My opinion here is that if you're working with binary/octal/hex representations of numbers then you probably care about word size too. In that sense the behavior of parsing/formatting these representations shouldn't change if you're using BigNumber or number.
I.e. if MathJS is configured so that number literals are of type Number then it makes sense that 0xffffffff would be interpreted as -1. But if it uses BigNumbers then the same 0xffffffff would be of type BigNumber, and it having a value of -1 doesn't quite make sense.
I don't agree. The type number in mathjs isn't an integer any more than the type BigNumber, so I don't think the behavior should change for that reason.
I do agree though, that the user should be able to choose the word size (up to 'unlimited') and whether they're signed or unsigned when working with these representations. I don't think that the behavior should have anything to do with BigNumber vs number, other than if you want your word size to be larger than 53 bits you need to use BigNumber, because number can only represent integers exactly up to 53 bits.
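The 53-bit limit is easy to check directly. In this small sketch BigInt stands in for an arbitrary-precision type; that is an assumption made for illustration only, since mathjs BigNumbers are decimal.js values, not BigInts.

```javascript
// A double has a 53-bit significand, so integers are exact only up to 2^53 - 1.
const largestExact = Number.MAX_SAFE_INTEGER;

// Beyond that, distinct integers collapse onto the same double.
const numberCollides = 2 ** 53 + 1 === 2 ** 53;

// An arbitrary-precision integer type keeps them apart.
const bigIntCollides = 2n ** 53n + 1n === 2n ** 53n;

console.log(largestExact === 2 ** 53 - 1); // true
console.log(numberCollides);               // true
console.log(bigIntCollides);               // false
```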
My opinion here is that if you're working with binary/octal/hex representations of numbers then you probably care about word size
Not necessarily. MathJS is a math library (as its description says), and concepts like signedness, word size, and the representation of negative numbers are mostly relevant in software development, not in math. Hence I'd expect it to just convert any real number to a different base (2, 8, 16) as-is.
I agree that types (Number vs BigNumber) are orthogonal to base conversion semantics. Maybe the ideal way would be to provide some flexibility to the user via something like an extra parameter for conversion functions (e.g. hex(123, u32) where u32 means unsigned 32-bit integer), and an extra suffix for literals (e.g. 0xffffffffi32 where i32 means signed 32-bit integer). Then everything without a suffix could be treated just like mathematical base conversion. But I suspect this is quite a bit of work to implement.
Besides, in the BigNumber library (decimal.js) the precision is specified in terms of significant digits, so mapping it to a word size might not be trivial (for example, if I configured MathJS to use BigNumbers with 20 significant digits, what would the word size be?).
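One possible answer, treating the precision purely as a count of decimal digits, is the largest word size whose every unsigned value still fits in that many digits. The helper maxExactBits below is hypothetical and only does this arithmetic; it says nothing about how decimal.js applies its precision setting in practice.

```javascript
// Largest `bits` such that every unsigned value below 2^bits has at most
// `digits` significant decimal digits.
function maxExactBits(digits) {
  // 2^bits <= 10^digits  <=>  bits <= digits * log2(10)
  return Math.floor(digits * Math.log2(10));
}

console.log(maxExactBits(20)); // 66
console.log(2n ** 66n < 10n ** 20n); // true:  66-bit values fit in 20 digits
console.log(2n ** 67n < 10n ** 20n); // false: 67-bit values can need 21 digits
```

So 20 significant digits does not line up with any conventional word size: it covers 64 bits with a little room to spare, but not 128.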
So my suggestion to keep the existing semantics for Numbers, and to treat BigNumbers like real numbers and just perform base conversion as-is, was made mostly from a practical standpoint, as it should not be too difficult to implement, and I still think it makes more sense than trying to apply integer restrictions and wrap-around semantics to BigNumbers. But I agree, this suggestion suffers from some inconsistency between Numbers and BigNumbers.
I wonder what @josdejong thinks about this.
MathJS is a math library (as its description says), and concepts like signedness, word size, and the representation of negative numbers are mostly relevant in software development, not in math. Hence I'd expect it to just convert any real number to a different base (2, 8, 16) as-is.
Makes sense.
Maybe the ideal way would be to provide some flexibility to the user via something like an extra parameter for conversion functions (e.g. hex(123, u32) where u32 means unsigned 32-bit integer), and an extra suffix for literals (e.g. 0xffffffffi32 where i32 means signed 32-bit integer). Then everything without a suffix could be treated just like mathematical base conversion.
I think that is a reasonable solution.
But I suspect this is quite a bit of work to implement.
I don't think it would be too bad. I think it would certainly be easier than the alternative of adding types Int8, Uint16, etc., and extending all the functions to work on those types appropriately.
Good discussion 👍
Some thoughts:
hex(-1) returns something totally different when switching to BigNumber.
number or BigNumber. For number this must be max 32, for BigNumber it can be larger.
0xffffffffi32 is a nice idea. I'm afraid though that this notation is too "alien" for most users. How about creating a helper function like fromHex('0xffffffff', { signed: true, bits: 32 }) maybe? Similarly, we could pass these options to functions like hex(123, { signed: false, bits: 32 }).
I think we can introduce 2 configuration options to make everything fully customizable: bits (or wordSize or something) and signed (what would be a good name for this?). These can be configured globally, or passed as options to functions like hex. It will be good to think through what the best defaults for those options are; for me { bits: 32, signed: true } makes sense because this works for both number and BigNumber.
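A minimal sketch of how such options could behave, written in plain JavaScript. The functions hex and fromHex below are hypothetical stand-ins with the option names from this comment, not the mathjs API, and they use BigInt (an assumption for illustration) so that bits may exceed 32.

```javascript
// Format an integer as the two's complement bit pattern of the given word.
function hex(value, { bits = 32, signed = true } = {}) {
  const n = BigInt(value);
  const size = BigInt(bits);
  const min = signed ? -(1n << (size - 1n)) : 0n;
  const max = signed ? (1n << (size - 1n)) - 1n : (1n << size) - 1n;
  if (n < min || n > max) {
    throw new RangeError(`${value} does not fit in ${bits} bits`);
  }
  // BigInt.asUintN yields the bit pattern as a non-negative value.
  return '0x' + BigInt.asUintN(bits, n).toString(16);
}

// Parse a bit pattern and interpret it as a signed or unsigned word.
function fromHex(text, { bits = 32, signed = true } = {}) {
  const pattern = BigInt(text); // BigInt('0xff') understands the 0x prefix
  if (pattern >= 1n << BigInt(bits)) {
    throw new RangeError(`${text} does not fit in ${bits} bits`);
  }
  return signed ? BigInt.asIntN(bits, pattern) : pattern;
}

console.log(hex(-1));                                            // 0xffffffff
console.log(hex(-1, { bits: 64 }));                              // 0xffffffffffffffff
console.log(hex(123, { signed: false, bits: 32 }));              // 0x7b
console.log(fromHex('0xffffffff', { signed: true, bits: 32 }));  // -1n
console.log(fromHex('0xffffffff', { signed: false, bits: 32 })); // 4294967295n
```

With the defaults { bits: 32, signed: true } the sketch reproduces the 32-bit behaviour described earlier in the thread, and passing options per call lets different word sizes coexist in the same environment.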
As for point 4, it's actually a fairly common notation. Languages like C and C++ have had literal suffixes for a long time, and, for example, Rust uses the exact proposed syntax https://doc.rust-lang.org/rust-by-example/primitives/literals.html . Suffixes could of course be accompanied by a fromHex function.
How about creating a helper function like fromHex('0xffffffff', { signed: true, bits: 32 }) maybe? Similarly, we could pass these options to functions like hex(123, { signed: false, bits: 32 }).
I think that is the best option for handling word size. I don't think word size should be a global config because I would like to be able to work with different sizes in the same environment.
As for point 4, it's actually a fairly common notation. Languages like C and C++ have had literal suffixes for a long time, and, for example, Rust uses the exact proposed syntax https://doc.rust-lang.org/rust-by-example/primitives/literals.html . Suffixes could of course be accompanied by a fromHex function.
Ah, I didn't know that. Thinking about it, it can't do harm, and it's basically a more compact syntax for a function like fromHex, which would allow passing those parameters too.
I think that is the best option for handling word size. I don't think word size should be a global config because I would like to be able to work with different sizes in the same environment.
Interesting idea to see if we can prevent the need for global config at all, that would make this totally unambiguous! So that would mean that when I enter say 0xffffffff, it would be interpreted with the default that we select (like 32 bits, signed), and if I would like to interpret it differently, I must add a suffix like 0xffffffffu64.
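As a sketch of that rule (a fixed default plus an optional suffix), the hypothetical parser below accepts literals such as 0xffffffffu64 or 0xffffffffi32. The name parseSuffixedHex and the returned BigInt values are assumptions for illustration; this is not how mathjs parses literals.

```javascript
// Hypothetical parser: hex digits, then an optional i<bits> or u<bits> suffix.
// Neither "i" nor "u" is a hex digit, so the suffix is unambiguous.
function parseSuffixedHex(text, defaults = { signed: true, bits: 32 }) {
  const match = /^0x([0-9a-fA-F]+)(?:([iu])([0-9]+))?$/.exec(text);
  if (!match) throw new SyntaxError('not a hex literal: ' + text);
  const [, digits, kind, width] = match;
  // Without a suffix, fall back to the fixed defaults (32 bits, signed).
  const signed = kind ? kind === 'i' : defaults.signed;
  const bits = width ? Number(width) : defaults.bits;
  const pattern = BigInt('0x' + digits);
  if (pattern >= 1n << BigInt(bits)) {
    throw new RangeError(text + ' does not fit in ' + bits + ' bits');
  }
  // Signed words reinterpret the top bit as the sign (two's complement).
  return signed ? BigInt.asIntN(bits, pattern) : pattern;
}

console.log(parseSuffixedHex('0xffffffff'));    // -1n (default: 32 bits, signed)
console.log(parseSuffixedHex('0xffffffffi32')); // -1n
console.log(parseSuffixedHex('0xffffffffu64')); // 4294967295n
console.log(parseSuffixedHex('0xffu8'));        // 255n
console.log(parseSuffixedHex('0xffi8'));        // -1n
```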
We would lose some user convenience, but it could be a great starting point. If it turns out to be too cumbersome, we could always reconsider making the default word settings configurable, that would not be a breaking change.
Nice!! Will review it hopefully this weekend.
I think this can be closed. See #1996.
Ah, you're right 👍
This is related to #1968 which went into 7.3.0.
@clnhlzmn added hex/bin/oct functions and literals for the regular number type (integer only), which is nice. However, if MathJS is configured to use the BigNumber type by default, unfortunately none of this works. On the CL Calc website (which is still on 7.2.0) I implemented this as an extension on top of MathJS, so it works with BigNumbers, including non-integers (see an example here). I think this is how it should be implemented in MathJS for the BigNumber type.