JohnAD commented 3 years ago

Add decimal to standard library

Abstract

Adding a data type that provides support for decimal numbers (base 10 rather than base 2).

Motivation

Many programming languages have built-in support for decimal types and decimal math. When a program handles numbers that both originate in base 10 and are stored in base 10, converting them to binary floating point introduces unwanted conversion errors between the two numbering systems.

The most common types of programs that use decimal:

financial or banking programs
scientific or lab-oriented programs

Some examples from other languages:

C#: https://docs.microsoft.com/en-us/dotnet/api/system.decimal?view=net-5.0
python: https://docs.python.org/3/library/decimal.html
PostgreSQL: https://www.postgresql.org/docs/10/datatype-numeric.html (see decimal)

One of the benefits of adding it to the standard library is possible compiler support as well. This is important for handling source-code literals. Details below.

Description

This would be a single 'decimal.nim' file added to the list of pure libraries that come with the compiler. A new decimal type is introduced and allows decimal storage and manipulation.

Adding this to the standard library creates a common point of reference for other libraries and procedures that need to pass or receive decimal numbers with other libraries.

I recommend that the library store a big decimal number that conforms to a public spec. Specifically, I recommend it meet IEEE 754-2008's 128-bit specification. I'm not suggesting that it store the number in the IEEE format, simply that it be designed to support:

at least 34 digits of precision
stored significance value
+/- infinity and nan states

That way, the decimal library could be used to export/import the IEEE spec.
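
As a rough illustration of the three requirements above, here is a minimal sketch of what such a value might carry. Every name here (DecimalKind, SketchDecimal, maxDigits, isValid) is hypothetical and chosen for readability; this is not the layout of any existing library and not the IEEE bit encoding.

```nim
type
  DecimalKind = enum
    dkFinite, dkInfinite, dkNaN   # finite values plus the special states

  SketchDecimal = object          # hypothetical type, for illustration only
    kind: DecimalKind
    negative: bool                # sign, also used for -infinity
    coefficient: seq[uint8]       # decimal digits, most significant first
    exponent: int                 # value = coefficient * 10^exponent

const maxDigits = 34              # digit count of the 128-bit IEEE format

proc isValid(d: SketchDecimal): bool =
  ## Finite values may not carry more than maxDigits digits.
  d.kind != dkFinite or d.coefficient.len <= maxDigits

# 4003.250 keeps its trailing zero: seven stored digits, exponent -3.
let price = SketchDecimal(kind: dkFinite, negative: false,
                          coefficient: @[4'u8, 0, 0, 3, 2, 5, 0],
                          exponent: -3)
let negInf = SketchDecimal(kind: dkInfinite, negative: true)

assert price.isValid
assert price.coefficient.len == 7
assert negInf.isValid
```

Storing the digits and the exponent separately is what lets the trailing zero in 4003.250 survive, which is the "stored significance" point in the list.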

One of the benefits of making it standard is possibly also better handling of decimal literals in source during compilation. For example, using a third party library, the following could be made to work:

var x: decimal = "123.456"

but the following could not work:

var x: decimal = 123.456

This is because the compiler normally would convert the 123.456 into a floating point number before attempting to assign it. Thus, it introduces a base2-to-base10 conversion problem before the number is even stored.
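
A minimal sketch of that conversion problem, using only built-in types. The scaled-integer representation below (a coefficient times a power of ten) is an illustrative assumption about how a decimal type could hold the values, not the proposed library's design.

```nim
# Binary floats cannot represent most decimal fractions exactly,
# so base-10 arithmetic on them drifts.
let sum = 0.1 + 0.2
assert sum != 0.3            # the float sum is not exactly 0.3

# The same values held as base-10 scaled integers (coefficient * 10^-1)
# add exactly.
let a = 1'i64                # 0.1 as 1 * 10^-1
let b = 2'i64                # 0.2 as 2 * 10^-1
assert a + b == 3'i64        # 0.3 as 3 * 10^-1, exact
```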

Four ways this could be handled:

1) the compiler "figures out" that it is trying to convert a sequence of digits for assignment to a decimal and passes in the value as a string despite the lack of quotes.

2) Support for a dedicated suffix, similar to C#'s m suffix. Thus, 123.456m is the same as newDecimal("123.456").

3) Generic support for user-defined number suffixes. So, 123.456m becomes m"123.456". The decimal library would define a string template or proc for m.

4) Compiler does nothing new and the decimal library simply fails if you attempt to assign a floating point number to it. In that case, this proposal is strictly a new library for the Nim standard library.

I strongly suggest option 3 as it also opens up possibilities for other uses.
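
The target form of option 3, m"123.456", is already valid Nim today: an identifier immediately followed by a string literal is a call with a raw string argument. A minimal sketch of a proc that could sit behind it follows. ToyDecimal and its fields are hypothetical stand-ins, and the parser handles only plain unsigned "digits.digits" text with no exponent and no validation.

```nim
import std/strutils

type
  ToyDecimal = object        # hypothetical stand-in, not the proposed type
    coefficient: int64       # all digits with the decimal point removed
    scale: int               # number of digits after the decimal point

proc m(s: string): ToyDecimal =
  ## Parses text such as "123.456" without ever going through a float.
  let dot = s.find('.')
  if dot < 0:
    result = ToyDecimal(coefficient: parseBiggestInt(s), scale: 0)
  else:
    let digits = s[0 ..< dot] & s[dot + 1 .. ^1]
    result = ToyDecimal(coefficient: parseBiggestInt(digits),
                        scale: s.len - dot - 1)

let x = m"123.456"           # same as m(r"123.456")
assert x.coefficient == 123456
assert x.scale == 3
```

Because the digits arrive as text, the value never passes through a binary float, which is the point of options 1 through 3.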

BTW, two third-party decimal libraries of note, both based on the IEEE spec:

https://github.com/status-im/nim-decimal
https://github.com/JohnAD/decimal128

The second one I wrote myself. Neither is really ready for inclusion in the standard library yet.

This project will take at least a year to complete.

Examples

import decimal

let a: decimal = 4003.250m

assert a.significance == 7
assert a.scale == 3
assert a.toFloat == 4003.25
assert a.toInt == 4003
assert $a == "4003.250"

assert 4003.250m == decimal("4003.250")
assert 4003.250m == decimal("4003250E-3")

assert 4003.250m != decimal("4003.25")

Backward incompatibility

There would not be any backward incompatibility issues since decimal numbers are not currently supported by the language or its standard library.

JohnAD commented 3 years ago

I did not say so in the first comment, but I'm willing to write this library or help others write it.

Araq commented 3 years ago

Should be added to Fusion first but I cannot see how a library like this might be controversial, so I'm adding the "Accepted RFC" tag already.

narimiran commented 3 years ago

This project will take at least a year to complete.

This caught my eye. Can you expand a bit on why you think it will take that long?

planetis-m commented 3 years ago

Is there by any chance a shorter variant like decimal64? I don't really require the bounds supported by decimal128 in my use case, so I temporarily use https://gist.github.com/Araq/c71b764b94188337b24c6180b239229d

JohnAD commented 3 years ago

This project will take at least a year to complete.

This caught my eye. Can you expand a bit on why you think it will take that long?

Two reasons:

I'll be limiting myself to about 4 hours per week; I'm already programming 50+ hrs/week for my remote gigs (paying ones).
The needed mathematical functions are non-trivial to implement. Basic stuff like addition and multiplication I can grab from known C libraries. But some items such as ln and exp will take a bit more effort.

Also, although I said that I need not "store the number in the IEEE spec", on my first pass I'm going to attempt that anyway. The number will literally only use up 128 bits of RAM if that works out to be reasonable. Likely a struct of four uint32. (My current library uses 40+ bytes in a setup that is easy to use but wasteful of space.)
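
A minimal sketch of the "four uint32" storage idea. The type and field names are hypothetical, and the sketch shows only the size of the container, not how the IEEE sign, exponent, and coefficient fields would be packed into it.

```nim
type
  Decimal128Bits = object        # hypothetical name, for illustration only
    words: array[4, uint32]      # 4 * 32 bits = 128 bits of storage

# The whole value occupies 16 bytes, with no padding between the words.
assert sizeof(Decimal128Bits) == 16
```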

I'm happy to share the workload if anyone is volunteering!

JohnAD commented 3 years ago

Is there by any chance a shorter variant like decimal64? I don't really require the bounds supported by decimal128 in my use case, so I temporarily use https://gist.github.com/Araq/c71b764b94188337b24c6180b239229d

The IEEE spec also defines a 64-bit and 256-bit version. Perhaps later on I could write a 64-bit version. Once the techniques are in place, it should be fairly straightforward.

Araq commented 3 years ago

We should start with a module that contains the basics and then add more operations incrementally. No need to wait for a year. :-)

pigmej commented 3 years ago

Some time ago I created https://github.com/pigmej/nim-simple-decimal/blob/master/simpledecimal.nim which is really simple but may fulfill someone's basic requirements (it's also based on some of Araq's code).

Anyway, having a full-blown decimal in std is a must-have, I think.

planetis-m commented 3 years ago

There is also https://github.com/Sud0nim/Decimal

JohnAD commented 3 years ago

@Araq and others.

This project is moving along faster than I thought. I might have a PR ready in the next 3 or 4 weeks. The first PR will support the basics and two math ops: addition and subtraction.

The question:

This project involves two parts: the decimals.nim library for Nim/lib/std and a change to the lexer/parser. These can safely be implemented independently.

So, should I create two PRs? Or, glom them both together into one?

They are independent because the lexer/parser change is a proposed generic expansion of the language.

Details:

Currently, when you put a number into source code, the lexer insists on making it either an integer or a floating-point literal. There is no way to support anything else. This change adds a new token, tkStrNumPrefixLit, that is emitted if the digits are followed by an identifier (as long as the identifier does not conflict with f32, f64, e{N}, etc.). In the parser, a tkStrNumPrefixLit followed by a tkSymbol/etc. is turned into a dot expression. Thus, this now works:

proc fooBar(num: string): string = 
  result = "foo " & num & " bar"

var a = 1234.56fooBar  # the equivalent of: "1234.56".fooBar

assert a == "foo 1234.56 bar"

Or, from the point of view of the decimals library which has proc M*(num: string): Decimal:

import std/decimals

var a = 5192296858534827628530496329220095M   # this number will not fit in u64
var b = 0.31415E1M

Araq commented 3 years ago

So, should I create two PRs? Or, glom them both together into one?

Two PRs please and more importantly, two RFCs. Or rather an update to the existing literals RFCs, https://github.com/nim-lang/RFCs/issues/216 and https://github.com/nim-lang/RFCs/issues/228

JohnAD commented 3 years ago

Just a note to this thread: the part of this project that adds support for custom numeric literal suffixes to the compiler is mostly done. There will be upcoming tweaks, of course. The suffixes will always have a single quote as part of the name. So, to update my earlier code, a decimal can be declared like so:

import std/decimals

var b = 0.31415E10'm

var amt = 12.9942'm(places=2)

I will now start back on the main part: finishing the decimal library itself.
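
For readers who want to try the quoted-suffix form, here is a minimal sketch. It assumes a Nim version that supports custom numeric literals (1.6 or later, as far as I know), where a literal such as 12.50'm is rewritten into a call to a proc named 'm that receives the digits as a string. ToyDecimal and its fields are hypothetical and unrelated to the real std/decimals work, and the parser handles only plain unsigned "digits.digits" text.

```nim
import std/strutils

type
  ToyDecimal = object        # hypothetical stand-in, not the proposed type
    coefficient: int64       # all digits with the decimal point removed
    scale: int               # number of digits after the decimal point

proc `'m`(lit: string): ToyDecimal =
  ## Receives the literal text without the suffix, e.g. "12.50".
  let dot = lit.find('.')
  if dot < 0:
    result = ToyDecimal(coefficient: parseBiggestInt(lit), scale: 0)
  else:
    let digits = lit[0 ..< dot] & lit[dot + 1 .. ^1]
    result = ToyDecimal(coefficient: parseBiggestInt(digits),
                        scale: lit.len - dot - 1)

let amt = 12.50'm
assert amt.coefficient == 1250
assert amt.scale == 2        # the trailing zero survives

# The same digits as a float literal lose the trailing zero.
assert $12.50 == "12.5"
```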
