jk-jeon / dragonbox

Reference implementation of Dragonbox in C++
Apache License 2.0

enhancement proposal: pimp up the readme, a little, for dummies ... #52

Closed newbie-02 closed 6 months ago

newbie-02 commented 10 months ago

[ edit ] Solved. The discussion below, although lengthy, helped me
get the door open, so if you don't find a better solution, take
the time to read it. [ /edit ]

I followed the readme; downloading and building gave meaningful
responses, but I'm stuck trying to get the door open.
I would like to compile and run the test programs provided in the readme, but don't know where to put them or how to compile them.
I'm not an experienced C++ coder, nor can I tell the difference
between CMake and make.

TIA for any help :-)

jk-jeon commented 10 months ago

Thanks for the input. Instructions for building tests and others are here: https://github.com/jk-jeon/dragonbox?tab=readme-ov-file#how-to-run-tests-benchmark-and-others

It assumes that you know what CMake is and how to use it to build a C++ project. It's not the most user friendly tool, I know, but I don't think explaining about CMake is what the project's README is supposed to do.

That said, I guess it would be a good idea to include a link to a CMake tutorial or something like that in README to help people who are not very familiar with C/C++.

To actually help you, let me ask: what OS are you using? Do you know which compiler you are going to use? Have you installed CMake and git on your machine?

newbie-02 commented 10 months ago

OS: Kali ( debian ), compiler: yes, g++ / gcc, cmake: yes, worked to build dragonbox, git: yes, worked to clone dragonbox. I even managed to build and run 'test', but have not yet figured out how to start 'benchmark':

./benchmark
[Running benchmark for binary32...]
Generating random samples...
terminate called after throwing an instance of 'std::out_of_range'
  what():  stof
zsh: IOT instruction  ./benchmark

and possibly the other subprojects. My wish was to compile and run the two 'Usage Examples' from the readme to have a 'minimal example', but don't know how to. ( Let me say that I think the overall structure and information content of the project and readme is very good, it's just that little oddity that holds me back. And ask if there are specific reasons to use C++ instead of 'vanilla' C? Basically, shouldn't 'vanilla' be adaptable for a larger number of projects? )

jk-jeon commented 10 months ago

[Running benchmark for binary32...]
Generating random samples...
terminate called after throwing an instance of 'std::out_of_range'
 what():  stof
zsh: IOT instruction  ./benchmark

Thanks for catching this. This is probably a bug introduced when I ported the code into C++11. It's fixed in the recent commit.

My wish was to compile and run the two 'Usage Examples' from the readme to have a 'minimal example', but don't know how to.

For testing jkj::dragonbox::to_decimal, you could just do something like this: https://godbolt.org/z/hPW4a8nj7 (you don't even need to compile it.)

For proper testing, I recommend modifying sandbox.cpp and building the sandbox subproject. That's probably the easiest way to test it yourself.

And ask if there are specific reasons to use C++ instead of 'vanilla' C? Basically, shouldn't 'vanilla' be adaptable for a larger number of projects? )

That's true, but I don't want to restrict myself to C. I generally don't enjoy coding in C and avoid it whenever possible. I leveraged quite a lot of C++ features, like compile-time computation and some template metaprogramming, which help me a lot to avoid code duplication. Like, I don't want to pointlessly copy-paste almost the same code twice for float and double. C does not let me avoid that, unless I do some insane macro abuse.

Also, it should be easy to wrap it inside a C interface if needed. Just provide a .h file and a .cpp file, declare an extern "C" function in the .h file, include that .h file along with dragonbox.h in the .cpp file, and implement the function by just calling jkj::dragonbox::to_chars. That's a pretty standard way of wrapping C++ with C.

newbie-02 commented 10 months ago

:-)
thank you very much for the fast and professional help,

benchmark: now works,

godbolt: good thing, had some hassles with it formerly,
your snippet works quite well :-)
Maybe add

#include <iostream>

and

    std::cout << v.significand << std::endl; 
    std::cout << v.exponent << std::endl; 
    std::cout << v.is_negative << std::endl;

to provide even easier 'get in touch' for newbies.

wrapping for C: will try, I think I have a chance, and also a
chance to fail. If someone with 'skills' can provide such a wrapper
in a well designed, performant version ( robust against newbie
and script-kiddie fails ) it could also be helpful to others.

( Think of something like: put files xxx in dir yyy, 'include zzz'
in your file, and the command 'abcd( x )' will provide v with
v.significand, v.exponent and v.is_negative.

It would be absolutely cool to have a one-shot solution for
'mpdecimal' that fills a struct in one optimized step:

/* mpd_t */ 
typedef struct mpd_t { 
    uint8_t flags; 
    mpd_ssize_t exp; 
    mpd_ssize_t digits; 
    mpd_ssize_t len; 
    mpd_ssize_t alloc; 
    mpd_uint_t *data; 
} mpd_t; 

where for 64-bit builds all members except flags are 64-bit
types ( personally I'm confused between exp, which needs
to be signed, and digits, len, alloc, which don't ). Setting
bit 0 of flags according to v.is_negative,
exp ( decimal exponent ) to v.exponent,
digits to the length of v.significand ( is that available from
dragonbox or would it be necessary to count digits? ),
len to 1 ( one 64-bit int is enough for a 19-digit significand ), alloc to 2 ( standard minimum allocation ), and *data ( decimal significand ) to v.significand
could be an optimal interplay between two powerful
specialists.

Why do I ask for professional help? mpdecimal includes a 'set
from triple' function, which is the level where I might be able to step
in, but I assume that for a well-performing solution it would be
better to work with direct assignments instead of 'from triple',
counting digits, or other steps that store and fetch variables ... )

For other curious dummies: I managed to run the second
example locally with some modifications ( rename to
dragonbox_basic.cpp after checking, instructions in comments ):
dragonbox_basic.txt,
[ edit ] A similar approach didn't work for the first example due to some cryptic
'namespace' and 'ld returned 1' problems.
With some more
help from jk-jeon I managed to run the first example too,
pointless for pros but it might help other dummies ...
dragonbox_basic_to_chars.txt [ /edit ]

jk-jeon commented 10 months ago

Here is an example of wrapping C++ inside a C interface.

dragonbox_c.h

#ifdef __cplusplus
extern "C" {
#endif

char *dragonbox_double_to_chars(double x, char *buffer);

#ifdef __cplusplus
}
#endif

dragonbox_c.cpp

#include "dragonbox_c.h"
#include "dragonbox/dragonbox_to_chars.h"

char* dragonbox_double_to_chars(double x, char* buffer) {
  return jkj::dragonbox::to_chars(x, buffer);
}

And then you compile dragonbox_c.cpp along with dragonbox_to_chars.cpp.

A usage example: main.c

#include <stdio.h>
#include "dragonbox_c.h"

int main() {
  char buffer[25];
  dragonbox_double_to_chars(1.23456, buffer);
  printf("%s", buffer);
}

jk-jeon commented 10 months ago

to provide even easier 'get in touch' for newbies.

I don't think people who can't write Hello, World! in C or C++ are very likely to be interested in this project anyway.

'mpdecimal' that fills a struct in one optimized step:

I don't know about this project, but yes, dragonbox doesn't provide a way to count the number of decimal digits of an integer.

A similar approach didn't work for the first example due to some cryptic

I think that's likely because you didn't include dragonbox_to_chars.cpp in your build and link your executable with the resulting .o file.

newbie-02 commented 10 months ago

We are slightly off from the original point, but on a good discussion,
:-)
wrapping: already tried following another example, works in general, but
producing good code with C++, cmake, wrapping ... isn't normally a newbie
task. But I like to learn, and will try your example too.

people who can't write Hello, World! in C or C++

there are two biases - IMHO - , it's a gargantuan step from writing
hello world to your level of programming skills,
quite often people would be interested in a project ( which happened
to me: I just want to use the conversion without understanding 'Schubfach'
or fighting with github showing only 25 pages of your paper ),
but quite often they are blocked in 'getting in touch' by some
small oddities ( which happened to me with dragonbox, both a few
days ago and two years ago ).

mpdecimal: an IMHO ingenious project providing arbitrary precision decimal math, with an at first glance cumbersome data structure, but
very good performance, IMHO due to easier conversions to / from
decimal strings than binary datatypes have. It's driving the python 'decimal' module and - IMHO - doing a very good job there.

Alas it either gets
Decimal('0.1000000000000000055511151231257827021181583404541015625')
for 'decimal.Decimal( 0.1 )', passing on the inaccuracy of the binary
representation, or needs to use strings: 'decimal.Decimal( str( 0.1 ) )',
which I assume is slow. Thus I assume your project could provide a
good speedup.

doesn't provide a way to count the number of decimal digits of an integer.

Think / hope something like 'Null-terminate the buffer and return the pointer
to the null character' which you provide for to_chars might be useful for this?
( we don't need the places of an arbitrary integer, just that of the produced
significand )

I think that's likely because you didn't include dragonbox_to_chars.cpp

Super, I think you spotted the point. I got one step further and plan to continue
and provide a similar 'dummy solution' as for example two when I find some
time.

Alcaro commented 10 months ago

Yes, because Decimal(0.1) is, in fact, a shorthand for Decimal(0.1000000000000000055511151231257827021181583404541015625); 0.1 is not a possible value for a float64, it gets rounded. It's close enough for most practical purposes, but when dealing with high-precision mathematical libraries like mpdecimal or Dragonbox, it's easy for incorrect expectations to trip you up.

Decimal(str(0.1)) may look like it does the right thing, but in reality, it's just two rounding errors that cancel out, and it will demonstrate its wrongness if you ask for decimal.Decimal(str(0.1+0.2)).

If you want perfect accuracy, don't use floats at all; use only integers, mpdecimal, and hardcoded strings. Decimal(1)/Decimal(10) is 0.1, and Decimal('0.1')+Decimal('0.2') is 0.3 and not 0.30000000000000004.

Yes, floats are confusing, and there's a lot of incorrect or oversimplified information floating around on the internet.

jk-jeon commented 10 months ago

Dear @newbie-02 ,

@Alcaro is making a good point. If your intention is to use Dragonbox as a "clever trick" for turning Decimal(0.1) into Decimal('0.1'), you are very likely doing something nonsensical or at best a dirty hack. I don't know what exactly you are trying to do and why, but it sounds like you may need to reconsider doing so.

Anyway, for your comments:

there are two biases - IMHO - , it's a gargantuan step from writing hello world to your level of programming skills, quite often people would be interested in a project

My point is that adding

 std::cout << v.significand << std::endl; 
 std::cout << v.exponent << std::endl; 
 std::cout << v.is_negative << std::endl;

to the usage example doesn't seem to add much value, because intended users of the library are probably already aware of how to print out integers.

Think / hope something like 'Null-terminate the buffer and return the pointer to the null character' which you provide for to_chars might be useful for this? ( we don't need the places of an arbitrary integer, just that of the produced significand )

Just use linear search or binary search or whatever. You just need to do a series of comparisons against numbers of the form $10^k$. Note that what jkj::dragonbox::to_chars_n internally does is not very much more sophisticated than that, so it's quite pointless to use it as a "substep" for counting digits.

newbie-02 commented 10 months ago

@Alcaro

Decimal(str(0.1)) may look like it does the right thing, but in reality, it's just two rounding errors that cancel out, and it will demonstrate its wrongness if you ask for decimal.Decimal(str(0.1+0.2)).

Yes, I think the second is not a rounding error but a meaningful conversion which gets the
user's intention back. 'decimal.Decimal( str( 0.1 ) ) + decimal.Decimal( str( 0.2 ) )'
becomes meaningful, and could build a bridge between the binary and the decimal world.
It would take a lot of 'str()'s, so simplifying the writing and maximizing the
speed, or better yet avoiding str() in favor of a direct numeric conversion, would be
necessary / meaningful.

@jk-jeon

Yes, floats are confusing, and there's a lot of incorrect or oversimplified information floating around on the internet.

We have a saying in Germany which could translate to: 'You speak a big word calmly'.
decimal vs. binary datatypes ... there are controversial POVs whether you consider
0.100000000000000005551115123...
an in any way meaningful or correct value, or in any way intended by the user, vs.
a poor erratic catastrophe-prone substitute / deputy for what the user meant.

You may assume I'm with the second group, but I have to accept that plenty of data
is! around in binary, plenty of programs are! based on binary, and few maintainers are capable or willing to switch their project. In that situation the 'dirty trick' may
make sense and become powerful.

adding the printout of integers ...

was less intended to teach printing to experienced devs,
but to show the structure of 'v' and access to its parts to newbies.
Bear with us, we are the dummies we are ...

Alcaro commented 10 months ago

there are controversial POVs whether you consider: 0.100000000000000005551115123... an in any way meaningful or correct value, or in any way intended by the user, vs. a poor erratic catastrophe-prone substitute / deputy for what the user meant

Yes, there is.

And most mathematical libraries have an opinion on that question. If you choose libraries with a different opinion than yourself, you're in for a rough ride.

(In fact, even the addition operator prefers the second viewpoint. Proof: 0.1 + 0.2)

newbie-02 commented 10 months ago

@Alcaro thank you for at least accepting that there are different POVs.
Just consider '0.2 + 0.1' calculated with the approximated deputies in
binary16 ( 0.2998 ),
binary32 ( 0.3 ),
binary64 ( 0.30000000000000004 ),
binary128 ( 0.30000000000000000000000000000000004 ),
all not 'exact' but 'SRT' nearest to decimal weight of binary deputy.
And consider picking any of them 'right', or explaining to any schoolkid
when to pick which binary datatype, which library, which algorithm
to get results that match what's requested in his tests.

I have not been involved in IEEE 754 standardization, so it's not 'my choice'.

We have lots of data and programs around depending on binary datatypes,
IMHO none of them c/w/should complain about:

Cumbersome, but can be automated! :-)

One guaranteed benefit would be less users complaining, if any.

jk-jeon commented 10 months ago

I would say just don't use binary floating-point numbers at all if you are gonna compute things in decimal anyway. The prime advantage of binary over decimal is that it allows much simpler and faster computation. If the computation will be done in decimals anyway, then there is no point in using binaries. (In addition to that, if the precision is of paramount importance, why use floating-point in the first place? There are better options, like arbitrary-precision rational arithmetic.)

Doing Decimal(str(0.2)) is kinda funny, because it does the conversion from decimal to binary (by the compiler/interpreter) and then from binary to decimal (at runtime), and hopes that the rounding errors introduced in both steps cancel each other out. That's totally pointless (if not erroneous), and in this case they really just have to use decimals from the beginning, like by separately passing the integral part and the fractional part, or maybe by separately passing the decimal significand and the decimal exponent, or whatever.

If the number you pass is not a written constant like 0.2, but rather some variable you are supplied with from some other interface that you cannot really touch, and if that number is supposed to be in decimal (like a human input), but for whatever unfortunate reason is provided to you in binary, then it might make sense to do this conversion. But I would say that's still a hack, because it's impossible to correctly guess what should be the "intended" value of the binary floating-point number you got. The correctly rounded shortest-roundtrip decimal is sort of a good guess though.

was less intended to teach printing to experienced devs, but to show the structure of 'v' and access to its parts to newbies. Bear with us, we are the dummies we are ...

It's already written in the comments. I don't see any value for adding printing statements.

newbie-02 commented 9 months ago

Also, it should be easy to wrap it inside a C interface if needed.

I tried but got lost between includes, namespaces, types ...
If someone can provide what's needed to get the functionality of
(Direct use of jkj::dragonbox::to_decimal) in 'C', I'd appreciate it,
and possibly others would too.

jk-jeon commented 6 months ago

That said, I guess it would be a good idea to include a link to a CMake tutorial or something like that in README to help people who are not very familiar with C/C++.

Did this in 82fb40e.