Floating point - Githubissues

Hello. This is my implementation of floating-point numbers for #1. I have implemented 32-bit (single-precision) floating-point numbers based on the IEEE 754 standard.

Idea:

I store a single-precision floating-point number in a 64-bit integer by packing the mantissa m, the exponent e, and the sign s, where the floating-point number = (-1)^s * 2^(e - 127) * (1 + m). Reading from the least significant bit: the first 3 bits and the next 5 bits are all set to 1 to flag the value as a float. The next 23 bits hold the mantissa m in binary form without the decimal point, the next 8 bits hold the exponent e (stored with its bias of 127), and the next bit holds the sign s, which is 0 for positive and 1 for negative. Finally, the next 3 bits hold the number of decimal places d to round to when the bits are converted back to a float and printed.
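To make the layout concrete, here is a rough sketch of the packing in Python (not the Racket from types.rkt). The names pack_fields, unpack_fields, and is_float_bits are made up for this illustration, and it assumes that the 3 + 5 flag bits are simply the low 8 bits all set to 1 and that the fields follow directly after one another.

```python
# Sketch of the bit layout described above. All names are hypothetical;
# the field order and widths follow the description, the shifts are derived.
TAG_BITS = 8                         # assumed: 3 + 5 low bits, all set to 1
TAG = (1 << TAG_BITS) - 1            # 0xFF
MANT_BITS, EXP_BITS, SIGN_BITS, DEC_BITS = 23, 8, 1, 3

MANT_SHIFT = TAG_BITS                # mantissa starts right after the tag
EXP_SHIFT = MANT_SHIFT + MANT_BITS   # then the biased exponent
SIGN_SHIFT = EXP_SHIFT + EXP_BITS    # then the sign bit
DEC_SHIFT = SIGN_SHIFT + SIGN_BITS   # then the decimal-place count


def pack_fields(sign: int, exponent: int, mantissa: int, decimals: int) -> int:
    """Pack the four fields plus the float tag into one integer."""
    for name, value, width in (("sign", sign, SIGN_BITS),
                               ("exponent", exponent, EXP_BITS),
                               ("mantissa", mantissa, MANT_BITS),
                               ("decimals", decimals, DEC_BITS)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((decimals << DEC_SHIFT) | (sign << SIGN_SHIFT)
            | (exponent << EXP_SHIFT) | (mantissa << MANT_SHIFT) | TAG)


def is_float_bits(bits: int) -> bool:
    """True when the low tag bits are all 1 (the assumed float flag)."""
    return (bits & TAG) == TAG


def unpack_fields(bits: int) -> tuple:
    """Return (sign, exponent, mantissa, decimals) from a packed integer."""
    sign = (bits >> SIGN_SHIFT) & ((1 << SIGN_BITS) - 1)
    exponent = (bits >> EXP_SHIFT) & ((1 << EXP_BITS) - 1)
    mantissa = (bits >> MANT_SHIFT) & ((1 << MANT_BITS) - 1)
    decimals = (bits >> DEC_SHIFT) & ((1 << DEC_BITS) - 1)
    return sign, exponent, mantissa, decimals
```

Under these assumptions the whole encoding uses 8 + 23 + 8 + 1 + 3 = 43 bits, so it fits comfortably in the 64-bit integer.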

Currently, a number is only accurate to 7 significant figures, so it is rounded to between 0 and n decimal places, where n = 7 minus the number of digits before the decimal point.
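The sketch below (Python, not the Racket code in this PR) illustrates why the stored decimal-place count is needed: a value squeezed into 32 bits picks up noise past roughly 7 significant figures, and rounding to d places removes it again. The helpers to_float32 and decimals_to_keep are hypothetical, and the rule for d (the smaller of the literal's decimal places and 7 minus the integer digits) is my reading of the limit above, not a copy of decimal_place or integer-size.

```python
import struct
from decimal import Decimal


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def decimals_to_keep(literal: str, max_sig: int = 7) -> int:
    """Assumed rule: keep the literal's decimal places, capped so that
    integer digits plus decimal places never exceed max_sig."""
    value = Decimal(literal)
    decimals = max(0, -value.as_tuple().exponent)   # digits after the point
    int_part = abs(int(value))
    int_digits = len(str(int_part)) if int_part != 0 else 0
    return max(0, min(decimals, max_sig - int_digits))


literal = "790.321"
stored = to_float32(float(literal))      # carries single-precision noise
d = decimals_to_keep(literal)            # 3 decimal places for this literal
print(stored, round(stored, d))
```

With max_sig = 7 the result is always between 0 and 7, which is exactly the range a 3-bit field can hold.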

A summary of the changes I made:

types.rkt

imm->bits: Adds the case for floats and calls float->bits on v to convert it to its bit representation.
float->bits: Determines the sign s, then calls decimal_place and integer-size to work out how many decimal places d the float should be rounded to when printed (per the scheme described above). It determines the exponent e with calc-exp and uses the result to compute the mantissa m, which dec->binary converts to a binary integer without its decimal point. Finally, all four values are passed to float->bits_helper to build the 64-bit integer laid out as described in the introduction.
integer-size: Given an integer, figures out its highest tens place (that is, how many digits it has).
decimal_place: Given a decimal number, an accumulator, and a maximum, returns the smaller of the number's count of decimal places and the maximum.
calc-exp: Given a sign and a comparison operator, which together indicate whether the absolute value of the number lies strictly between 0 and 1 or not, recursively searches for the exponent e such that 1 <= abs(num)/2^(e - 127) < 2.
dec->binary: Returns the rounded binary representation of the decimal mantissa, without its decimal point.
float->bits_helper: Uses the four parameters (decimal places d, the binary representation of the mantissa m without its decimal point, the exponent e plus its bias of 127, and the sign s) to construct a 64-bit integer laid out as described in the introduction.
bits->imm: Implements the float case by extracting the four parameters (decimal places d, the mantissa m converted from binary to decimal with binary->decimal, the exponent e, and the sign s), computing the float with the formula (-1)^s * 2^(e - 127) * (1 + m), and rounding the result to d decimal places with round_dec.
binary->decimal, round_dec, float-bits?: Self-explanatory.
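As a rough end-to-end illustration of the steps listed for types.rkt, here is a Python sketch of the exponent search, the mantissa conversion, and the decoding. It is not a translation of the Racket code: the names calc_exp, encode, and decode are hypothetical, the search is iterative instead of recursive, and zero, subnormals, infinities, and NaN are left out. Decoding uses the standard IEEE 754 form (-1)^s * 2^(e - 127) * (1 + m).

```python
BIAS = 127
MANT_BITS = 23


def calc_exp(magnitude: float) -> int:
    """Find the biased exponent e with 1 <= magnitude / 2**(e - BIAS) < 2."""
    if magnitude <= 0:
        raise ValueError("this sketch only handles nonzero magnitudes")
    e = BIAS
    while magnitude / 2.0 ** (e - BIAS) >= 2:   # too large: raise the exponent
        e += 1
    while magnitude / 2.0 ** (e - BIAS) < 1:    # too small: lower the exponent
        e -= 1
    return e


def encode(value: float) -> tuple:
    """Return (sign, biased exponent, 23-bit mantissa integer)."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    e = calc_exp(magnitude)
    fraction = magnitude / 2.0 ** (e - BIAS) - 1.0   # m, in the range [0, 1)
    mantissa = round(fraction * 2 ** MANT_BITS)      # drop the point, round
    if mantissa == 2 ** MANT_BITS:                   # rounding carried over
        mantissa, e = 0, e + 1
    return sign, e, mantissa


def decode(sign: int, e: int, mantissa: int, decimals: int) -> float:
    """Rebuild the number, then round it to the stored decimal places."""
    m = mantissa / 2 ** MANT_BITS
    value = (-1) ** sign * 2.0 ** (e - BIAS) * (1 + m)
    return round(value, decimals)


sign, e, mantissa = encode(4.2)
print(sign, e, mantissa, decode(sign, e, mantissa, 1))
```

For 4.2 the search settles on e = 129, because 4.2 / 2^2 = 1.05, and the mantissa integer is round(0.05 * 2^23) = 419430. Decoding without the final rounding would print a value slightly below 4.2, which is the reason the decimal-place count d is stored alongside the other fields.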

main.c

Implements the float case, using functions from the C math library to perform the same process as bits->imm.

Makefile, a86/interp.rkt

So that executables link against the C math library, I took the hacky approach of adding -lm to the line in each file that builds the executable. I am open to suggestions on better ways to do this.

ast.rkt, compile.rkt, interp.rkt, parse.rkt, types.h

Added a case for the Float structure in each file.

test-runner.rkt

Seven tests check that the compiler compiles 4.2, -4.2, 3.3333, 790.321, -8990.32, -9999999, and .9999999 correctly.

Possible things to add next:

A better way to link executables against the C math library
A way to throw an error if a number has more than 6 significant digits
Float operations (fl+, fl-, fl=)

Note about Tests:

The tests on GitHub cannot currently run after my updates to test-runner.rkt and compile.rkt because of the billing issue, but they ran correctly on my local machine.

plum-umd / the-838e-compiler

Floating point #31