Open githubuser181226 opened 1 year ago
Decimals have '.' instead of ','
el
-- Dr. Eberhard W. Lisse \ / Obstetrician & Gynaecologist @. / | Telephone: +264 81 124 6733 (cell) PO Box 8421 Bachbrecht \ / If this email is signed with GPG/PGP 10007, Namibia ;____/ Sect 20 of Act No. 4 of 2019 may apply On 26. Nov 2022 at 20:54 +0200, githubuser181226 **@.***>, wrote:
Hi. I'm sad to report the SUM function doesn't work as expected with floating point values. Consider this simple example.csv: Col1;Col2;Col3 1;"Text1";1,01 2;"Text2";2,02 3;"Text3";3,03 When running csvq -d ; "select min(Col3),max(Col3),sum(Col3) from example" I get: +-----------+-----------+-----------+ | MIN(Col3) | MAX(Col3) | SUM(Col3) | +-----------+-----------+-----------+ | 1,01 | 3,03 | NULL | +-----------+-----------+-----------+ What am I doing wrong? — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>
1.01 + 2.02 + 3.03 = 6.0600000000000005
is the expected result with floating numbers in IEEE 754. This is the same in (almost) all programing languages. You can check this page for python.
You can use for example round(sum(Col3),10)
in place of sum(Col3)
if you want a decimal with ten digits precision.
I understand, but I'm confused. Why MIN and MAX work? This would suggest, MIN and MAX do some string comparisons instead of "math". This would explain why these two work, but not SUM nor AVG is. Am I correct?
I get the floating point precision problem, but I thought it was weird nontheless and I can imagine this getting quite problematic when dealing with any amount of accounting data. Imaging searching for that missing/extra cent in a thousands of records... Is there an option to set some implicit floating point values rounding/precision globally, so this weird result don't happen, just like Excel does it?
The aggregate functions MIN
and MAX
can be used with values that can be compared with <
and >
. In this case, they are compared as strings. However, the aggregate functions SUM
and AVG
can only be used with floating-point numbers.
The floating-point precision problem is very troublesome to calculate programmatically. Similar problems occur in Excel as well. The following screenshot shows a comparison of floating-point numbers in Excel.
Values:
A1 and C1 are formulas.
The value of C1 indicates that A1 and B1 appear to be the same number, but internally they are different numbers. As far as I know, there is no way to completely solve this floating-point problem.
The aggregate functions
MIN
andMAX
can be used with values that can be compared with<
and>
. In this case, they are compared as strings. However, the aggregate functionsSUM
andAVG
can only be used with floating-point numbers.
I see. Thanks for the explanation.
The floating-point precision problem is very troublesome to calculate programmatically. Similar problems occur in Excel as well. The following screenshot shows a comparison of floating-point numbers in Excel.
True. I know that, but I can limit the precision for the floating point globally and get rid of the imprecision that way.
This is LibreOffice Calc and in the options I've set the limit of fractional digits to 4 and the problem disappeared.
Thus I'm asking if there is a way to do exactly that in csvq/SQL without placing ROUND() everywhere, which is PITA not only to do it, but to remember when and where to do it.
Cheers.
There is no way to set the precision globally. Users must explicitly use ROUND() if necessary.
@mithrandie If #91 is considered, perhaps the output precision, like the output number delimiters, can be considered part of the "configuration". This won't solve the problem of internal comparison, as you indicated in the example with Excel, but it may be helpful.
Pipe through something like QSV?
In the first instance something like
qsv apply operations currencytoenum columnname
comes to mind.
el
-- Dr. Eberhard W. Lisse \ / Obstetrician & Gynaecologist @. / | Telephone: +264 81 124 6733 (cell) PO Box 8421 Bachbrecht \ / If this email is signed with GPG/PGP 10007, Namibia ;____/ Sect 20 of Act No. 4 of 2019 may apply On 27. Nov 2022 at 08:45 +0200, githubuser181226 **@.***>, wrote:
The aggregate functions MIN and MAX can be used with values that can be compared with < and >. In this case, they are compared as strings. However, the aggregate functions SUM and AVG can only be used with floating-point numbers. I see. Thanks for the explanation. The floating-point precision problem is very troublesome to calculate programmatically. Similar problems occur in Excel as well. The following screenshot shows a comparison of floating-point numbers in Excel. True. I know that, but I can limit the precision for the floating point globally and get rid of the imprecision that way. This is LibreOffice Calc and in the options I've set the limit of fractional digits to 4 and the problem disappeared. Thus I'm asking if there is a way to do exactly that in csvq/SQL without placing ROUND() everywhere, which is PITA not only to do it, but to remember when and where to do it. Cheers. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>
Hi. I'm sad to report the SUM function doesn't work as expected with floating point values.
Consider this
example.csv
Col1;Col2;Col3 1;"Text1";1,01 2;"Text2";2,02 3;"Text3";3,03
When running
csvq -d ; "select min(Col3),max(Col3),sum(Col3) from example"
I get: +-----------+-----------+-----------+ | MIN(Col3) | MAX(Col3) | SUM(Col3) | +-----------+-----------+-----------+ | 1,01 | 3,03 | NULL | +-----------+-----------+-----------+`
Now consider example2.csv where I replaces commas for dots as floating point separator:
Col1;Col2;Col3 1;"Text1";1.01 2;"Text2";2.02 3;"Text3";3.03
When running
csvq -d ; "select min(Col3),max(Col3),sum(Col3) from example2"
now I get +-----------+-----------+---------------------+ | MIN(Col3) | MAX(Col3) | SUM(Col3) | +-----------+-----------+---------------------+ | 1.01 | 3.03 | 6.0600000000000005 | +-----------+-----------+---------------------+
Both examples give wrong results. What is happening?
Please advise.