jorgecarleitao / arrow2

Transmute-free Rust library to work with the Arrow format
Apache License 2.0

CSV Parser mishandles Decimal inputs #1449

Open mikolajsnioch opened 1 year ago

mikolajsnioch commented 1 year ago

Hello,

I noticed that when parsing CSV files with decimals in the format d+.00, the parsed value is often wrong by orders of magnitude.

Example input:

1990.75
2023.00
2034.75
2034.25
2038.50
2038.50
1988.50
1968.75
1971.00
1966.00

Produced output:

1990.75
20.23
2034.75
2034.25
204.3
204.3
199.3
1968.75
19.71
19.66

After investigating, I found the offending line:

src/io/csv/read_utils

Line 76: map(|(lhs, rhs, rhs_s)| lhs * 10i128.pow(rhs_s as u32) + rhs)
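
The wrong values above can be reproduced by hand from this expression. This is only a sketch of the arithmetic, not the library code. It assumes the column has scale 2 and that rhs_s counts the fractional digits with trailing zeros dropped (0 for "00", 1 for "50", 2 for "75") while rhs still holds the full parsed number. That assumption matches the reported output but is not confirmed from the source. The helpers combine and render are hypothetical names used only here.

```rust
/// The expression from line 76, with the shift exponent passed in explicitly.
fn combine(lhs: i128, rhs: i128, exp: u32) -> i128 {
    lhs * 10i128.pow(exp) + rhs
}

/// Render a non-negative unscaled value at a fixed scale (scale >= 1 assumed).
fn render(unscaled: i128, scale: u32) -> String {
    let pow = 10i128.pow(scale);
    format!(
        "{}.{:0width$}",
        unscaled / pow,
        unscaled % pow,
        width = scale as usize
    )
}
```

Under that assumption, render(combine(2023, 0, 0), 2) gives "20.23" and render(combine(2038, 50, 1), 2) gives "204.30", which line up with the 20.23 and 204.3 in the produced output, while render(combine(1990, 75, 2), 2) gives the correct "1990.75".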

I am not an expert on these matters, but it seems to me that the correct code should use scale in place of rhs_s as below:

src/io/csv/read_utils

Line 76: map(|(lhs, rhs, rhs_s)| lhs * 10i128.pow(scale as u32) + rhs)

I've made this change locally and it fixed the issue. However, I am not sure if there are other scenarios where this would be incorrect.
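
One scenario that may be worth checking is a fractional part with fewer digits than scale, for example 2038.5 at scale 2. Shifting lhs by scale alone would give 203805 rather than 203850, unless such inputs are already rejected earlier in the parser. Below is a minimal, self-contained sketch of one way to handle it by padding the fractional part. It is not arrow2's implementation: parse_decimal is a hypothetical helper, and the precision check is left out.

```rust
/// Hypothetical helper, not arrow2's code: parse text such as "2038.50" into
/// the unscaled i128 of a decimal with the given scale.
fn parse_decimal(text: &str, scale: u32) -> Option<i128> {
    // Split into integer and fractional parts; the fraction may be absent.
    let (int_part, frac_part) = match text.split_once('.') {
        Some((l, r)) => (l, r),
        None => (text, ""),
    };
    // Reject fractions that are not plain digits or exceed the scale.
    if !frac_part.bytes().all(|b| b.is_ascii_digit()) || frac_part.len() as u32 > scale {
        return None;
    }
    let pow = 10i128.checked_pow(scale)?;
    let lhs: i128 = int_part.parse().ok()?;
    let rhs: i128 = if frac_part.is_empty() {
        0
    } else {
        frac_part.parse().ok()?
    };
    // Pad short fractions: ".5" at scale 2 means 50 hundredths.
    let rhs = rhs * 10i128.pow(scale - frac_part.len() as u32);
    // Shift the integer part by the column scale, not by the digit count of rhs.
    let shifted = lhs.checked_mul(pow)?;
    // The sign lives on the integer part, so the fraction moves away from zero.
    if int_part.starts_with('-') {
        shifted.checked_sub(rhs)
    } else {
        shifted.checked_add(rhs)
    }
}
```

With this sketch, both "2038.50" and "2038.5" parse to 203850 at scale 2, "2023.00" parses to 202300, and "1.234" is rejected because it carries more fractional digits than the scale allows.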

Hope it helps, Mikolai