qrilka / xlsx

Simple and incomplete Excel file parser/writer
MIT License
128 stars 62 forks source link

Parse Numbers as Scientific, not Double #176

Open olafklinke opened 1 month ago

olafklinke commented 1 month ago

Currently xlsx seems to build on the Data.Text.Read module, which provide polymorphic functions like

rational :: Fractional a => Text -> Either String (a,Text)

This is however used to parse a decimal number from the XML into a Double value (note that CellDouble :: Double -> CellValue) which for most values can not represent the rational number exactly. This is especially annoying when the decimal number represents a LocalTime, because rounding errors can shift the time into a different day, thus affecting the displayed value quite a lot. My proposed change is that

  1. CellValue gets an additional CellDecimal constructor for implementing and testing the rational parsing to Scientific.
  2. When this is established, CellDouble is deprecated.

To limit breakage, one could provide a view pattern named CellDouble that uses Data.Scientific.toRealFloat for the transition phase.

olafklinke commented 1 month ago

Remark: I have not sanity-checked my PR, it is just the bare minimum to make the changes compile as a proof-of-concept. In particular, the TH-generated _CellDouble prism is still missing.

qrilka commented 1 month ago

Thanks @olafklinke I'll try to find time for that PR

olafklinke commented 1 month ago

I noticed that I neglected to adjust the test suite. But a simple s/CellDouble/CellDecimal/g for each file in test/ makes all tests pass.

qrilka commented 1 month ago

Would you mind fixing the tests?

olafklinke commented 1 month ago

Apologies, the last commit that should have fixed the test apparently did not change any files. I added a 9th commit to my fork. How do I update the PR to include these commits? Or does the PR update itself?

qrilka commented 1 month ago

It does but unfortunately CI run doesn't get automatically approved (probably I need to do something about that)

olafklinke commented 1 month ago

If the OOXML spec really states that all numbers in spreadsheets are xsd:double then the real issue is not the conversion between decimal representation and CellDouble but rather the fact that TimeOfDay and Double are in mismatch and do not properly round-trip: A second in a standard day is 1/86400 which is not a dyadic rational and hence neither representable in Double nor does it have a finite decimal expansion.

qrilka commented 1 month ago

Oh, shoot, I didn't check the XSD spec, thanks. OOXML spec doesn't really state that. That was only for cached values and it seems I didn't check it all through...