Open olafklinke opened 4 months ago
Fixed the test suite. Note that the code base may have some CellDouble
patterns lurking, which do an implicit Scientific -> Double
conversion where it may be unreasonable. Sadly the one-directional pattern synonym can not make the change in representation completely transparent. (That is, Haskell currently does not allow writing a bi-directional pattern that uses a different conversion function each way.) We might want to come up with a good test that exhibits the rounding errors in using the CellDouble
pattern.
Further I am not an expert at writing prisms, nor lens idiosyncrasies at all, so please feel free to edit my PR so that all required lenses are present. Otherwise I guess this PR would introduce an unnecessary amount of breakage in downstream packages.
Would you mind checking my comments on the PR? And it would be great to add the test you proposed as well. As for unforseen corner case - let's take a chance and fix problems when we see some.
@olafklinke it looks like the microlens
flag is a bit broken
Ah yes, the manually defined _CellDecimal
prism conditional on the microlens flag was missing. This is fixed in 6c9ddcb. Further there are two warnings about incomplete pattern matches in src/Codec/Xlsx/Writer/Stream.hs:346 and src/Codec/Xlsx/Writer/Stream.hs:357 but these are actually covered by the CellDouble
view pattern. Therefore I added a COMPLETE pragma.
Thanks for fixing that, could you also address my other comments? BTW it would be neat to squash the commits
could you also address my other comments
What is left unaddressed?
Changing CacheNumber
to have a Scientific
argument is not trivial, because in Codec.Xlsx.Writer.Internal
we would need a ToAttrVal Scientific
instance, but the txtd
function that is used for the Double
instance relies on RealFloat
which Scientific
is not. Hence we need a better conversion Scientific -> Text.Builder
first. We could use the one provided by scientific but I have no idea whether that conforms to the XML standard. Would that be covered by the test suite?
This looks like this uncovers the problem with the PR - it doesn't fix the writer as a result saving and reading will be not idempotent :-\ Looks like OOXML spec doesn't specify anything about decimals in cell values :( But for cached fields it's
<xsd:attribute name="v" use="required" type="xsd:double"/>
so I suppose we could use it as a guide here. Does this sound fine with you? Also we need to add a decimal which is not representable with the current xlsx version and which will make writer/read not idempontent in this PR.
Also let's add a bound on scientific
There are two idempotency issues here: The immediate Haskell idempotency issue is that Data.Text.Read.rational
and Data.Text.Lazy.Builder.scientificBuilder
need to be mutually inverse. That is within our power. The second idempotency issue is whether other OOXML-processing software can change the representation of decimal numbers so that it trips up Data.Text.Read.rational
, or whether the text representations produced by Data.Text.Lazy.Builder.scientificBuilder
adhere to the XML standard. (I think they do, it is quite liberal.)
Indeed the standard for xsd:double
states that the value range must adhere to IEEE-754 which Scientific
can easily surpass. Conversely, Double
has values that are not representable in Scientific
, e.g. infty = 1/0 :: Double
which fromFloatDigits
maps to 1.797693134862316e308
.
In fact, even the current published version of xlsx
does not adhere to the standard, since Data.Text.Read.rational
fails on the texts "NaN"
and "INF"
which are legal xsd:double
according to the spec.
Summary: If the OOXML standard really forces all decimals in XLSX to be IEEE-754 doubles, then #176 has no basis, as all double precision floating point numbers do have an exact representation in scientific notation, and a slightly improved rational
parser would have us covered for the NaN
and INF
corner cases.
As I said in the other doc it seems that I was extrapolating too much, cell values are not xsd:double
but cached numbers are. Ok, in this situation I'd propose to leave CacheNumber as is and make read . write == id
including currently unrepresentable date values.
The XML contains decimal values, which are accurately represented by
Scientific
but not byDouble
as in theCellDouble
constructor ofCellValue
. This PR makesCellDouble
a pattern synonym and replaces the old constructor by a new constuctorCellDecimal
which holds aScientific
value. For motivation, see this issue.