qrilka / xlsx

Simple and incomplete Excel file parser/writer
MIT License
128 stars 62 forks source link

Writing default stylesheet to file results in excel complaining it's invalid #134

Closed dten closed 3 years ago

dten commented 3 years ago

this relates to the existence of minimalStylesheet and def for Stylesheet. using def will result in a file that cannot load.

This is also easily hit if for some reason you wrote out a file using xlsx and did not specify a stylesheet, an 'empty' stylesheet will be added. if you end up parsing that stylesheet and writing it back out it will be invalid also.

this seems to be because dxfs will not work unless the minimal fonts are specified as with the other stuff in stylesheet. removing dxfs lets excel load the file again.

qrilka commented 3 years ago

Thanks for raising this. What Excel versions result in that error? Does the example https://github.com/qrilka/xlsx/blob/master/src/Codec/Xlsx.hs#L22-L36 give you that error as well? Your words sound like you have an idea how to create a PR with a fix - do you want to open one?

As for me unfortunately I don't have a desktop version of Excel and I'm not sure how good enough is testing with the online version...

dten commented 3 years ago

I'm just using Excel 2013.

the example you list won't give the error because it doesn't serialise the def of StyleSheet. Instead it jumps straight to the bytes emptyStyles gives https://hackage.haskell.org/package/xlsx-0.8.2/docs/src/Codec.Xlsx.Types.html#emptyStyles

which formatted looks like this

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
</styleSheet>

but serialising def of styles results in

<?xml version="1.0" encoding="UTF-8"?>
<!--Stylesheet generated by xlsx-->
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <numFmts count="0"/>
    <fonts count="0"/>
    <fills count="0"/>
    <borders count="0"/>
    <cellXfs count="0"/>
    <dxfs count="0"/>
</styleSheet>

which results in excel complaining unless I either remove the dxfs element or provide the default fonts like minimalStylesheet does.

Not writing out the empty elements would result in an excel file that loads. However you soon run into the cases where Excel expects something non trivial specified so it's only a small help. I was thinking the docs could be more helpful but they're already pointing out the problems... maybe just needs some bold 😅

qrilka commented 3 years ago

I need to take a look if there's anything in the spec about this or if it's another "feature" of the implementation. But if you have any concrete ways to improve the situation - PRs are welcome for sure.

qrilka commented 3 years ago

@dten I had a look into the spec and I don't see anything there telling that the style above shouldn't be accepted. But I guess it still makes sense to take this implementation detail into account. The only idea I have right now is just removing the Default instance for StyleSheet altogether and publish it as something like emptyStylesheet with a huge warning saying that Excel won't accept it as it is the details of that are rather unknown and minimalStylesheet should be preferred. And as this looks like a breaking change that would require a major version update. Do you see any other ways to resolve this?

dten commented 3 years ago

So I think not writing out the empty elements would help. That would basically mean the special case from emptyStylesheet wasn't special

qrilka commented 3 years ago

Sorry, do you mean that we just need to make countedElementList in https://github.com/qrilka/xlsx/blob/master/src/Codec/Xlsx/Types/StyleSheet.hs#L1145-L1152 skip produce no element if a list has zero elements? That looks doable and backwards-compatible as well. But what about "the cases where Excel expects something non trivial specified so it's only a small help"? Do you think that mentioning something along those lines in haddocks would be a good enough solution?

dten commented 3 years ago

Yes I mean skip the counted list if it's empty

As for non trivial case, I mean in the case where people start adding some elements to styles. The link in the docs explains that there is a point where Excel won't open the file because it only part of the stylesheet is specified.

http://hackage.haskell.org/package/xlsx-0.8.2/docs/Codec-Xlsx-Types-StyleSheet.html#v:minimalStyleSheet

qrilka commented 3 years ago

Thanks. Yes the link explains some details but I think it's not prominent enough so it's easy to miss that part. I'll try to make it more explicit (e.g. mention that Excel has some not quite clear requirements about stylesheet contents). Would you mind me assigning you as a reviewer for a PR about this?

dten commented 3 years ago

I'd be happy to