tealeg / xlsx

Go library for reading and writing XLSX files.
Other
5.84k stars 816 forks source link

Border style not allowed to be empty #211

Closed djui closed 7 years ago

djui commented 8 years ago

https://github.com/tealeg/xlsx/commit/27f0f707d42799573616e9166c4be707dd4d2621 introduced a fix for merged cells AFAIK but introduced a regression which created borders styles with attribute empty string, which is not definition compliant. Value must be one of {'double', 'mediumDashDot', 'mediumDashDotDot', 'hair', 'dashDot', 'slantDashDot', 'medium', 'thick', 'mediumDashed', 'dashDotDot', 'thin', 'dotted', 'dashed'}

tealeg commented 8 years ago

This should now be resolved on the master branch.

djui commented 8 years ago

https://github.com/tealeg/xlsx/commit/7ffa59c7ac2676c3bff57e8cef15c18391545b97 does not fix the problem. Instead of removing the style, it sets it (from style="") to style="none". Neither value are valid. One example:

❯ python3 -c "from openpyxl import load_workbook; wb2 = load_workbook('output-master.xlsx')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 203, in load_workbook
    parsed_styles = read_style_table(archive)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 181, in read_style_table
    p.parse()
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 50, in parse
    self.border_list = IndexedList(self.parse_borders())
  File "/usr/local/lib/python3.5/site-packages/openpyxl/utils/indexed_list.py", line 17, in __init__
    for idx, val in enumerate(iterable):
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 99, in parse_borders
    yield Border.from_tree(border_node)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 65, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 78, in from_tree
    return cls(**attrib)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/styles/borders.py", line 46, in __init__
    self.style = style
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 143, in __set__
    super(NoneSet, self).__set__(instance, value)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 128, in __set__
    raise ValueError(self.__doc__)
ValueError: Value must be one of {'thick', 'hair', 'mediumDashDotDot', 'thin', 'dashDot', 'dashDotDot', 'dashed', 'mediumDashDot', 'medium', 'double', 'dotted', 'mediumDashed', 'slantDashDot'}
djui commented 8 years ago

Some more context and examples:

053c240.xlsx 27f0f70.xlsx 7ffa59c.xlsx

$ python3 -c "from openpyxl import load_workbook; wb2 = load_workbook('053c240.xlsx')"

$ python3 -c "from openpyxl import load_workbook; wb2 = load_workbook('27f0f70.xlsx')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 203, in load_workbook
    parsed_styles = read_style_table(archive)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 181, in read_style_table
    p.parse()
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 50, in parse
    self.border_list = IndexedList(self.parse_borders())
  File "/usr/local/lib/python3.5/site-packages/openpyxl/utils/indexed_list.py", line 17, in __init__
    for idx, val in enumerate(iterable):
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 99, in parse_borders
    yield Border.from_tree(border_node)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 65, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 78, in from_tree
    return cls(**attrib)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/styles/borders.py", line 46, in __init__
    self.style = style
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 143, in __set__
    super(NoneSet, self).__set__(instance, value)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 128, in __set__
    raise ValueError(self.__doc__)
ValueError: Value must be one of {'medium', 'dotted', 'mediumDashed', 'dashDotDot', 'hair', 'thin', 'double', 'dashed', 'mediumDashDot', 'slantDashDot', 'dashDot', 'thick', 'mediumDashDotDot'}

$ python3 -c "from openpyxl import load_workbook; wb2 = load_workbook('7ffa59c.xlsx')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 203, in load_workbook
    parsed_styles = read_style_table(archive)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 181, in read_style_table
    p.parse()
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 50, in parse
    self.border_list = IndexedList(self.parse_borders())
  File "/usr/local/lib/python3.5/site-packages/openpyxl/utils/indexed_list.py", line 17, in __init__
    for idx, val in enumerate(iterable):
  File "/usr/local/lib/python3.5/site-packages/openpyxl/reader/style.py", line 99, in parse_borders
    yield Border.from_tree(border_node)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 65, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/serialisable.py", line 78, in from_tree
    return cls(**attrib)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/styles/borders.py", line 46, in __init__
    self.style = style
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 143, in __set__
    super(NoneSet, self).__set__(instance, value)
  File "/usr/local/lib/python3.5/site-packages/openpyxl/descriptors/base.py", line 128, in __set__
    raise ValueError(self.__doc__)
ValueError: Value must be one of {'slantDashDot', 'thick', 'thin', 'dotted', 'double', 'mediumDashed', 'dashDotDot', 'hair', 'dashed', 'medium', 'dashDot', 'mediumDashDotDot', 'mediumDashDot'}
tealeg commented 8 years ago

@djui the ECMA-376 standard for Office Open XML (and the XML stylesheets provided with it) define ST_BorderStyle with "none" as a valid value. That makes me think this must be a bug in openpyxl.

pwaller commented 8 years ago

Filed on openpyxl.

tealeg commented 8 years ago

@pwaller good stuff!

pwaller commented 8 years ago

@tealeg openpyxl just replied and refused to change it because apparently they interpret the spec to mean "="none" implies 'missing'". Any chance we can fix it in light of this fact? I've asked if they can reconsider and point at the spec, but unless one side relents then these two nice pieces of software won't be able to interoperate.

Themanwithoutaplan commented 8 years ago

Blame the specification for unnecessary ambiguity. Support for this wart requires special casing which adds nothing to the result. No attribute should be serialised as "none" unless this is required (makes no sense for border sides styles).

pwaller commented 8 years ago

@tealeg openpyxl appears unwilling fix the problem their side and says they have filed issues with the spec. Any chance of revisiting it on your side in the mean time? Otherwise we can't produce files that people are consuming using openpyxl.

tealeg commented 8 years ago

@pwaller I'll reopen the issue. It's somewhat trickier than it sounds to fix it, but maybe someone is up for a challenge.

pwaller commented 8 years ago

Thanks @tealeg!

pwaller commented 8 years ago

@tealeg looking into it briefly myself, do you have any idea why omitempty does not do what I thought it would?

pwaller commented 8 years ago

To answer my own question, it feels a bit like it must be a bug in encoding/xml where attr and omitempty don't play well together.

Apparently MarshalerAttr sounds like it enables serialising with the attribute removed:

If MarshalXMLAttr returns the zero attribute Attr{}, no attribute will be generated in the output. MarshalXMLAttr is used only for struct fields with the "attr" option in the field tag.

I guess the fix would then be to make the Style value a type alias of string which implements MarshalXMLAttr and returns xml.Attr{} for the empty string. Does that sound right to you?

pwaller commented 8 years ago

Oh, it also sounds like instead the field could be made into a pointer.

I guess both of those approaches may have problems with preserving backwards compatibility.

tealeg commented 8 years ago

@pwaller I've tried the pointer approach - it works but you'll trigger a lot of changes elsewhere in the code. There are a number of small issues like this resulting from encoding/xml. Custom marshalling and unmarshalling using the encoding/xml library is the best approach, but also lengthy to achieve - again, I've looked at that for other reasons, it's a pain to start, but would be well worth it in the long run.