weshatheleopard / rubyXL

Ruby lib for reading/writing/modifying .xlsx and .xlsm files
MIT License
1.27k stars 254 forks source link

Full support for reading font and fill colors #116

Open kinkou opened 10 years ago

kinkou commented 10 years ago

As far as I managed to figure out, basic methods for font and fill color retrieval in RubyXL currently only return value of rgb attribute from /styleSheet/fonts/font[n]/color (that is for font colors) from styles.xml, if it is present, otherwise they just return '000000', which, unfortunately, is not correct in many cases. There are (I suppose you know even better) 4 more different ways that Excel uses to save and reproduce colors, as I learned from this answer on StackOverflow.

In my project I need to read colors from XLSX a lot, so I wrote a wrapper on top of RubyXL to retrieve them. However, having the fore-mentioned support inside the gem would be so much better. Hence the question: are there any plans about the subject? If not, could you share some ideas about how it would be better to contribute (if appropriate)?

weshatheleopard commented 10 years ago

The currently present methods are inherited from version 1 which was not sophisticated at all. Now when the file is completely read, we can do whatever we want. I have not implemented any of complex accessors because there were more pressing needs, like reading spreadsheet values.

If your wrapper resides on github, give me a link, and I will try to integrate it into the gem.

It should be noted that generally we would want to provide write access whenever we provide read access, and that might be quite complex.

kinkou commented 10 years ago

So last few days I've been reading the specification, exploring the gem and finishing my wrapper. As I have no need to generate xlsx files, I've been focusing only on implementing a way to read colors properly. Here's what I've done so far:

For cell fills:

For pattern fills:

For solid types of pattern fills, they can be stored in the following attrs of Color object:

For other types of pattern fills, the wrapper will return an array, containing the foreground color, the background color, and the name of the pattern.

For font colors:

Also, a tint attribute (if present) is applied to the color (if it was read from either rgb, indexed, of theme attributes), according to the rules specified in 18.8.19 of ECMA-376.

I guess this covers 80% of what a user might want when it comes to reading colors, and it's better than what we have today. So if you think it's worth including this in the gem, I'll provide the source code.

weshatheleopard commented 10 years ago
  1. For 'auto', I think it should return whatever the default color is in the general Excel style (yes, 99% it will be white). Same for the font.
  2. Please remember that you should be basing your code off comment_support branch, which is in a release candidate state, and constitutes a major rework.
  3. By all means, please do submit your code, it will be very helpful.
kinkou commented 10 years ago

Ok, this is how it looks now: https://gist.github.com/rastarobot/2b742e97947c901b8cf0 It is based on the comment_support branch, but I think it doesn't use any features specific to it. The module is actually a mixin for a proxy that I use in my project to provide unified access to spreadsheet files no matter what format they came in.

Is the default color (for 'auto') you mentioned located in [0] of <fills> inside styles.xml?

weshatheleopard commented 10 years ago

I do not know for sure but I would expect so.

caramdache commented 4 years ago

@weshatheleopard, was this ever implemented? I am running into the same issues as @kinkou and have developed some rudimentary code, but am progressively discovering that the issue is much more complex that I thought. Any code or advice would be much appreciated!

@kinkou, your gist is no longer available. Any chance you might republish?

caramdache commented 4 years ago

OK I have since found:

which seem to fit my needs.

caramdache commented 4 years ago

There is an issue in get_rgb though. I have about 100 spreadsheets that I parse. In some of them, get_rgb return an rgb value that misses 1 hex value.

For example:

weshatheleopard commented 4 years ago

It is not immediately obvious to me how it is incorrect. What do you mean by "misses 1 hex value"?

caramdache commented 4 years ago

A color would normally be of the form: #xxyyzz (6 hexa characters).

Here, there are only 5 hexa characters: #e46ca.

weshatheleopard commented 4 years ago

Fixed, released.

P.S. The proper bug report should have been: "If one of the RGB components is < 16, the leading '0' of that component is mistakenly omitted".

caramdache commented 4 years ago

Marvellous, I confirm that it works! I wished that I had had the insight of what the bug was really about to make a proper bug report. I'm sorry about that.

There is a further issue:

First, it would seem to make sense to follow the same return format for theme colors.

Second, I'm trying to generate HTML and, as far as I understand, the expected format is slightly different: rrggbbaa (see https://alligator.io/css/hex-code-colors-alpha-values/). Would there be any possibility to change the return format to be more HTML compatible? Or, not to break existing code, to add a get_rgb_html method?

I'm not a pro about colour, so I apologize in advance for any misunderstanding of the specification.

caramdache commented 4 years ago

BTW, I'm writing code to generate HTML from an Excel worksheet. I started with openpyxl and pandas, but there were too many limitations, in particular in handling merged cells, colors and runs, and I decided to switch to your library which is a great choice I've found.

I'd be happy to share that code once I've done. If you were to feel that it would be worth integrating in rubyXL, that would be fine by me.

caramdache commented 4 years ago

A further issue is the resulting rgb color can be either uppercase or lowercase. That could be dealt with in application code, but it might be safer to convert all colors in lowercase or uppercase in the library.

caramdache commented 4 years ago

Yet a further point where your advice would be beneficial. I have just come accross a situation where some of the formatting (color) is lost in the translation to HTML. This boils down to font_color and get_cell_font.color returning different values:

irb(main):027:0> c=wb.worksheets[0][43][4]
irb(main):028:0> c
=> #<RubyXL::Cell(43,4): "56", datatype="s", style_index=40>
irb(main):029:0> c.font_color
=> "000000"
irb(main):030:0> c.get_cell_font.color.get_rgb(c.worksheet.workbook)
=> "1f497d"

But for another cell there is no such issue, the rgb values are the same:

irb(main):023:0> c=wb.worksheets[0][44][4]
irb(main):024:0> c
=> #<RubyXL::Cell(44,4): "58", datatype="s", style_index=43>
irb(main):025:0> c.font_color
=> "FFFF0000"
irb(main):026:0> c.get_cell_font.color.get_rgb(c.worksheet.workbook)
=> "FFFF0000"

Digging further, this seems to occur when color.rgb is nil and a theme color is used instead:

irb(main):035:0> c.get_cell_font.color
=> #<RubyXL::Color:0x000055b18cd2a580 @local_namespaces=[], @auto=nil, @indexed=nil, @rgb=nil, @theme=3, @tint=nil>
weshatheleopard commented 4 years ago

There is a further issue

That's working as expected: Excel stores colors with Alpha in AARRGGBB format. I just verified that.

RubyXL is not obligated to return the data in the format you want. It is obligated to give you a way to access the data you want. In this particular case, you have access to that data; converting it to the format you desire is your job as a programmer. (For example: if str.length == 8 then result = color[3...8] + color[0...2] else str end That code is specific to the task you have at hand, so there's no reason it should be in the gem: other people may have different needs.

A further issue is the resulting rgb color can be either uppercase or lowercase. That could be dealt with in application code, but it might be safer to convert all colors in lowercase or uppercase in the library.

As they say, "when you are holding a hammer, all problems tend to look like nails". Imagine I add uppercasing to the library: your problem might solved because you don't have to write one toupper statement, but what about OTHER people, who might want to receive the code exactly as it is in the excel file and are no longer able to because the library already uppercased the string?

Yet a further point where your advice would be beneficial.

To investigate that issue, I will need the file you are looking at.

caramdache commented 4 years ago

To investigate that issue, I will need the file you are looking at.

I cannot share this particular file unfortunately. However, I believe I know where the issue lies: in the font_color convenience method calls:

    get_cell_font.get_rgb_color

which in turns calls get_rgb_color:

    color && color.rgb

which does return nil when the color is defined as a style.

I am under the impression an appropriate definition for font_color would be:

def font_color()
  validate_worksheet
  get_cell_font.color.get_rgb(worksheet.workbook) || '000000'
end

This always return a color, whether defined as a style or at the cell level.