randym / axlsx

xlsx generation with charts, images, automated column width, customizable styles and full schema validation. Axlsx excels at helping you generate beautiful Office Open XML Spreadsheet documents without having to understand the entire ECMA specification. Check out the README for some examples of how easy it is. Best of all, you can validate your xlsx file before serialization so you know for sure that anything generated is going to load on your client's machine.
MIT License
2.62k stars 695 forks source link

Times and Time zones #188

Closed agardiner closed 10 years ago

agardiner commented 11 years ago

I'm having issues with times not displaying correctly in Excel 2007. My local timezone is Europe/London, which is currently 1 hour ahead of UTC time due to daylight savings.

When I create a Time object in Ruby with my timezone set using Time.now, it displays correctly, e.g.

2013-05-15 08:50:39 +0100

When this value is set as the value of a cell, and the workbook serialized and opened in Excel with a format style of 'hh:mm:ss', the time displays as:

07:50:39

After googling this it seems Excel doesn't really handle timezones, and serial time values are always assumed to be in local time. The only reference to this I could find in the spec was in section 18.17.4.1:

The base dates and the related serial date-times represent local date and time.

Based on the above, when a Time value is converted to a serial value by the DateTimeConverter, the serial time ends up being in UTC rather than local time, which explains why my times are not displaying correctly when I view them in Excel.

I therefore experimented with changing line 26 of date_time_converter.rb as follows:

      (time.to_f + time.utc_offset - epoch)/seconds_per_day

This seems to give me the right results when I open the workbook in Excel in my timezone. However, this will clearly be wrong for users in any timezone other than the one in which the spreadsheet was created - and yet, that seems to be normal for Excel!?

Am I missing something completely obvious here, or is this really what is needed to get Excel to display times in local time? Does this same issue exist in other products like LibreOffice?

BTW I experimented using the iso_8601 type, but couldn't get that to work. Excel complained of invalid content, and stripped the date values out of my workbook.

randym commented 11 years ago

Hi @agardiner

Excellent report!

I say the first step is get that iso_8601 type working. There is a small chance that it was not supported yet in 2007, but I really doubt it. More likely we are improperly serializing the type.

knee deep in some work stuff at the moment, but I will, as soon as I can dig in a bit.

I dont think hitting a utc offset in there is really what we want. Much better to feed excel the date info with timezone and let it sort it out in my opinion.

agardiner commented 11 years ago

Hi @randym

I've been doing some more research, and it appears Excel really is oblivious to timezones; all dates and times are considered local, and no time zone conversion is performed. Apparently, date/time handling is based on what Lotus 1-2-3 did before Excel was even developed... right down to mimicing a bug that occurred in Lotus where 1900 was mistakenly considered a leap year.

This may have changed somewhat with the addition of support for ISO 8601 dates in the XML file formats, but as I found earlier, apparently valid ISO 8601 date/time stamps are rejected by Excel 2007. I wonder if this might be because of the timezone component in the timestamp I was generating. I will try tomorrow testing with an ISO 8601 date that excludes the timezone offset and see if that works. If it does, it would indicate that even ISO 8601 times don't support timezone offsets.

Regardless of how that pans out, it does seem to me that if a Ruby Time value is being converted to an Excel serial date (not an ISO8601 date), it should be done based on the local time, rather than the UTC time as it is done at present. Otherwise the time you have in Ruby does not equal the time you see in Excel. This forces you to either adjust the time in Ruby (e.g. turning 06:00 UTC / 07:00 local into 07:00 UTC / 08:00 local, so that it displays as 07:00 in Excel), or perform a calculation in Excel to go from UTC to local time by adding the timezone offset back.

Do you see what I mean?

agardiner commented 11 years ago

I've done some more testing with the ISO 8601 date type, and come to the conclusion that this is simply not supported in Excel 2007. This seems to be a change that was added in the 2012 4th edition of ECMA-376, so perhaps that should not be surprising; from D.2 in part 4 of the 4th edition spec:

The ST_CellType simple type (Part 1, §18.18.11) now allows an enumeration value of d

I therefore tried this on Excel 2011 for the Mac, and had a little more success there:

2013                        1900-01-00 20:13:00         Failed
2013-07                     2013-07                     Failed
2013-07-29                  2013-07-29 00:00:00         OK
20130709                    2013-07-09 00:00:00         OK
2013-W26                    2013-W26                    Failed
2013W17                     2013W17                     Failed
2013W136                    2013W136                    Failed
2013-W06-3                  2013-W06-3                  Failed
2013-187                    2013-187                    Failed
2013188                     2013188                     Failed
2012-05-13T08:50            2012-05-13 08:50:00         OK
2012-05-14T08:50:39         2012-05-14 08:50:39         OK
2012-05-15T08:59:59Z        2012-05-15 08:59:59         OK
2012-05-16T08:50:39+0600    2012-05-16T08:50:39+0600    Failed
2013-09-16T14:23:56-0300    2013-09-16T14:23:56-0300    Failed
20120515T085959Z            2012-05-15 08:59:59         OK
20120516T085039+0600        20120516T085039+0600        Failed
20130916T142356-03          20130916T142356-03          Failed
20130916T142356-0300        20130916T142356-0300        Failed

As you can see from the above, any ISO 8601 time that includes a timezone offset fails to be converted to a date. This leads me to conclude that even ISO 8601 dates are not timezone aware, and will only be interpreted as local times by Excel.

agardiner commented 11 years ago

I've also done some testing comparing times in Ruby to those in Excel generated by AXLSX. Here is the result:

Current Time (string)   Current Time            Current Time + UTC Offset
2013-05-16 11:24:02     2013-05-16 10:24:03     2013-05-16 11:24:03

As you can see, the time is not represented correctly in Excel unless I add my UTC offset, which is why I think the DateTimeConverter needs to do this to give the correct result.

The above was generated by the following code, if you'd like to try it yourself:

require 'axlsx'

p = Axlsx::Package.new
p.workbook do |wb|
    date_time = nil
    wb.styles do |s|
        date_time = s.add_style :format_code => 'yyyy-mm-dd hh:mm:ss'
    end

    wb.add_worksheet(:name => 'Time') do |ws|
        t = Time.now
        ws.add_row ['Current Time (string)', 'Current Time', 'Current Time + UTC Offset']
        ws.add_row do |row|
            row.add_cell(t.strftime('%Y-%m-%d %H:%M:%S'), :type => :string)
            row.add_cell(t, :style => date_time)
            row.add_cell(t + t.utc_offset, :style => date_time)
        end
    end
end
p.serialize 'Time.xlsx'

Let me know how you'd like to proceed. I'm happy to submit a pull request with this change if you agree.

Cheers

randym commented 11 years ago

I've spend some time on this last weekend and have confirmed that excel just goes boom on any type of time zone info even in iso1806 in 2011 and 2008 on mac.

I can see two sides here.

One side expects the developer to provide time in utc - with the understanding that excel will always convert it to local time. That is good in that it means axlsx is not going to be doing any magic to your data under the hood, but bad because it requires users to be aware that OOXML is not going to handle timezone info as you expect.

The other side is to just convert everything that comes in into utc and have no surprises output for end users of our data. There are performance implications to consider as well, especially for large data sets.

My experience with global systems has shows that the most stable systems use utc internally and then convert that to the users local time for user facing views, which is also what excel is doing here so I dont feel overly bad about requiring UTC input - but then again, many people dont build global sites and may not be aware....

For now, if I have some time this weekend, I will take a look at the performance hit for converting all inputs to UTC before serialization to "excel-crazy-time" and see what the damage is.

agardiner commented 11 years ago

Some of what you've said here doesn't gel for me...

My experience with global systems has shows that the most stable systems use utc internally and then convert that to the users local time for user facing views...

I 100% concur with this statement...

...which is also what excel is doing here

But this part doesn't match what I found. What I've found is that Excel does absolutely no time zone adjustment, and assumes all times are already in the desired timezone. Thus if you give it a time in UTC, it will display the time in UTC, but if you give it a time in local time, it will display it in local time.

This surprised me, because like you I expected that Excel would understand UTC and convert this to the user's local time zone - but this doesn't seem to be the case. Excel never adjusts a time value it has been given.

The problem I have with the current behavior in axlsx is that it is presenting data to Excel in UTC, which means that Excel always displays them in UTC: screen shot 2013-06-07 at 10 12 11

I think the more natural approach is to default to presenting data in local times, since in most cases a spreadsheet will be opened in the same time zone in which it was generated. It may make sense to also have an option to use UTC times, but I don't think that should be the default.

randym commented 11 years ago

You are absolutely correct. Excel just drops the ball on any timezone stuff.

To confirm your proposal:

If the Time, DateTime specified by the consumer of the library has a time zone offset we automatically adjust the serialized time with the expectation that the zone of the data used to generate the sheet is already in the time zone that the end user is going to want to see.

Your earlier comments hint at this kind of solution, I just want to confirm if that the kind of thing you are looking for before we fix the direction and do a major version release as it will break our backwards compatibility (for very good reasons I think)

noniq commented 11 years ago

+1 for automatically adjusting time by adding the time zone offset! :+1:

(I’m facing this issue in an application I’m currently working on – would be great to get rid of the manual adjustment that is necessary right now.)

agardiner commented 11 years ago

Yes, that is exactly what I am proposing. I changed line 26 of date_time_converter.rb as follows:

(time.to_f + time.utc_offset - epoch)/seconds_per_day

That gives me time values as I expect to see them (i.e. in the local time of the machine on which the .xlsx is generated)

noniq commented 11 years ago

When using Axlsx with Rails, date_to_serial must also be changed.

In my projects I’m currently using a Rails initializer to make Axlsx time-zone-aware, like this:

# TODO: Remove once Axlsx does time zone conversion itself.
module Axlsx
  class DateTimeConverter
    class << self
      module RespectTimezoneInConversion
        def date_to_serial(date)
          date += date.utc_offset.seconds if date.respond_to?(:utc_offset)
          super(date)
        end

        def time_to_serial(time)
          super(time + time.utc_offset)
        end
      end

      prepend RespectTimezoneInConversion
    end
  end
end

(Note that this uses Module.prepend which is only available since Ruby 2.0 – for older versions you’d have to rewrite this using alias_method_chain instead).

noniq commented 11 years ago

Note: The initial version of the code I posted above contained a bug in date_to_serial – I just edited the code to fix that bug.

randym commented 10 years ago

This is fixed in 2.0.0, which is going to be release within the hour unless I find some kind of major problem.

EmilyMB commented 6 years ago

@agardiner 👋 I know you were mostly concerned about the timezone here, but did you figure out why things were off by 1 second? I'm wondering if there is some better way of handling these dates I'm missing...

agardiner commented 6 years ago

No, to be honest I wasn't concerned about the difference of a second, it was the hours/mins that were a problem.

RolandStuder commented 6 months ago

Late comment, in case someone comes ends of here being confused about sometimes the seconds looking off.

I beliveve, when writing a date time to an XLS the time gets rounded to seconds. So you loose the subseconds information, and in some situation due to rounding you can then be off by 1 second.