weshatheleopard / rubyXL

Ruby lib for reading/writing/modifying .xlsx and .xlsm files
MIT License
1.28k stars 255 forks source link

`is_date` implementation seems out of date #462

Open caramdache opened 2 months ago

caramdache commented 2 months ago

I've been having issues with certain cell values being recognised as dates because of the cell format was set as a custom date format, but the cell content wasn't really a date and was not supposed to be consummed as a date.

For example, the string '1-23, stored as the sring 133, was being considered as the date 23/09/2024, with created a number of issues for downstream processing, whereas it should have been considered as the range [1, 23], expressed as a string.

While investigating this issue, I've come across this nice writeup, which indicates that this is what used to happen in "compatibility" mode, during the transition from .xlx to .xlsx, but that in modern, newly created .xlsx files, the behaviour should be different. And that the cell should only be considered a date cell, if the cell is of type date ("d"), irrespectively of whether the format is a date format.

http://www.ericwhite.com/blog/dates-in-strict-spreadsheetml-files/

By looking at the current implementation of the is_date method, it looks like it is implementing the "compatibility" mode and not the "strict", modern mode. It would be great if the code could be updated to support the "strict" mode.

https://github.com/weshatheleopard/rubyXL/blob/b719e4337a80aa78000bea1a0569cbfeb7783b4f/lib/rubyXL/objects/sheet_data.rb#L93

caramdache commented 2 months ago

Here is my suggested change:

    def is_date?
      return true if datatype == RubyXL::DataType::DATE # From Office 2010

      return false unless # Only fully numeric values can be dates
        case raw_value
        when Numeric then true
        when String  then raw_value =~ NUMBER_REGEXP
        else false
        end

      self.number_format&.is_date_format?
    end
weshatheleopard commented 2 months ago

Can you please provide an example file which doesn't work the way it should? It looks like the issue here is more complex than you think, and I need an actual file to test it on.

caramdache commented 2 months ago

Here are 2 simple test files. I discovered on this occasion that, to set strict mode, you have to go to File/Options/Save. This is different from Word, which displays a dialog box on save. So it may be that most .xlsx files are actually saved in compatibility mode.

test-strict.xlsx test_compatibility.xlsx