willtrnr / pyxlsb

Excel 2007+ Binary Workbook (xlsb) reader for Python
GNU Lesser General Public License v3.0
90 stars 21 forks source link

how to differentiate float, date, time and datetime cells? #10

Open chfw opened 6 years ago

chfw commented 6 years ago

Thanks for this great library which fill up the gap in the world of python + excel.

My question is: how to find out date and time fields among floats? with xls and xlsx readers from Python, I was able to tell which cell has date, time and float values.

Thanks!

chfw commented 6 years ago

Here's the simple test case: read it and expect them to be in proper python date, time objects. Note, formatting( dd/mm/yyyy, mm/dd/yy ) is not checked here.

Date | Time
-- | --
25/12/14 | 11:11:11
26/12/14 | 12:12:12
01/01/15 | 13:13:13
00/01/00 | 0:00:00
willtrnr commented 6 years ago

There's no distinction between a date, time, datetime and number in the file format, the only way to figure it out is through cell formatting.

I've been working on formulas lately, but I can switch to styles instead.

chfw commented 6 years ago

If you can accommodate style, then we can have the momentum to push a basic xlsb reader out. I have a test suite to certify its usability.

necnec commented 5 years ago

Hi @wwwiiilll ! Do you have any plans for implementing autodetecting datetime cells? I'm into integration this lib with pandas and this is the issue there.

I've tried to understand how to connect styles (where cell format info is stored) and cells but I found MS-XLSB docs too difficult. If you can give more info about how to do it there I could prepare a PR.

Thanks!

willtrnr commented 5 years ago

@necnec yes, I do intend on incorporating styles, or at the very least date format detection at some point, but seeing how I haven't had much time to spend on this a PR is more than welcome.

At this point I have basic stubs of the records relevant to style (relevant MS-XLSB sections are 2.1.7.50 and 2.2.6), but I still have to figure out how to model the runtime structure and its API.

At least we don't actually need to parse everything just to figure out if the cell contains dates, but the minimum is the handling of sheet/row/column/cell XF stacking and format strings.