qrilka / xlsx

Simple and incomplete Excel file parser/writer
MIT License
128 stars 62 forks source link

Handling dates before 1900-03-01 #145

Closed MarceloZabini closed 2 years ago

MarceloZabini commented 2 years ago

Hi,

It seems even though 1900 is not a leap year, it is treated as such in Excel (see https://en.wikipedia.org/wiki/Microsoft_Excel#Fictional_leap_day_in_the_year_1900). The wikipedia page also mentions this has been carried over to the Open XML file format.

This means 1900-02-29 is a valid date in Excel, and dates before 1900-03-01 are all shifted one day ahead when they're generated by xlsx. I'm not sure what the current behaviour is when converting from Excel.

Do I understand this correctly and is it a bug in xlsx?

qrilka commented 2 years ago

In xlsx there is only a couple of helpers to convert dates to numbers but I guess that you're right and we don't have this "feature" of Excel supported correctly :) I'd be glad to review a PR fixing this.

MarceloZabini commented 2 years ago

Thanks for such a quick reply.

At our shop we're currently using a forked xlsx that you can see at https://github.com/proda-ai/xlsx/pull/1/files. I'm not sure it fixes the problem in the right place, and it doesn't yet handle conversion from excel to UTCTime. I can submit that PR with additions that fix the other direction, and I was thinking of adding those to dateFromNumber. Are there other or better places to fix this?

I'm also not sure what we should do about numbers in excel range [1900-2-29 00:00, 1900-2-29 23:59:59...]. I think one possibility is to convert all numbers in that range into UTCTime 1900-3-1 00:00, which would at least make dateFromNumber a (non-strictly) increasing function - maybe not a requirement but probably a sensible property?

qrilka commented 2 years ago

Sounds OK to me, just needs to be documented at least :)