xlnt-community / xlnt

:bar_chart: Cross-platform user-friendly xlsx library for C++11+
https://xlnt-community.github.io/xlnt/
Other
14 stars 3 forks source link

Initial commit for supporting reading xls (old) files. #22

Open flaviu22 opened 3 months ago

flaviu22 commented 3 months ago

The xls code is pretty similar with xlsx for the moment, I have prepared the code to implement code to read xls files.

m7913d commented 3 months ago

@flaviu22 Thanks for your contribution. I will conduct a more thorough review later. If the code between xls and xlsx reading is very similar, it may be a good idea to move the common code to a common_consumer/common_producer class, as suggested by tfussells in his comment

doomlaur commented 3 months ago

@flaviu22 Thanks for your contribution! I'll take a closer look at it tomorrow or later this week :wink: However, I took a quick look and I must agree to @m7913d that, if possible, the same code should be used for both types of files (as much as possible, at least) to avoid code duplication.

@m7913d About the unit test: my guess is that he is using Visual Studio, and since XLNT uses localtime for the date/time utils, the C Standard Library of MSVC has the limitation for localtime, gmtime and mktime that it only supports dates between January 1st 1970 00:00:00 UTC and December 31st 3000 23:59:59 UTC. Unfortunately, if the tested time is at midnight and the programmer lives in a time zone with positive UTC offset, this means that the local time at January 1st 1970 midnight UTC is actually a few hours before midnight in the local time, causing the conversion to fail. In these cases, localtime returns -1, explaining the adapted unit test. I had this issue in some projects and had to learn it the hard way...

Just in case you are interested, this can be fixed by using C++20's std::chrono with a time zone database, but there are multiple issues that have been reported to Microsoft (some from myself):

Long story short, we did manage to fix the limitation around 1970-01-01 for some projects using std::chrono, and even the limitation for Windows versions before Windows 10 1903/19H1 and Windows Server 2022 can be worked around (by using localtime as a fallback, of course with its limitation of the years 1970 - 3000), but it's not nice, and for old versions of Windows not really fixable unless using an additional library with a time zone database.

Some things that I would like to mention which are NOT related to this pull request:

  1. We should probably not only use localtime_s on Windows, but also localtime_r on Linux and platforms that support it, and only fall back to localtime if there's no other option. See the documentation for the macros that need to be tested for localtime_r.
  2. We should move the safe_localtime code to some utils namespace, as it is currently implemented twice: once in date.cpp and once in time.cpp.
  3. We should NOT disable the unit test, but rather change it to January 2nd 1970, for example - then we are sure that no time zone should ever cause issues with our unit tests.
m7913d commented 3 months ago

@doomlaur Changing the unit test to January 2nd 1970 is fine for me.

m7913d commented 2 months ago

@flaviu22 What is the progress status of this xls implementation. It seems to me that most of the code is a copy paste from the xlsx version, without much of the changes required to actually read the old (binary) xls format? Do you plan to work further on the xls implementation? Should we convert this PR to a draft for now?

m7913d commented 2 months ago

I converted this PR to a draft, as the current implementation seems to be a WIP. Feel free to mark this PR as "Ready for review", once you think your code is ready for inclusion into the project.

@flaviu22 If you need further assistance to implement the xls support, please let us now.

flaviu22 commented 2 months ago

I converted this PR to a draft, as the current implementation seems to be a WIP. Feel free to mark this PR as "Ready for review", once you think your code is ready for inclusion into the project.

@flaviu22 If you need further assistance to implement the xls support, please let us now.

Hi all. Yes, good to converted as draft, until we have a ready to integrate code.

"_it may be a good idea to move the common code to a common_consumer/commonproducer class" Yes, indeed, but I would do that only after I'll have a functional code for xls branch, ok?

I'll come back with comments ...

m7913d commented 2 months ago

"_it may be a good idea to move the common code to a common_consumer/commonproducer class" Yes, indeed, but I would do that only after I'll have a functional code for xls branch, ok?

Off course. When I posted that comment, I hadn't fully reviewed the code and assumed it was already functional.

doomlaur commented 1 month ago

@flaviu22 I would appreciate if you could:

  1. Revert your change to the unit test in datetime_test_suite.cpp.
  2. Merge the master branch to your branch again.

This is because Pull Request #27 fixes the issue you had with the unit test, so you should no longer see the failed unit test for weekday() under Visual Studio.

flaviu22 commented 1 month ago

I converted this PR to a draft, as the current implementation seems to be a WIP. Feel free to mark this PR as "Ready for review", once you think your code is ready for inclusion into the project.

@flaviu22 If you need further assistance to implement the xls support, please let us now.

Where can I find documentation about how to read xls files? To be honest, I didn't find such a thing.

On my version of code, I have to change how excel file is read from early beginning:

void xls_consumer::open(std::istream &source)
{
    archive_.reset(new izstream(source));
    populate_workbook(true);
}

Here, archive_ doesn't make sense as xls file is not an archive anymore.

m7913d commented 1 month ago

It may be useful to have a look at the official documentation: https://learn.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/cd03cb5f-ca02-4934-a391-bb674cb8aa06

m7913d commented 1 month ago

If you haven't done so, I highly suggest to read https://github.com/tfussell/xlnt/issues/227#issuecomment-331893831

yacsha commented 1 month ago

Thank you for your willingness to maintain this great library. I would like to tell you that there is a very powerful library to work with xls files in C++, which I recommend, it is called ExcelFormat and was created by Martin Fuchs, but it was only maintained until 2011 and then it fell into abandonment, however some enthusiasts in the community have been maintaining and updating it, one of the most popular derived projects being that of dmytro-y-dev. It would be good if, in order not to start from scratch, what has already been advanced in this project could be reused and adapted to the Xlnt API. The author based his code on this OpenOffice documentation.

Regards