troldal / OpenXLSX

A C++ library for reading, writing, creating and modifying Microsoft Excel® (.xlsx) files.
BSD 3-Clause "New" or "Revised" License
1.33k stars 313 forks source link

So many bugs, is it REALLY useful? #243

Open genyez opened 4 months ago

genyez commented 4 months ago

or just a toy? Bugs including:

I do fixed it in my repo, but I'm really curious. Does ANYONE use it for production for real?

If anyone need those mods/fixes, let me know


Don't be mistaken. I highly appreciate the authors about this project. Just curious about why nobody even try to fix them.

CEXT-Dan commented 4 months ago

I use it as a static library in one of my C++ programs, for read-only though I have unit tests for my data and it passes.

Shame the project went stale, someday I might want to write data

genyez commented 4 months ago

I use it as a static library in one of my C++ programs, for read-only though I have unit tests for my data and it passes.

Shame the project went stale, someday I might want to write data

Well, read-only & C++ only could be a workable situation, I'll admit that.

aral-matrix commented 1 month ago

Shame the project went stale, someday I might want to write data

Project is no longer stale - there's been a huge overhaul / addressing pull requests, code refactoring & new features. More features are in work (styles support).

CEXT-Dan commented 1 month ago

Shame the project went stale, someday I might want to write data

Project is no longer stale - there's been a huge overhaul / addressing pull requests, code refactoring & new features. More features are in work (styles support).

Thank you for your contribution! I have a fork, that I edited to read styles. I added XLStyles, because I needed access to the number format type, I.e. Currency(currency symbol), Percent, Date format and number precision.

CEXT-Dan commented 1 month ago

I use this package as a Field( Text ) evaluator for AutoCAD. In CAD, number format is important It would be nice to have the option to change the cell value from AutoCAD, without the need to open excel. It’s currently read only

xls

aral-matrix commented 1 month ago

I use this package as a Field( Text ) evaluator for AutoCAD. In CAD, number format is important It would be nice to have the option to change the cell value from AutoCAD, without the need to open excel. It’s currently read only

That looks..... colorful :D to be honest, I have no idea what I am looking at - last time I used AutoCAD was about 20 years ago :D

Thank you for your contribution! I have a fork, that I edited to read styles. I added XLStyles, because I needed access to the number format type, I.e. Currency(currency symbol), Percent, Date format and number precision.

Maybe I should have a look at that to see what functionality is missing from my implementation. I'll have a look at your fork tomorrow or so - it's really late here.

genyez commented 1 month ago

Shame the project went stale, someday I might want to write data

Project is no longer stale - there's been a huge overhaul / addressing pull requests, code refactoring & new features. More features are in work (styles support).

Thank you for your contribution! Looking forward to them : )

aral-matrix commented 4 weeks ago

creating useless cell nodes even in a read-only procedure

Btw - since I am done with my styles implementation (will eventually be available here) - may I ask what part of the code you are referring to? Are you actually seeing XML nodes being created, or just instances of XLCell? The whole philosophy of OpenXLSX is to directly operate on the underlying XML document as much as possible, so the objects to access data (e.g. XLCell, XLWorksheet etc) are instantiated upon accessing them - but will be destroyed again once going out of scope. The trick here is that the initialization for most objects is very fast & cheap, as it only links the object instance to the underlying relevant (pugi) XML node, then performs operations on that.

So typically, a read-only access will create e.g. an XLCell instance and use that object's methods to access the desired cell properties - but it will not modify the underlying XML or do more than copy the pugixml node object pointer.

A possible exception is that for some properties of cells, some recent additions create those properties upon read access when they do not yet exist. The idea is that "if the user consciously accessed this property, they want this property of the cell to exist" - so a property is often initialized with the default value before that is then returned.

genyez commented 3 weeks ago

creating useless cell nodes even in a read-only procedure

Btw - since I am done with my styles implementation (will eventually be available here) - may I ask what part of the code you are referring to? Are you actually seeing XML nodes being created, or just instances of XLCell? The whole philosophy of OpenXLSX is to directly operate on the underlying XML document as much as possible, so the objects to access data (e.g. XLCell, XLWorksheet etc) are instantiated upon accessing them - but will be destroyed again once going out of scope. The trick here is that the initialization for most objects is very fast & cheap, as it only links the object instance to the underlying relevant (pugi) XML node, then performs operations on that.

So typically, a read-only access will create e.g. an XLCell instance and use that object's methods to access the desired cell properties - but it will not modify the underlying XML or do more than copy the pugixml node object pointer.

A possible exception is that for some properties of cells, some recent additions create those properties upon read access when they do not yet exist. The idea is that "if the user consciously accessed this property, they want this property of the cell to exist" - so a property is often initialized with the default value before that is then returned.

Sorry for the rude word I gave, apology for that if you felt unhappy, sincerely. I'm referring to the code in the function [getCellNode] in XLUtilities.cpp, line 172, and simialar lines below. https://github.com/troldal/OpenXLSX/blob/06425d541d09907799e93a4226691126bd60af5e/OpenXLSX/sources/utilities/XLUtilities.hpp#L172 If I don't take it wrong, when I call range() to get values in cells for a sparse xlsx, getCellNode will create new pugi nodes, which I don't see much necessity. Sparse xlsx, in other words, including many empty cells, is quite common in my application scenario. Let me know if there is any misunderstanding above. Thanks!

aral-matrix commented 3 weeks ago

I understand - and you are correct! Simply iterating over a range of cells like so

   XLCellRange testRange = wks.range("A1:F100");
    for( auto it : testRange );

is enough for all those elements to be created. I agree this is undesirable & will think about a solution. May I ask what your use case is? I.e. - would it be enough for you if for a range iterator, you have an "empty()" method to test if a cell exists, and the moment you actually access the cell in any other way (directly testing the value, formula or style), it gets created like it currently happens?

aral-matrix commented 3 weeks ago

So, I worked on a solution that allows you to iterate manually over the XLCellIterator from XLCellRange::begin() to ::end() without creating the cells, but a range-for loop like for( auto cell : XLCellRange(...)) will still automatically create all cells in the range, simply because they have to be dereferenced for setting the loop variable.

Would the below example work for you?

    XLDocument doc{}; // create new workbook
    doc.create("./Demo10.xlsx", XLForceOverwrite);
    std::cout << "doc.name() is " << doc.name() << std::endl;

    /*** Test unnecessary creation of XML Cells ***/
    XLWorksheet wks = doc.workbook().worksheet(1);
    XLCellRange testRange = wks.range("A1:F100");

    std::cout << std::endl << std::endl;
    std::cout << "FIRST iteration over " << testRange.address() << ":" << std::endl;
    // NOTE: a manual loop over the range by iterator will allow to test for existing cells before accessing them
    for( XLCellIterator it = testRange.begin(); it != testRange.end(); ++it ) {
        if( it.cellExists() )
            std::cout << "cell with address " << it.address() << " exists" << std::endl;
        else {
            if( Rand32() % 5 == 0 )
                it->value() = "inserted";
            if( it.cellExists() )
                std::cout << "inserted cell with address " << it.address() << std::endl;
        }
    }

    std::cout << std::endl << std::endl;
    std::cout << "SECOND iteration over " << testRange.address() << ":" << std::endl;
    // NOTE: a manual loop over the range by iterator will allow to test for existing cells before accessing them
    for( XLCellIterator it = testRange.begin(); it != testRange.end(); ++it ) {
        if( it.cellExists() )
            std::cout << "cell with address " << it.address() << " exists" << std::endl;
        else {
            if( Rand32() % 5 == 0 )
                it->value() = "inserted";
            if( it.cellExists() )
                std::cout << "inserted cell with address " << it.address() << std::endl;
        }
    }

    std::cout << std::endl << std::endl;
    std::cout << "THIRD iteration over " << testRange.address() << ":" << std::endl;
    // NOTE: range-for will force-create every cell in the range, as it has to dereference the iterator to set the loop variable
    for( auto cell : testRange ) {
        if( not cell.empty() )
            std::cout << "cell exists: " << cell.cellReference().address() << std::endl;
    }

    doc.saveAs("./Demo10.xlsx", XLForceOverwrite);
    doc.close();

Output (shortened):

doc.name() is Demo10.xlsx

FIRST iteration over A1:F100:
inserted cell with address D1
inserted cell with address F5
inserted cell with address D6
inserted cell with address E6
inserted cell with address F6
inserted cell with address F8
inserted cell with address C9
inserted cell with address B10
[..]
inserted cell with address C90
inserted cell with address F91
inserted cell with address A92
inserted cell with address E93
inserted cell with address B94
inserted cell with address A95
inserted cell with address B96
inserted cell with address D96
inserted cell with address C97
inserted cell with address A98
inserted cell with address C98
inserted cell with address D99
inserted cell with address F100

SECOND iteration over A1:F100:
inserted cell with address B1
inserted cell with address C1
cell with address D1 exists
inserted cell with address A3
inserted cell with address D3
inserted cell with address E3
inserted cell with address D4
inserted cell with address F4
inserted cell with address A5
inserted cell with address C5
cell with address F5 exists
inserted cell with address A6
cell with address D6 exists
cell with address E6 exists
cell with address F6 exists
inserted cell with address B7
inserted cell with address D7
inserted cell with address D8
cell with address F8 exists
inserted cell with address B9
cell with address C9 exists
inserted cell with address D9
cell with address B10 exists
inserted cell with address E10
[..]
cell with address C90 exists
cell with address F91 exists
cell with address A92 exists
inserted cell with address A93
inserted cell with address B93
cell with address E93 exists
cell with address B94 exists
cell with address A95 exists
cell with address B96 exists
cell with address D96 exists
cell with address C97 exists
cell with address A98 exists
cell with address C98 exists
inserted cell with address A99
cell with address D99 exists
cell with address F100 exists

THIRD iteration over A1:F100:
cell exists: A1
cell exists: B1
cell exists: C1
cell exists: D1
cell exists: E1
cell exists: F1
cell exists: A2
cell exists: B2
cell exists: C2
cell exists: D2
cell exists: E2
cell exists: F2
cell exists: A3
cell exists: B3
cell exists: C3
cell exists: D3
cell exists: E3
cell exists: F3
cell exists: A4
cell exists: B4
cell exists: C4
cell exists: D4
cell exists: E4
cell exists: F4
cell exists: A5
cell exists: B5
cell exists: C5
cell exists: D5
cell exists: E5
cell exists: F5
cell exists: A6
cell exists: B6
cell exists: C6
cell exists: D6
cell exists: E6
cell exists: F6
cell exists: A7
cell exists: B7
cell exists: C7
cell exists: D7
cell exists: E7
cell exists: F7
cell exists: A8
cell exists: B8
cell exists: C8
cell exists: D8
cell exists: E8
cell exists: F8
cell exists: A9
cell exists: B9
cell exists: C9
cell exists: D9
cell exists: E9
cell exists: F9
cell exists: A10
cell exists: B10
cell exists: C10
cell exists: D10
cell exists: E10
cell exists: F10
[..]
cell exists: A90
cell exists: B90
cell exists: C90
cell exists: D90
cell exists: E90
cell exists: F90
cell exists: A91
cell exists: B91
cell exists: C91
cell exists: D91
cell exists: E91
cell exists: F91
cell exists: A92
cell exists: B92
cell exists: C92
cell exists: D92
cell exists: E92
cell exists: F92
cell exists: A93
cell exists: B93
cell exists: C93
cell exists: D93
cell exists: E93
cell exists: F93
cell exists: A94
cell exists: B94
cell exists: C94
cell exists: D94
cell exists: E94
cell exists: F94
cell exists: A95
cell exists: B95
cell exists: C95
cell exists: D95
cell exists: E95
cell exists: F95
cell exists: A96
cell exists: B96
cell exists: C96
cell exists: D96
cell exists: E96
cell exists: F96
cell exists: A97
cell exists: B97
cell exists: C97
cell exists: D97
cell exists: E97
cell exists: F97
cell exists: A98
cell exists: B98
cell exists: C98
cell exists: D98
cell exists: E98
cell exists: F98
cell exists: A99
cell exists: B99
cell exists: C99
cell exists: D99
cell exists: E99
cell exists: F99
cell exists: A100
cell exists: B100
cell exists: C100
cell exists: D100
cell exists: E100
cell exists: F100
aral-matrix commented 3 weeks ago

@genyez my branch now has support for the functionality as described above, coming with a bit of a performance improvement when iterating over ranges. Would you like to test this?

genyez commented 3 weeks ago

@genyez my branch now has support for the functionality as described above, coming with a bit of a performance improvement when iterating over ranges. Would you like to test this?

Brilliant work! I'll schedule it when I finished my at-hand-work.

genyez commented 3 weeks ago

I understand - and you are correct! Simply iterating over a range of cells like so

   XLCellRange testRange = wks.range("A1:F100");
    for( auto it : testRange );

is enough for all those elements to be created. I agree this is undesirable & will think about a solution. May I ask what your use case is? I.e. - would it be enough for you if for a range iterator, you have an "empty()" method to test if a cell exists, and the moment you actually access the cell in any other way (directly testing the value, formula or style), it gets created like it currently happens?

My use case was like a data table in game developing. Many cells don't need to fill because it has a default value defined elsewhere.

aral-matrix commented 3 weeks ago

@genyez my branch now has support for the functionality as described above, coming with a bit of a performance improvement when iterating over ranges. Would you like to test this?

Brilliant work! I'll schedule it when I finished my at-hand-work.

Okay, when you have time to test, could I ask you to open an account at codeberg.org? My repository is private (because I am working with Kenneth and hope the changes will eventually all make it here) and on github it's impossible to share a private repository for read only access. If you give me a codeberg user name, I'll give you read-access to my OpenXLSX fork.