masaccio / numbers-parser

Python module for parsing Apple Numbers .numbers files
MIT License
201 stars 14 forks source link

Detect and skip formula serialization in cells with a formula error #42

Closed SheetJSDev closed 1 year ago

SheetJSDev commented 2 years ago

Just remove the extension formula-error.numbers.zip

% cat-numbers -b --formulas formula-error.numbers
sum,prod
1.0,2.0
1.0,4.0
...
IndexError: pop from empty list

During handling of the above exception, another exception occurred:

...
numbers_parser.exceptions.UnsupportedError: Unsupported formula buffer (3,1)

This error seems to come from cat-numbers --formulas trying to generate formula strings from cells with an error.

The offending cells have type 8 (TSTArchives.formulaErrorCellType). Instead of erroring, it should either leave the cells blank or write some error like #REF! or #NAME?

masaccio commented 2 years ago

v3.5.2 prints #REF! for reference errors and in non-formula mode for error cells. Here's your test file without --formulas:

% cat-numbers -b tests/data/issue-42.numbers 
sum,prod
1.0,2.0
1.0,4.0
2.0,#REF!
3.0,#REF!
,#REF!
7.0,#REF!

I contemplated using different notation for text mode, but Excel renders these cells as #REF! so I kept that. Whether I can differentiate between different error types (#VALUE!, #DIV/0!, etc.) isn't something I've looked at yet.

With --formulas the formulas are returned regardless of whether they return an error but any reference errors are included as #REF:

% cat-numbers -b --formulas tests/data/issue-42.numbers 
sum,prod
1.0,2.0
1.0,4.0
2.0,#REF!×A4:A6
3.0,#REF!×A5:A6
,#REF!×A6:A6
SUM(A),PRODUCT(B)

The last PRODUCT function uses the column reference rather than the named reference as I don't currently decode name references. The AST for that node just includes a regular relative column reference so the named references must be stored elsewhere.

There is a whole bunch of additional data in the AST for your ranges that are multiplied by REF! which I am ignoring and just treating those nodes as a regular colon node.

{
    "ASTNodeType": "COLON_NODE_WITH_UIDS",
    "ASTTractList": {
        "tract": [
            {
                "columnUids": {},
                "rowUids": {
                    "uid": [
                        {
                            "lower": "0x1BFB6AA9DCD5A630",
                            "upper": "0x555C30B94BE6405E"
                        },
                        {
                            "lower": "0x77A940771339369D",
                            "upper": "0xBE51B3374084790E"
                        }
                    ]
                },
                "isRangeRef": true,
                "purpose": "UIDS_INCLUDED",
                "preserveRectangular": true
            }
        ],
        "stickyBits": {
            "beginRowIsAbsolute": false,
            "beginColumnIsAbsolute": false,
            "endRowIsAbsolute": false,
            "endColumnIsAbsolute": false
        }
    }
}

Seems important but in your small example at least it's not relevant.

masaccio commented 1 year ago

Just checking back on this @SheetJSDev OK to close?

SheetJSDev commented 1 year ago

Yeah it looks fine against 3.9.1:

% cat-numbers --version
3.9.1
% cat-numbers -b --formulas formula-error.numbers
sum,prod
1.0,2.0
1.0,4.0
2.0,#REF!×A4:A6
3.0,#REF!×A5:A6
,#REF!×A6:A6
SUM(A),PRODUCT(B)