masaccio / numbers-parser

Python module for parsing Apple Numbers .numbers files
MIT License

Incorrect values due to offset positions #10

Closed. SheetJSDev closed this issue 2 years ago.

SheetJSDev commented 2 years ago

Test file: https://github.com/SheetJS/test_files/blob/master/numbers/types_61.numbers?raw=true

The "Star Rating", "Slider" and "Stepper" fields are giving incorrect values:

$ cat-numbers -b types_61.numbers | awk 'NR==12,NR==14' 
Star Rating (3),5.4e-323,,,,,
Slider (50),4.776854203e-314,,,,,
Stepper (12),4.775785693e-314,,,,,

The correct values are the ones in parentheses (3, 50 and 12, respectively).
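The bogus numbers are tiny subnormal doubles, which is typical of decoding 8 bytes that do not start at the real value. The sketch below reproduces the printed Slider and Stepper values under one assumption: the read window began 4 bytes into the correct double and took in a following little-endian int32 equal to 2. That trailing 2 is inferred from the printed bits, not read from the file, and shifted_read is a hypothetical helper.

```python
import struct


def shifted_read(expected: float, trailing_int: int = 2) -> float:
    # Hypothetical reconstruction: the upper 4 bytes of the real double plus
    # an assumed following int32 (inferred, not read from the file),
    # decoded together as a single little-endian double.
    window = struct.pack("<d", expected)[4:] + struct.pack("<i", trailing_int)
    return struct.unpack("<d", window)[0]


for label, expected in (("Slider", 50.0), ("Stepper", 12.0)):
    print(label, f"{shifted_read(expected):.9e}")
# Slider 4.776854203e-314
# Stepper 4.775785693e-314

# Star Rating's 5.4e-323 is the subnormal whose bit pattern is the integer 11,
# i.e. a small integer field decoded as if it were a double.
print(struct.unpack("<q", struct.pack("<d", 5.4e-323))[0])  # 11
```

Matching both 10-digit outputs this way is consistent with a read window that is off by 4 bytes, though it does not prove what the neighbouring field contains.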

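This is due to how the model actually determines the offsets of the respective values. Each cell storage buffer has a bitfield that specifies which optional fields are present, so the correct position of a value must be calculated from that bitfield (by adding up the sizes of the present fields that precede it) rather than read at a fixed position such as storage_buffer_pre_bnc[-12:-4].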
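To make the bitfield idea concrete, here is a minimal Python sketch with a deliberately simplified, hypothetical layout: the bit values, field sizes and the 8-byte header below are made up for illustration and are not the Numbers format. A value's offset is the header size plus the sizes of all present fields stored before it, so the same double moves when an earlier optional field is set.

```python
from __future__ import annotations

import struct

# Hypothetical layout for illustration only (not the real Numbers format):
# (flag bit, field size in bytes), in the order the fields are stored.
LAYOUT = [
    (0x01, 4),  # some optional 4-byte reference
    (0x02, 4),  # another optional 4-byte reference
    (0x04, 8),  # the value itself, an IEEE 754 double
]
HEADER_SIZE = 8  # hypothetical: 4 bytes of other header data + 4-byte flags word


def field_offset(flags: int, bit: int) -> int | None:
    """Byte offset of the field for `bit`, or None if that flag is clear."""
    offset = HEADER_SIZE
    for field_bit, size in LAYOUT:
        if field_bit == bit:
            return offset if flags & bit else None
        if flags & field_bit:
            offset += size  # only fields that are present take up space
    raise KeyError(f"unknown flag {bit:#x}")


def build(flags: int, fields: dict[int, bytes]) -> bytes:
    """Pack a synthetic buffer: header word, flags word, then present fields."""
    buf = struct.pack("<iI", 0, flags)
    for field_bit, _size in LAYOUT:
        if flags & field_bit:
            buf += fields[field_bit]
    return buf


value = struct.pack("<d", 50.0)
bare = build(0x04, {0x04: value})
padded = build(0x07, {0x01: struct.pack("<i", 1), 0x02: struct.pack("<i", 2), 0x04: value})

for buf in (bare, padded):
    flags = struct.unpack_from("<I", buf, 4)[0]
    off = field_offset(flags, 0x04)
    print(off, struct.unpack_from("<d", buf, off)[0])
# 8 50.0
# 16 50.0

# Reading `padded` at the fixed offset 8 decodes the two 4-byte fields
# as one double and yields a tiny subnormal instead of 50.0.
print(struct.unpack_from("<d", padded, 8)[0])
```

A table-driven layout like this keeps the field order in one place; the patch in this issue instead hard-codes the equivalent masks inline.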

If you're comfortable with TypeScript, this is what we determined based solely on the pre_bnc payload. Here's a small patch that should be roughly equivalent to the TS:

--- a/src/numbers_parser/model.py
+++ b/src/numbers_parser/model.py
@@ -344,29 +344,47 @@ class NumbersModel:
         cell_type = storage_buffer[1]
         bullets = None
         cell_value = None
+        cell_value_rich = None
+        cell_value_text = None
+        cell_value_ieee = None
+        cell_value_date = None
+
+        if storage_buffer_pre_bnc is not None:
+            flags = unpack("<i", storage_buffer_pre_bnc[4:8])[0]
+            data_offset = 12 + bin(flags & 0x0D8E).count("1") * 4
+            if (flags & 0x200) > 0:
+                cell_value_rich = unpack("<i", storage_buffer_pre_bnc[data_offset:data_offset+4])[0]
+                data_offset = data_offset + 4
+            data_offset = data_offset + bin(flags & 0x3000).count("1") * 4
+            if (flags & 0x010) > 0:
+                cell_value_text = unpack("<i", storage_buffer_pre_bnc[data_offset:data_offset+4])[0]
+                data_offset = data_offset + 4
+            if (flags & 0x020) > 0:
+                cell_value_ieee = unpack("<d", storage_buffer_pre_bnc[data_offset:data_offset+8])[0]
+                data_offset = data_offset + 8
+            if (flags & 0x040) > 0:
+                cell_value_date = unpack("<d", storage_buffer_pre_bnc[data_offset:data_offset+8])[0]
+                data_offset = data_offset + 8

         if cell_type == TSTArchives.numberCellType or cell_type == 10:
             if storage_buffer_pre_bnc is None:
                 cell_value = 0.0
             else:
-                cell_value = unpack("<d", storage_buffer_pre_bnc[-12:-4])[0]
+                cell_value = cell_value_ieee
             cell_type = TSTArchives.numberCellType
         elif cell_type == TSTArchives.textCellType:
-            string_key = unpack("<i", storage_buffer[12:16])[0]
-            cell_value = self.table_string(table_id, string_key)
+            if cell_value_text is None: cell_value_text = unpack("<i", storage_buffer[12:16])[0]
+            cell_value = self.table_string(table_id, cell_value_text)
         elif cell_type == TSTArchives.dateCellType:
             if storage_buffer_pre_bnc is None:
                 cell_value = datetime(2001, 1, 1)
             else:
-                seconds = unpack("<d", storage_buffer_pre_bnc[-12:-4])[0]
-                cell_value = datetime(2001, 1, 1) + timedelta(seconds=seconds)
+                cell_value = datetime(2001, 1, 1) + timedelta(seconds=cell_value_date)
         elif cell_type == TSTArchives.boolCellType:
-            cell_value = unpack("<d", storage_buffer[12:20])[0] > 0.0
+            cell_value = cell_value_ieee > 0.0
         elif cell_type == TSTArchives.durationCellType:
-            cell_value = unpack("<d", storage_buffer[12:20])[0]
+            cell_value = cell_value_ieee
         elif cell_type == TSTArchives.automaticCellType:
-            string_key = unpack("<i", storage_buffer[12:16])[0]
-            bullets = self.table_bullets(table_id, string_key)
+            bullets = self.table_bullets(table_id, cell_value_rich)

         return {"type": cell_type, "value": cell_value, "bullets": bullets}
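Because the patch is a diff against model.py, it cannot be run on its own. The sketch below restates only its offset arithmetic as a standalone function (decode_pre_bnc and synthetic are hypothetical names) and exercises it on synthetic buffers. The masks, field order and 12-byte header are copied from the patch; the buffers are invented test data, not bytes from types_61.numbers.

```python
import struct


def decode_pre_bnc(buf: bytes) -> dict:
    """Restates the patch's offset logic; decode_pre_bnc is a hypothetical name.

    Masks, field order and the 12-byte header are copied from the patch.
    A field whose flag is clear stays None.
    """
    out = {"rich": None, "text": None, "ieee": None, "date": None}
    flags = struct.unpack_from("<i", buf, 4)[0]
    # Skip the header plus one 4-byte field per set bit in 0x0D8E.
    offset = 12 + bin(flags & 0x0D8E).count("1") * 4
    if flags & 0x200:
        out["rich"] = struct.unpack_from("<i", buf, offset)[0]
        offset += 4
    # Skip one 4-byte field per set bit in 0x3000.
    offset += bin(flags & 0x3000).count("1") * 4
    if flags & 0x010:
        out["text"] = struct.unpack_from("<i", buf, offset)[0]
        offset += 4
    if flags & 0x020:
        out["ieee"] = struct.unpack_from("<d", buf, offset)[0]
        offset += 8
    if flags & 0x040:
        out["date"] = struct.unpack_from("<d", buf, offset)[0]
    return out


def synthetic(flags: int, payload: bytes) -> bytes:
    # 12-byte header with the flags word at bytes 4..8, then the payload.
    return struct.pack("<iii", 0, flags, 0) + payload


# Two optional 4-byte fields (flags 0x002 and 0x080) precede the double.
buf = synthetic(0x002 | 0x080 | 0x020, struct.pack("<iid", 7, 9, 50.0))
print(decode_pre_bnc(buf)["ieee"])  # 50.0
```

Returning None for absent fields mirrors the patch, where each cell_value_* variable stays None unless its flag is set.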

masaccio commented 2 years ago

Reviewed and merged with thanks, and published to PyPI.