Closed safinaskar closed 1 year ago
There shouldn't be a problem with that many rows; Numbers supports documents up to 1M rows.
I just tried an empty document with more than 180k rows, adding some text at the end, and cat-numbers did what I expected and dumped all rows. Something you can try to see if the document is corrupted is SheetJS. It stores data in-browser, so you're not transferring the document to their server.
Failing that, if you trust me with the document, I am happy to share a Dropbox URL with you.
This also works as expected:
from numbers_parser import Document

doc = Document()
sheets = doc.sheets
tables = sheets[0].tables
table = tables[0]
x = 0
for row in range(0, 100000):
    for col in range(0, 4):
        table.write(row, col, x)
        x += 1
doc.save("large.numbers")
% cat-numbers large.numbers | wc -l
100000
One of my files is 99 MB in size. I tried to upload it to https://oss.sheetjs.com/ and to https://sheetjs.com/sql/ , but both sites hung.
Thanks anyway. I will try to parse the files some other way.
There are a couple of suggestions here: https://askubuntu.com/questions/408306/is-there-a-way-to-read-osx-numbers-files
LibreOffice doesn't advertise .numbers support, but a random Internet person claims it works. Creating a free iCloud account, even if it's just a trial, is a good idea, as you'll be able to upload the file and then download it as Excel. It's also a sure-fire way to see if the file is corrupted: if iCloud can't load it, then it's borked.
@masaccio, I wrote a Rust program and was able to fully restore the data from that .numbers file using it. So, yes, the file is correct.
I still don't want to share the .numbers file itself, but if you want, I can share that Rust program.
Also, if you want, I can create an iCloud account, try to create a similar .numbers file with fake data using iCloud, reproduce the bug, and share that file.
Some hints about the file contents: it is a table with more than 65536 rows and 17 columns, full of different data, with many different strings and many different numbers.
I can think of one possible source of the bug, but I'm not sure. It is an overflow in ListEntry.key ( https://github.com/psobot/keynote-parser/blob/7114e3b6594a68d6c6885f469c7b4b3bdc27eb86/protos/TSTArchives.proto#L227 ). In my files this key can exceed 65535, and thus this key, embedded in TileRowInfo.cell_storage_buffer = 6 ( https://github.com/psobot/keynote-parser/blob/7114e3b6594a68d6c6885f469c7b4b3bdc27eb86/protos/TSTArchives.proto#L128 ), can occupy more than 2 bytes.
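To make the hypothesis above concrete, here is a small Python sketch. It is not numbers-parser's code and says nothing about how the buffer is actually decoded; the helper names `bytes_needed` and `fits_in_uint16` are made up for illustration. It only shows that a key of 65536 or more no longer fits in an unsigned 16-bit field:

```python
import struct


def bytes_needed(key: int) -> int:
    """Return the minimum number of bytes needed to store a non-negative key."""
    return max(1, (key.bit_length() + 7) // 8)


def fits_in_uint16(key: int) -> bool:
    """Check whether the key can be packed as a little-endian unsigned 16-bit value."""
    try:
        struct.pack("<H", key)
        return True
    except struct.error:
        # struct refuses values outside 0..65535 for the "H" format.
        return False


if __name__ == "__main__":
    for key in (1024, 65535, 65536, 100000):
        print(key, bytes_needed(key), fits_in_uint16(key))
```

If the reader assumes every key is exactly 2 bytes wide, anything past key 65535 would be misread or dropped, which would match the symptom of data stopping around row 65536.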
That's an easy experiment to try. My example above used numbers rather than string keys. Thanks for the pointers
Yup, when I use strings in the example above, I get 2^10 strings dumped and then nothing. File creation actually works, so it's 'just' in reading. I will fix it, and if you don't mind testing the fix, that would be great.
Yes, I will test
@safinaskar this should be working in 3.10.1. Numbers breaks the row storage maps into chunks of 64k entries, which I was not supporting when reading.
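As a rough illustration of the chunking described above, the following Python sketch models a store split into chunks of 65536 entries. It is a simplified model under my own assumptions, not the actual Numbers file layout or the numbers-parser fix; `split_into_chunks`, `locate`, and `read_rows` are hypothetical names.

```python
from typing import List, Tuple

CHUNK_SIZE = 65536  # 64k entries per chunk, as described above


def split_into_chunks(entries: List[str], chunk_size: int = CHUNK_SIZE) -> List[List[str]]:
    """Split a flat list of entries into consecutive chunks of at most chunk_size."""
    return [entries[i:i + chunk_size] for i in range(0, len(entries), chunk_size)]


def locate(row: int, chunk_size: int = CHUNK_SIZE) -> Tuple[int, int]:
    """Map a global row index to (chunk index, offset inside that chunk)."""
    return divmod(row, chunk_size)


def read_rows(chunks: List[List[str]], all_chunks: bool = True) -> List[str]:
    """Read entries back; all_chunks=False mimics a reader that only sees the first chunk."""
    selected = chunks if all_chunks else chunks[:1]
    return [entry for chunk in selected for entry in chunk]


if __name__ == "__main__":
    chunks = split_into_chunks([f"row {i}" for i in range(100000)])
    print(len(read_rows(chunks, all_chunks=False)))  # reader that ignores later chunks
    print(len(read_rows(chunks)))                    # reader that walks every chunk
```

A reader that stops after the first chunk returns exactly 65536 entries, which is consistent with the output reported in this issue.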
Yes, now it works
numbers-parser seems not to work if an input .numbers document has more than 65536 rows.
Someone gave me a .numbers document. I have a PC with Linux installed, so I have no Numbers. I installed numbers-parser and converted the document to CSV using cat-numbers. But the resulting CSV document has 65536 normal rows, and after that I see ,,,,,,,,,,,,,,,, (or None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None in --formatting mode). I think the original document has additional data.
Unfortunately, I don't want to provide the original document via a public GitHub issue, because it contains confidential data.
Also, I'm not sure this is numbers-parser's problem; it is possible that they created a broken document in the first place.
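For anyone who wants to check their own output for the same symptom, here is a minimal Python sketch that finds the first CSV row consisting only of empty cells or the literal text None. It assumes the output of cat-numbers has been redirected to a file; the function name `first_blank_row` and the file name `out.csv` are placeholders of mine.

```python
import csv
from typing import Iterable, Optional


def first_blank_row(lines: Iterable[str]) -> Optional[int]:
    """Return the 0-based index of the first row whose cells are all empty or 'None'."""
    for index, row in enumerate(csv.reader(lines)):
        if all(cell in ("", "None") for cell in row):
            return index
    return None


# Example use (out.csv is a placeholder for the redirected cat-numbers output):
#
#     with open("out.csv", newline="") as handle:
#         print(first_blank_row(handle))
```

If the reported index is 65536 while the document is known to hold more rows, the file is likely hitting the same limit described in this issue.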