Closed dmosberger closed 4 years ago
Isn't this the same issue as #226 ? That was supposed to be fixed in tablib 1.1.0. I cannot access your xlsx sample file, please put it in a non-protected location (with non-sensitive data, of course).
The test.xslx is attached to my original post, so you shouldn't need a Google account to download the test case.
Oh right, I'll have a look soon.
That might be a Google Docs pecularity. If I save the same file with LibreOffice as .xlsx, I'm getting two values ((<ReadOnlyCell 'Sheet1'.A2>, <EmptyCell>)
) for the problematic row.
We may apply the same fix as for the csv
format to circumvent this use case.
Yes, I agree it's very likely something changed in Google Docs.
I tried commit 71ff737 and it fixes the test case, but I think there is still a problem if the first row has fewer columns than a subsequent row. See the attached test case.
python3 -c "import tablib; tablib.Databook().load(open('test2.xlsx', 'rb').read(), 'xlsx')"
still gives me an InvalidDimensions error.
Sure, however your latest example looks clearly wrong to me. tablib expects generally at least a full row of headers. So I don't think we can support all use cases.
Hmmh, I don't know. I quite often have a fixed format table and then use a cell to the right of a row to make random comments. For example:
Col A Col B
val1 val2
val3 val4 This_is_comment_about_row_3
It'd be sad if tablib can't load such a file.
@hugovk, @claudep could you add the solution to the import_set as well, you just added it to import_book.
tablib used to be able to read xlsx files exported from Google Docs just fine. Now it fails with InvalidDimensions if not all columns in the spreadsheet are populated with values. For example, download this spreadsheet as xlsx (also attached below) and then try to read it like this:
python3 -c "import tablib; tablib.Databook().load(open('test.xlsx', 'rb').read(), 'xlsx')"
and it'll result in an InvalidDimensions exception. The problem seems to be due to the fact that the first row has values in 2 columns, but the second row has an empty cell in the second column. I'm guessing something changed in the way Google exports the xlsx files? test.xlsx