jazzband / tablib

Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
https://tablib.readthedocs.io/
MIT License
4.59k stars 590 forks source link

Databook sheet title must be below 31 characters for XLSX #303

Open mpasternak opened 7 years ago

mpasternak commented 7 years ago

Hi,

for XLSX format, databook sheet title must be below 32 characters. If it is longer, you get a warning about a corrupted file when trying to open such file in Microsoft Excel (the original MS Excel from MS Office package).

Attached is out file and a minimal non-working code example.

import tablib

parent_data = tablib.Databook()

data = tablib.Dataset()
# collection of names
names = ['Kenneth Reitz', 'Bessie Monke']

for name in names:
    # split name appropriately
    fname, lname = name.split()

    # add names to Dataset
    data.append([fname, lname])

data.title = "1" * 32
parent_data.add_sheet(data)

x = open("test.xlsx", "wb")
x.write(parent_data.xlsx)
x.close()

This is the out file that I got on Python 3.6 and tablib 0.11.5.

test.xlsx

When I change the dataset title length to below 31 characters, everything goes back to normal.

hugovk commented 4 years ago

With Python 3.7.4, Tablib 0.14.0 and openpyxl 3.0.0:

$ python3 303.py
/usr/local/lib/python3.7/site-packages/openpyxl/workbook/child.py:99: UserWarning: Title is more than 31 characters. Some applications may not be able to read the file
  warnings.warn("Title is more than 31 characters. Some applications may not be able to read the file")

Opening the file:

image

Clicking Yes:

image

Clicking View, this file opens in a text editor: /private/var/folders/kt/j77sf4_n6fnbx6pg199rbx700000gn/T/com.microsoft.Excel/Repair Result to test0.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><logFileName>Repair Result to test0.xml</logFileName><summary>Errors were detected in file ’/tmp/tablib/test.xlsx’</summary><repairedRecords summary="Following is a list of repairs:"><repairedRecord>Repaired Records: Worksheet properties from /xl/workbook.xml part (Workbook)</repairedRecord></repairedRecords></recoveryLog>

I think all these warnings are probably enough and the user should avoid long titles.

claudep commented 4 years ago

We could truncate to 31 chars while exporting and output a RuntimeWarning.

hugovk commented 4 years ago

Yeah, we could.

I wonder, are there any applications that can read the file without any problem?

mpasternak commented 4 years ago

Just to let you know I'm still here after all these years. Nothing useful to add to current discussion though. I'd vote for warnings too.

tonyrein commented 7 months ago

On 2019-10-22 hugovk asked "I wonder, are there any applications that can read the file without any problem?" It's been a while, but if this is still relevant to you, you might try LibreOffice Calc. I'm using version 7.3.7.2 on Linux, and it opens such files with no problem. Then, when I try to open them with Excel later, Excel insists they're corrupted, but is able to open them after "repair."