jmcnamara / XlsxWriter

A Python module for creating Excel XLSX files.
https://xlsxwriter.readthedocs.io
BSD 2-Clause "Simplified" License
3.61k stars 629 forks source link

Help needed: Sensitivity labels sample files #1057

Closed jmcnamara closed 5 months ago

jmcnamara commented 5 months ago

I've had some requests to add support for Sensitivity Labels in the files produced by XlsxWriter.

0c7834dd-0ad0-4e24-a296-ca7de7e97b11

As far as I can see it should be possible with the current APIs and Custom Document Properties.

However, I need some sample files to debug/test it and I cannot generate them because Sensitivity Labels are a feature of Enterprise Excel 365. I am looking for someone to help by generating a few sample files. Ideally:

  1. A blank new Excel file.
  2. Same file saved with a non-encrypted sensitivity label like "Public".
  3. Same file save with a different non-encrypted label.

You could either attach them here or send them to jmcnamara@cpan.org

Thanks.

jmcnamara commented 5 months ago

Alternatively, could someone try the instructions below and let me know how you get on.

jmcnamara commented 5 months ago

Sensitivity Labels are a property that can be added to an Office 365 document to indicate that it is compliant with a companies information protection policies. Sensitivity Labels have designations like "Confidential", "Internal use only", or "Public" depending on the policies implemented by the company. They are generally only enabled for enterprise versions of Office.

See the following Microsoft documentation on how to Apply sensitivity labels to your files and email.

Sensitivity Labels are generally stored as custom document properties so they can be enabled using the XlsxWriter set_custom_property()Workbook property. However, since the metadata differs from company to company you will need to extract some of the requried metadata from sample files.

The first step is to create a new file in Excel and set a non-encrypted sensitivity label. Then unzip the file by changing the extension from .xlsx to .zip or by using a command line utility like this:

$ unzip myfile.xlsx -d myfile
Archive:  myfile.xlsx
  inflating: myfile/[Content_Types].xml
  inflating: myfile/docProps/app.xml
  inflating: myfile/docProps/custom.xml
  inflating: myfile/docProps/core.xml
  inflating: myfile/_rels/.rels
  inflating: myfile/xl/workbook.xml
  inflating: myfile/xl/worksheets/sheet1.xml
  inflating: myfile/xl/styles.xml
  inflating: myfile/xl/theme/theme1.xml
  inflating: myfile/xl/_rels/workbook.xml.rels

Then examine the docProps/custom.xml file from the unzipped xlsx file. The file doesn't contain newlines so it is best to view it in an editor that can handle XML or use a commandline utility like libxml’s xmllint to format the XML for clarity:

$ xmllint --format myfile/docProps/custom.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties
    xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
    xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="2"
            name="MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Enabled">
    <vt:lpwstr>true</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="3"
            name="MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_SetDate">
    <vt:lpwstr>2024-01-01T12:00:00Z</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="4"
            name="MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Method">
    <vt:lpwstr>Privileged</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="5"
            name="MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_Name">
    <vt:lpwstr>Confidential</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="6"
            name="MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_SiteId">
    <vt:lpwstr>cb46c030-1825-4e81-a295-151c039dbf02</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="7"
            name="MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_ActionId">
    <vt:lpwstr>88124cf5-1340-457d-90e1-0000a9427c99</vt:lpwstr>
  </property>
  <property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}"
            pid="8"
            name="MSIP_Label_2096f6a2-d2f7-48be-b329-b73aaa526e5d_ContentBits">
    <vt:lpwstr>2</vt:lpwstr>
  </property>
</Properties>
/tmp$

The MSIP (Microsoft Information Protection) labels in the name attributes contain a GUID that is unique to each company. The "SiteId" field will also be unique to your company/location. The meaning of each of these fields is explained in the the following Microsoft document on Microsoft Information Protection SDK - Metadata.

Once you have identified the necessary metadata you can add it to a new document as shown below.

import xlsxwriter

workbook = xlsxwriter.Workbook("sensitivity_label.xlsx")
worksheet = workbook.add_worksheet()

# Metadata extracted from a company specific file.
company_guid = "2096f6a2-d2f7-48be-b329-b73aaa526e5d"
site_id = "cb46c030-1825-4e81-a295-151c039dbf02"
action_id = "88124cf5-1340-457d-90e1-0000a9427c99"

# Add the document properties. Note that these should all be in text format.
workbook.set_custom_property(f"MSIP_Label_{company_guid}_Enabled", "true", "text")
workbook.set_custom_property(f"MSIP_Label_{company_guid}_SetDate", "2024-01-01T12:00:00Z", "text")
workbook.set_custom_property(f"MSIP_Label_{company_guid}_Method", "Privileged", "text")
workbook.set_custom_property(f"MSIP_Label_{company_guid}_Name", "Confidential", "text")
workbook.set_custom_property(f"MSIP_Label_{company_guid}_SiteId", site_id, "text")
workbook.set_custom_property(f"MSIP_Label_{company_guid}_ActionId", action_id, "text")
workbook.set_custom_property(f"MSIP_Label_{company_guid}_ContentBits", "2", "text")

workbook.close()

Note, some sensitivity labels require that the document is encrypted. In order to extract the required metadata you will need to unencrypt the file which may remove the sensitivity label. In that case you may need to use a third party tool such as msoffice-crypt.

jmcnamara commented 5 months ago

I have added an example and explanation to the docs: https://xlsxwriter.readthedocs.io/example_sensitivity_label.html

If anyone has any issues please let me know.