thombashi / pytablewriter

pytablewriter is a Python library to write a table in various formats: AsciiDoc / CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
https://pytablewriter.rtfd.io/
MIT License
610 stars 43 forks source link

xslx formula injection #20

Closed randomstuff closed 4 years ago

randomstuff commented 4 years ago

There is a formula injection bug in pytablewriter:

from pytablewriter import ExcelXlsxTableWriter, JsonTableWriter, HtmlTableWriter, ExcelXlsTableWriter

VALUES = [
    # These ones are OK:
    (HtmlTableWriter, "html"),
    (JsonTableWriter, "json"),
    (ExcelXlsTableWriter, "xls"),
    # Injection on this class:
    (ExcelXlsxTableWriter, "xlsx"),
]

for (Writer, extension) in VALUES:
    writer = Writer()
    writer.table_name = "test"
    writer.headers = ["x"]
    writer.value_matrix = [["=cmd|' /C notepad'!'A1'"]]
    writer.dump("test." + extension)

Note how the content of test.xslx is different of the content of the other generated files (including test.xsl): the input is expanded into a formula instead of being treated a plain text.

This is a security issue: it can be abused to trigger shell command injection or data exfiltration from Excel/LibreOffice. See [1,2,3] for examples of data exfiltration and shell command execution. The examples are in the context of CSV injection but the same payloads can be used here.

[1] https://www.notsosecure.com/data-exfiltration-formula-injection/ [2] http://georgemauer.net/2017/10/07/csv-injection.html [3] https://youtu.be/C1o5uVOaufU?t=364

thombashi commented 4 years ago

Thank you for your report.

I'm not sure whether this is a bug of pytablewriter. I have an impression that this is a vulnerability of Excel/LibreOffice, looks like some measures are taken by these applications:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3524 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-7262

However, sanitizing a formula value feature may nice to have. I will consider the feature for the future release.

thombashi commented 4 years ago

I had added an interface to escape formula injection at pytablewriter 0.47.0

randomstuff commented 4 years ago

Sorry for the late response!

When using the new version with escape enabled, a leading "'" is added to string beginning with a "=" (roughly). The output data is not equals to the input data in this case:

excel-escape

I was thinking more of a solution where an actual data cell (instead of a formula cell) in emitted in the .xslx file. It's actually possible to emit data cells like this with xlsxwriter like this:

worksheet = self.workbook.add_worksheet(worksheet_name)
worksheet..strings_to_formulas = False

This emits actual text content for the cell:

<f>=cmd|' /C notepad'!'A1'</f>

instead of emitting formula content:

<t>=cmd|' /C notepad'!'A1'</t>

Updated test:

from pytablewriter import ExcelXlsxTableWriter, JsonTableWriter, HtmlTableWriter, ExcelXlsTableWriter

VALUES = [
    # These ones are OK:
    (HtmlTableWriter, "html"),
    (JsonTableWriter, "json"),
    (ExcelXlsTableWriter, "xls"),
    # Injection on this class:
    (ExcelXlsxTableWriter, "xlsx"),
]

for (Writer, extension) in VALUES:
    writer = Writer()
    writer.escape_formula_injection = True
    writer.table_name = "test"
    writer.headers = ["x"]
    writer.value_matrix = [["=cmd|' /C notepad'!'A1'"]]
    writer.dump("test." + extension)
thombashi commented 4 years ago

Thank you for your response.

strings_to_formulas of xlsxwriter feature looks great. Using the feature for .xlsx files would be good. However, I thought that escaping still required for .csv and .xls files. What do you think about it?

randomstuff commented 4 years ago

For CSV files, it's not clear to me what should be done about it. In this case, it's really more a problem of a consumer that a problem of a producer in my opinion. There is no notion of formula in CSV. All we can so is to use hackish workarounds which could break some input data. I'd think that using .xslx instead of CSV as output might be a way to avoid the problem of injection when targeting spreadsheets programs which might be vulnerable to this sort of things.

For .xls files, I don't know if it is possible to do something similar to what I'm proposing.