Closed randomstuff closed 4 years ago
Thank you for your report.
I'm not sure whether this is a bug of pytablewriter
.
I have an impression that this is a vulnerability of Excel/LibreOffice, looks like some measures are taken by these applications:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-3524 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-7262
However, sanitizing a formula value feature may nice to have. I will consider the feature for the future release.
I had added an interface to escape formula injection at pytablewriter 0.47.0
Sorry for the late response!
When using the new version with escape enabled, a leading "'" is added to string beginning with a "=" (roughly). The output data is not equals to the input data in this case:
I was thinking more of a solution where an actual data cell (instead of a formula cell) in emitted in the .xslx file. It's actually possible to emit data cells like this with xlsxwriter like this:
worksheet = self.workbook.add_worksheet(worksheet_name)
worksheet..strings_to_formulas = False
This emits actual text content for the cell:
<f>=cmd|' /C notepad'!'A1'</f>
instead of emitting formula content:
<t>=cmd|' /C notepad'!'A1'</t>
Updated test:
from pytablewriter import ExcelXlsxTableWriter, JsonTableWriter, HtmlTableWriter, ExcelXlsTableWriter
VALUES = [
# These ones are OK:
(HtmlTableWriter, "html"),
(JsonTableWriter, "json"),
(ExcelXlsTableWriter, "xls"),
# Injection on this class:
(ExcelXlsxTableWriter, "xlsx"),
]
for (Writer, extension) in VALUES:
writer = Writer()
writer.escape_formula_injection = True
writer.table_name = "test"
writer.headers = ["x"]
writer.value_matrix = [["=cmd|' /C notepad'!'A1'"]]
writer.dump("test." + extension)
Thank you for your response.
strings_to_formulas
of xlsxwriter
feature looks great.
Using the feature for .xlsx
files would be good.
However, I thought that escaping still required for .csv
and .xls
files.
What do you think about it?
For CSV files, it's not clear to me what should be done about it. In this case, it's really more a problem of a consumer that a problem of a producer in my opinion. There is no notion of formula in CSV. All we can so is to use hackish workarounds which could break some input data. I'd think that using .xslx instead of CSV as output might be a way to avoid the problem of injection when targeting spreadsheets programs which might be vulnerable to this sort of things.
For .xls files, I don't know if it is possible to do something similar to what I'm proposing.
There is a formula injection bug in pytablewriter:
Note how the content of
test.xslx
is different of the content of the other generated files (includingtest.xsl
): the input is expanded into a formula instead of being treated a plain text.This is a security issue: it can be abused to trigger shell command injection or data exfiltration from Excel/LibreOffice. See [1,2,3] for examples of data exfiltration and shell command execution. The examples are in the context of CSV injection but the same payloads can be used here.
[1] https://www.notsosecure.com/data-exfiltration-formula-injection/ [2] http://georgemauer.net/2017/10/07/csv-injection.html [3] https://youtu.be/C1o5uVOaufU?t=364