jazzband / tablib

Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
https://tablib.readthedocs.io/
MIT License
4.6k stars 592 forks source link

Replace MarkupPy by ElementTree for html conversion #554

Closed claudep closed 1 year ago

codecov[bot] commented 1 year ago

Codecov Report

Merging #554 (ee7a88d) into master (d48407c) will decrease coverage by 0.06%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #554      +/-   ##
==========================================
- Coverage   91.42%   91.37%   -0.06%     
==========================================
  Files          28       28              
  Lines        2730     2713      -17     
==========================================
- Hits         2496     2479      -17     
  Misses        234      234              
Impacted Files Coverage Δ
src/tablib/formats/__init__.py 95.45% <100.00%> (-0.06%) :arrow_down:
src/tablib/formats/_html.py 100.00% <100.00%> (ø)
tests/test_tablib.py 98.72% <100.00%> (-0.03%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

mgorny commented 1 year ago

Thanks. That's definitely better idea than mine (of skipping HTML tests). I also like that you're using DOM to create elements, that's probably the most efficient way of doing it.

hugovk commented 1 year ago

Thanks!

claudep commented 1 year ago

Thanks for the merge.

While I was on this format, I explored a bit about adding html import support, also without external dependency: #555

matthewhegarty commented 1 year ago

I noticed that this change breaks the CI build in django-import-export. This is for cases when we are escaping harmful characters such as script tags.

If there is already an escaped character in content, then the use of ElementTree will double escape, which will not format correctly. This test will reproduce the issue:

    def test_html_export_with_special_chars(self):
        self.founders = tablib.Dataset(headers=self.headers, title='Founders')
        self.founders.append(('J &amp; J', 'A', 90))
        self.assertIn("J &amp; J", self.founders.html)

This test fails because the '&' string is exported as '&amp;' So if someone has 'J & J' in their content (rendered with an escape), they will now get 'J &amp; J'.

claudep commented 1 year ago

@matthewhegarty Please create a new issue for this problem where we can discuss that double-escaping issue.