pbs / pycaption

Python module to read/write popular video caption formats
Apache License 2.0
256 stars 136 forks source link

Tests fail using BeautifulSoup4 v4.12.2 #304

Closed Vekhir closed 1 year ago

Vekhir commented 1 year ago

Hi, \ I recently upgraded bs4 from 4.11.2 to 4.12.2 and now pycaption fails during testing. \ I'm not exactly sure where the issue is, so I'm starting here and maybe you can reproduce it or give some insight into the kind of error.

Can you verify the issue?

Additional information

Going through the bs4 changelog, I didn't notice any change related to this issue. \ I have also rebuilt pycaption with bs4 4.11.2 and the tests ran through just fine. My distro doesn't package 4.12.0 or 4.12.1, but if necessary, I might do some bisecting.

Edit: The issue is changed behaviour in BeautifulSoup 4.12.1 with regards to Tags.prettify, which adds newlines to some outputs. The tests don't account for these newlines, which is why they fail. The most straight-forward solution is to adapt the tests and bump the requirements (See also #305). More details below.

System information

OS: Arch Linux Kernel: 6.3.2-arch1-1 Python: 3.11.3-1 Pycaption: 2.1.0/2.1.1 (AUR package / manually updated to 2.1.1) BeautifulSoup4: 4.11.2-3/4.12.2-1 (Official package)

Appendix 1 (full error log)

==> Starting check()...
============================= test session starts ==============================
platform linux -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0
rootdir: /build/python-pycaption/src/pycaption-2.1.0
plugins: lazy-fixture-0.6.3
collected 256 items

tests/test_base.py .......                                               [  2%]
tests/test_dfxp.py .......................................               [ 17%]
tests/test_dfxp_conversion.py F

=================================== FAILURES ===================================
________________________ TestDFXPtoDFXP.test_conversion ________________________

self = <tests.test_dfxp_conversion.TestDFXPtoDFXP object at 0x7fbfff47abd0>
sample_dfxp_output = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3....00:00:36.200" region="bottom" style="default">\n    &lt;LAUGHING &amp; WHOOPS!&gt;\n   </p>\n  </div>\n </body>\n</tt>'
sample_dfxp = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml"\n    xmlns:tts="http://ww...00:00:36.200" region="bottom" style="default">\n    &lt;LAUGHING &amp; WHOOPS!&gt;\n   </p>\n  </div>\n </body>\n</tt>'

    def test_conversion(self, sample_dfxp_output, sample_dfxp):
        caption_set = DFXPReader().read(sample_dfxp)
        results = DFXPWriter().write(caption_set)

        assert isinstance(results, str)
>       self.assert_dfxp_equals(sample_dfxp_output, results)

tests/test_dfxp_conversion.py:25: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tests.test_dfxp_conversion.TestDFXPtoDFXP object at 0x7fbfff47abd0>
first = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3....00:00:36.200" region="bottom" style="default">\n    &lt;LAUGHING &amp; WHOOPS!&gt;\n   </p>\n  </div>\n </body>\n</tt>'
second = '<?xml version="1.0" encoding="utf-8"?>\n<tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3....:00:36.200" region="bottom" style="default">\n    &lt;LAUGHING &amp; WHOOPS!&gt;\n   </p>\n  </div>\n </body>\n</tt>\n'
ignore_styling = False, ignore_spans = False

    def assert_dfxp_equals(self, first, second,
                           ignore_styling=False,
                           ignore_spans=False):
        first_soup = BeautifulSoup(first, 'lxml')
        second_soup = BeautifulSoup(second, 'lxml')

        if ignore_styling:
            self._remove_styling(first_soup)
            self._remove_styling(second_soup)

        if ignore_spans:
            self._remove_spans(first_soup)
            self._remove_spans(second_soup)

        self._trim_text(first_soup)
        self._trim_text(second_soup)

>       assert first_soup == second_soup
E       AssertionError

tests/mixins.py:140: AssertionError
=========================== short test summary info ============================
FAILED tests/test_dfxp_conversion.py::TestDFXPtoDFXP::test_conversion - Asser...
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
================== 1 failed, 46 passed, 18 warnings in 0.86s ===================
==> ERROR: A failure occurred in check().
Vekhir commented 1 year ago

So apparently, the issue is that starting with BeautifulSoup 4.12.1, Tag.prettify adds some newlines at the end. From the changelog:

* Tag.prettify() will now consistently end prettified markup with
  a newline.

prettify is used in the DFXPWriter.write method, so the output changes, and the tests make a strict comparison where the added newline leads to failure.

Either the tests somehow ignore the newline/check both cases, or the test fixture adds the newline and the requirements go up to 4.12.1.\ A PR for the latter case may follow shortly.

Vekhir commented 1 year ago

The relevant PR is https://github.com/pbs/pycaption/pull/305