py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/
Other
8.12k stars 1.39k forks source link

MAINT: Amend byte cache #2730

Closed j-t-1 closed 2 months ago

j-t-1 commented 3 months ago

Cache string to byte conversions when the string length is greater than one.

2726

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.14%. Comparing base (c96d01c) to head (e3136a6).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #2730 +/- ## ======================================= Coverage 95.14% 95.14% ======================================= Files 51 51 Lines 8547 8547 Branches 1703 1703 ======================================= Hits 8132 8132 Misses 261 261 Partials 154 154 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

stefan6419846 commented 3 months ago

I am not sure if this really makes sense or what the original intent of the old implementation has been. Until this is more clear (see my comment in #2726), I recommend against merging this.

j-t-1 commented 2 months ago
@pytest.mark.parametrize(
    ("input_str", "expected"),
    [
        ("foo", b"foo"),
        ("πŸ˜€", "πŸ˜€".encode()),
        ("‰", "‰".encode()),
        ("β–·", "β–·".encode()),
        ("δΈ–", "δΈ–".encode()),
        # A multi-character string example with non-latin-1 characters:
        ("πŸ˜€πŸ˜ƒ", "πŸ˜€πŸ˜ƒ".encode()),
    ],
)
def test_b(input_str: str, expected: str):
    assert pypdf._utils.b_(input_str) == expected

Should this be:

def test_b(input_str: str, expected: bytes):
    assert pypdf._utils.b_(input_str) == expected
stefan6419846 commented 2 months ago

Isn't your new code the same? If you are referring to the decorator - no, this cannot be removed. This defines the parameters/sub-tests to use for the test.

j-t-1 commented 2 months ago

My comment could have been better. The change is to the type of the second argument from str to bytes (keeping the decorator).

stefan6419846 commented 2 months ago

Yes, the expected parameter should be bytes, as we are encoding the strings here.

j-t-1 commented 2 months ago

The caching of the existing function is maybe deliberate.