Truncating long expectation filenames

xymaxim commented 9 months ago

As stated in #4, the callspec IDs can be long enough to exceed the PC_NAME_MAX value (and cause an exception).

Before creating a pull request, I would like to discuss some details.

Let's say we have a test function with two parametrized arguments and the corresponding node names:

test_filename_truncation[very-very-long-short]
test_filename_truncation[short-shirt]
test_filename_truncation[very-very-long-shirt]

1. Which parts should be truncated?

To have identical filenames with and without using the --pm-use-system-prefix option on different platforms, the filename truncation should be limited to a node name. To give an idea, here is an example of shortening the node names to 37 characters:

test_filename_truncation[very-very-lo-System.out
test_filename_truncation[short-shirt]-System.out

2. Setting the maximum length

While we can set a default value to, say, 100 characters, it would be useful to make this value customizable via the INI file by adding the corresponding field, e.g. pm-filename-max-node-length.

3. Resolving duplicates

After truncation, non-unique filenames may appear. I see two options here.

(a) Use an incremental numbering.

By adding a number just after a callspec ID:

test_filename_truncation[very-very-lo0.out
test_filename_truncation[short-shirt].out
test_filename_truncation[very-very-lo1.out

As for pytest, a similar approach is used to resolve non-unique test node names:
test_filename_truncation[very-very-long-short0]
...
test_filename_truncation[very-very-long-short1]

By adding a number suffix to the end of filename:

test_filename_truncation[very-very-lo.out.0
test_filename_truncation[short-shirt].out
test_filename_truncation[very-very-lo.out.1

(b) Use indices of the parametrized arguments:

This will allow identifying used argument values more clearly:

test_filename_truncation[very-very-lo00.out
test_filename_truncation[short-shirt].out
test_filename_truncation[very-very-lo22.out

For stacked parametrize decorators, the values will not only be mirrored, e.g. '10', '01', ...

zaufi commented 9 months ago

As for me, losing the uniqueness of filenames is a big pain in the ass. First of all, cuz an end user has to understand the implementation details of pytest-matcher to understand the rules of shortening those filenames. The very restrictive side effect of this is that pytest-matcher has to pay extra attention to the backward compatibility if we want to change 'em somehow. Secondly, it means pytest-matcher has to have very clean docs about everything %)

So, personally, I'd prefer to offload this pain to the user's ass ;) If he so dares to have such a corner test cases %)

There are few random thughts came to my mind...

Using a hash of a longest part of a filename (callspec-id). I.e., having some option (e.g., smth like --pm-parameterized-test-use-hash) expected_out instance would use one of the hash algorithms to produce <test-function-name>[hash(callspec-id)]. Possibly it can be controlled per test by setting some property on the fixture. E.g., smth like this:

@pytest.mark.parametrize(...)
def test_parametrized(..., capfd, expected_out):
    expected_out.use_hash_to_match_callspec = True
    ...

Still possible that the craziest users may have other parts used to make a full path also very-very long (e.g., a class name or a method/function name)...

Give a user total control over path/filename of expectation files. E.g., add an option to the INI file (e.g., pm-expectation-filename-template) that will be processed by Jinja2:

[tool.pytest.ini_options]
pm-expectation-filename-template = "{{base_dir}}/{{class_name|replace('_tester', '')}}/{{function_name|replace('_test', '')}}[{{callspec_id|md5}}]{{system_sfx}}.pattern"

And the terminal state of this idea is to give him the ability to specify templates per test case...

xymaxim commented 9 months ago

Yeah, it could be a total mess. This can be a bit frustrating, given that it’s not the plugin’s key feature.

Using a hash of a longest part of a filename

Thought about using hashing at first, but in a slight different context. What if someone just swapped parametrized argvalues:[(a, b), (c, d)] -> [(c, d), (a, b)], where a and c are of large length and will be truncated? The incremental numbering and indices can’t handle this case... OK, I’ll stop here. The user should also be in charge.

Let me summarize currently available workarounds for handling very long argvalues:

Use the ids argument of the parametrize decorator to set custom values for a given test function
Define the pytest_make_parametrize_id hook in conftest.py to control the IDs globally

zaufi commented 9 months ago

Oh, nice! I didn't know that @parametrize can accept ids! As for me, pytest-matcher shouldn't do anything then. A user has full control over IDs, hence the expectation filenames.

zaufi commented 7 months ago

@xymaxim

Check this out! :-)

xymaxim commented 7 months ago

@xymaxim

Check this out! :-)

Wow, great! Now is the time. So many new useful options.

zaufi / pytest-matcher

Truncating long expectation filenames #12