RFE: `tmt test id` option to check for duplicate IDs

engelmi commented 2 months ago

The tmt CLI can be used to generate UUIDs for tests via tmt test id. Unfortunately, it does not check for duplicates. This can happen, for example, when tests are being copied and the ID is not adjusted - which can happen quite easily.

Therefore, I'd like to propose to add an additional check for duplication to the tmt test id command. Possible options I can think of are:

checking implicitly, generating new UUIDs for identified duplicates
provide another option to check and overwrite identified duplicates, e.g. --overwrite-duplicates
provide another option just to check for duplicates, e.g. --check-duplicates

For option 1. and 2. the --dry option would only print out the identified tests with duplicates and not change the file. But maybe there are other/better option? If this RFE is accepted and refined, I can also start working on it and provide a PR for it.

On a side note, it seems that tmt test id is missing in the documentation: https://tmt.readthedocs.io/en/latest/stories/cli.html#stories-cli-test

happz commented 2 months ago

Sounds also like a good idea for yet another tmt * lint check, one that looks not just at a single object at a time, but also keeps some notion of the global state, i.e. the existing IDs that were previously detected in the set of objects already examined.

engelmi commented 2 months ago

I agree, I think it would fit good as another lint check. And yes, if I understand it correctly (but only had a brief look at it), I'd need to introduce state into the tmt test id command (keeping track of the IDs). Is this acceptable or should this just be a lint check? It would be convenient to auto-fix duplicate IDs, though. Is there a tmt * lint --auto-fix or something similar? @happz

happz commented 2 months ago

I agree, I think it would fit good as another lint check. And yes, if I understand it correctly (but only had a brief look at it), I'd need to introduce state into the tmt test id command (keeping track of the IDs). Is this acceptable or should this just be a lint check? It would be convenient to auto-fix duplicate IDs, though. Is there a tmt * lint --auto-fix or something similar? @happz

To me, this sounds more like an opportunity for tmt lint to be more useful. As you wrote at the beginning, this can happen when tests are copied and IDs are not modified - this is beyond what tmt test id can control because it has not been called to manage said IDs. I'm afraid we cannot prevent users from doing this kind of stuff, and tmt test id itself uses some magic to create unique IDs whenever it gets called, so the best option I see is to at least let the user know via linter there's a problem they brought on their heads :) Therefore I see no need for tmt test id to be concerned with anything but the given test.

It would be convenient to auto-fix duplicate IDs, though. Is there a tmt * lint --auto-fix or something similar? @happz

There is a --fix option, e.g. https://github.com/teemtee/tmt/blob/main/tmt/base.py#L1481 uses it, so we can assign a new ID to test if asked to do so. But while technically possible, I'm not sure if tmt should do it: let's say there are two tests with the same ID, which one of them is wrong? tmt creates IDs with the help of the uuid4() function, there is nothing related to the test reflected in the ID, so which one of those two tests should we fix? :)

engelmi commented 2 months ago

There is a --fix option, e.g. https://github.com/teemtee/tmt/blob/main/tmt/base.py#L1481 uses it, so we can assign a new ID to test if asked to do so. But while technically possible, I'm not sure if tmt should do it: let's say there are two tests with the same ID, which one of them is wrong? tmt creates IDs with the help of the uuid4() function, there is nothing related to the test reflected in the ID, so which one of those two tests should we fix? :)

Initially, I assumed that it doesn't matter which test gets the new UUID assigned. Thinking about it, however, this is not true.

So the best solution is to extend tmt lint and list the tests containing any duplicate UUIDs found. Thanks for your help! @happz I'd be up for implementing this issue :) Could you assign it to me?

happz commented 2 months ago

We don't have a linter level that would focus on the repository, to consider objects as they are in it. We have tmt test lint which basically means "all tests", but can be executed for a subset too, tmt test lint /tests/foo/. So you will need some pre-lint step in code where a set of all IDs would be collected, then the check would be able to check this container and find out whether any of its IDs have more than one owner listed.

Or you can build the storage as you go, one by one, but then it would report no conflict if called for just one of the tests from that conflicting pair. It might be acceptable, I for one would not mind, because, eventually, I run tmt lint which sees everything in pre-commit, so I would learn about conflicts before committing.

teemtee / tmt

RFE: `tmt test id` option to check for duplicate IDs #2939