teemtee / tmt

Test Management Tool
MIT License
81 stars 122 forks source link

Check for stale links #1740

Open raballew opened 1 year ago

raballew commented 1 year ago

Looking at the tmt spec, I can see that linking of metadata is a core concept . It is also possible to link to external URIs such as tickets in private issue management systems, files on a storage backend or issues on GitHub. These URIs can become stale over time and it seems logical to me, that there should be a way to check for this kind of dead links. An approach could be to extend the tmt test lint, tmt plan lint and tmt story lint command to actually check the targets as well. Perhaps it makes sense to do the check even before committing changes using a pre-commit hook which could take care of that.

It is worth keeping in mind, that some of the URIs can be hidden behind some kind authentication mechanisms, hence this should be an optional feature.

idorax commented 1 year ago

Currently tmt {test|plan|story} lint doesn't check the value of a key. @psss, if we fix this issue, we should valid the value of link or keys supported by link if it starts at http or https, right?

psss commented 1 year ago

Yes, we could check all links which start with http or https and possibly also check url key if fmf identifier is used in the link. I agree that this should be an optional feature because some links can point to internal resources which might not be always available.

thrix commented 1 year ago

pre-commit hook definitely sounds to me like a neat idea!

We should create one, so users can easily sanity check their repos with tmt metadata.

idorax commented 1 year ago

Currently Core.lint_keys() just checks the keys, it doesn't check the value of link.

If we'd like to check the key link, we can get its value first, e.g.

(Pdb) b tmt/base.py:691
Breakpoint 1 at /home/huanli/.virtualenvs/tmt/lib/python3.10/site-packages/tmt/base.py:691
(Pdb) c
/plans/foo
> /home/huanli/.virtualenvs/tmt/lib/python3.10/site-packages/tmt/base.py(691)lint_keys()
-> known_keys = additional_keys + self._keys()
(Pdb) l
686     
687             return data
688     
689         def lint_keys(self, additional_keys: List[str]) -> List[str]:
690             """ Return list of invalid keys used, empty when all good """
691 B->         known_keys = additional_keys + self._keys()
692             return [key for key in self.node.get().keys() if key not in known_keys]
693     
694         def _lint_summary(self) -> bool:
695             """ Lint summary attribute """
696             # Summary is advised with a resonable length
(Pdb) pp self.node.get().keys()
dict_keys(['discover', 'execute', 'link', 'summary'])
(Pdb) pp self.node.get('link')
'https://github.com/teemtee/tmt/issues/461'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hi @raballew, what do you want to check the value (e.g. https://github.com/teemtee/tmt/issues/461) exactly? a) Try to access the URL? b) Validate the URL like How to check whether a string is a valid HTTP URL?

raballew commented 1 year ago

@idorax I have not thought about b) but this might be actually the first step before addressing a). Keep in mind that links are not limited to URLs but more generically to URIs. So a link could reference a file on S3 or an FTP server. My initial thought was to actually trying to access the URI to verify that the content did not become stale.

idorax commented 1 year ago

@raballew, would you please provide some URL examples including both good URLs and stale URLs? Then I'll have a try to access the URLs via code, thanks!

raballew commented 1 year ago

@idorax I am not sure what you mean by this. A "bad" URL is anything that is reachable via HTTP/HTTPS and does not return an status code (4xx, 5xx) whereas a "good" URL return successful status codes (2xx).