auto-append bug references from commit message bodies into generated changes

Background

There is a requirement in openSUSE packaging standards that entries in package .changes files which relate to bugs / features in external bug / feature tracking systems should be referenced using a specific format (e.g. bsc#123456):

https://en.opensuse.org/openSUSE:Creating_a_changes_file_(RPM)#Bug_fix.2C_feature_implementation

Problem description

Connection between auto-generated `.changes` and commit message headers

Currently tar_scm auto-generates entries from the first line of each commit message. This means that changes auto-generated by tar_scm only comply with openSUSE packaging standards if the first line of a commit message relating to a bug or feature refers to that bug or feature in the same compliant format described in the above link.

Limit on recommended length of commit message header

The problem with this connection is that it forces the first line of commit messages to be 13 or so bytes longer than it would otherwise need to be e.g.

fix race condition in component foo when bar happens

is required to become

fix race condition in component foo when bar happens (bsc#123456)

and this is a problem because it is widely agreed that the first line of commit messages should be kept short, e.g. below 50 characters.

Tracking semantic relationships between changes and bugs/features

A further problem with trying to squash bug / feature references into the first line of a commit is that it is lossy. For example, the mere mention of a particular bug does not tell the reader whether the change completely fixes that bug, partially fixes it, or does not fix it at all but is somehow related to it.

Proposed solution 1

There is already an established best practice for avoiding this lossiness in commit messages, by including one reference per line in the body of the commit message, e.g.

Closes-Bug: #1234567 -- use 'Closes-Bug' if the commit is intended to fully fix and close the bug being referenced.
Partial-Bug: #1234567 -- use 'Partial-Bug' if the commit is only a partial fix and more work is needed.
Related-Bug: #1234567 -- use 'Related-Bug' if the commit is merely related to the referenced bug.

These recommended formats were taken from the OpenStack project which only uses a single bug / feature tracker (https://launchpad.net/), but obviously OBS packages need to reference multiple trackers, so the #1234567 reference could be replaced by a full URL, e.g.

Closes-Bug: https://bugzilla.suse.com/show_bug.cgi?id=1234567

This has the added advantage of making it easy to navigate directly from a commit message to the referenced bug / feature without having to first expand the shortened form.

Given that the 50 character best practice limit applies to commit messages but not to rpm .changes entries, a reasonable solution seems to be:

Ensure that commit messages include these whole-line references to full external URLs in their body (Closes-Bug: https://bugzilla.suse.com/show_bug.cgi?id=1234567), rather than the shortened form in the first line of the commit message (blah blah (bsc#1234567)).
Make tar_scm recognise reference lines in git commit messages utilising the above format, and when auto-generating .changes entries, automatically convert them to the shortened form which complies with the current .changes packaging policy.
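The conversion step could be sketched roughly as follows. This is only an illustration, not tar_scm's actual code: the names `TRACKERS`, `short_ref` and `changes_entry` are hypothetical, only the bugzilla.suse.com URL format shown above is handled, and joining several references with a comma is an assumption about how multiple bugs would be listed.

```python
import re

# Hypothetical mapping from tracker URL prefix to the short prefix used in
# .changes files. Only the bsc entry comes from this proposal.
TRACKERS = {
    "https://bugzilla.suse.com/show_bug.cgi?id=": "bsc#",
}

# Matches whole-line references such as "Closes-Bug: <url>".
REF_LINE = re.compile(r"^(Closes|Partial|Related)-Bug:\s*(\S+)$")


def short_ref(url):
    """Return the short form for a known tracker URL, or None."""
    for prefix, short in TRACKERS.items():
        number = url[len(prefix):]
        if url.startswith(prefix) and re.fullmatch(r"[0-9]+", number):
            return short + number
    return None


def changes_entry(commit_message):
    """Build a .changes entry from the first line plus any body references."""
    lines = commit_message.splitlines()
    if not lines:
        return ""
    refs = []
    for line in lines[1:]:
        match = REF_LINE.match(line.strip())
        if match:
            ref = short_ref(match.group(2))
            if ref and ref not in refs:
                refs.append(ref)
    suffix = f" ({', '.join(refs)})" if refs else ""
    return "- " + lines[0].strip() + suffix


message = (
    "fix race condition in component foo when bar happens\n"
    "\n"
    "Closes-Bug: https://bugzilla.suse.com/show_bug.cgi?id=123456\n"
)
print(changes_entry(message))
```

With this approach the first line of the commit message stays short, and the generated entry still carries the compliant short reference. References to unknown trackers are silently skipped here; a real implementation would probably want to warn about them instead.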

Proposed solution 2

An alternative solution to the above approach is simply to change the current OBS packaging policy so that it allows the long form in .changes entries as an alternative to the shortened form. So rather than a .changes entry being required to be formatted similar to:

- fix bug in component foo (bsc#1234567)

the following form or similar would also be considered acceptable, and perhaps even recommended over the traditional short form:

- fix bug in component foo
  (Closes-Bug: https://bugzilla.suse.com/show_bug.cgi?id=1234567)

Notice that I wrote "also" rather than "instead". It would be necessary for the policy to continue to accept the old form, since

we don't want to make all existing .changes files non-compliant in one fell swoop
old habits die hard
some tooling may depend on the old, shortened form

Of course the downside of this would be a long period during which we would have two commonly used formats, and not many people enjoy increased inconsistency ...

Further questions

We already know that some people heavily rely on .changes files having correct bug references. But are there also any use cases in which it would be useful for these references to be machine-readable?

History

This issue originated from an internal discussion within SUSE's Cloud development team. SUSE employees can view it here; apologies to anyone else reading this for the lack of access.

The problem with using a full URL to the bug is that these links can, and often do, change. For example, when a new version of bugzilla is released and changes the URL format from https://bugzilla.suse.com/show_bug.cgi?id=1234567 (which is a perfect example of how URLs should not be formatted; see https://www.w3.org/Provider/Style/URI) to something like https://bugzilla.suse.com/1234567, every link in every repo is suddenly broken. The same happens when the bugzilla instance is migrated to a new bug-tracker-as-a-service and the host changes from bugzilla.suse.com to suse.bugsRus.com. It is better to just use the short bug number.

If there is some ambiguity about which bug tracking system a bug came from, perhaps machine-parseable alternatives could be considered to disambiguate them, like Closes-bug: bsc#1234567 (bugzilla) or Closes-bug: JIRA:bsc#1234567

I should have added, though, that both of these proposals are a much-needed and welcome improvement over the current convention, even if they were to use the full URL.

@GarySmith Thanks a lot for the comments, and for your support of these proposals. That's a good point about URLs breaking, but it seems to me like a slightly fallacious jump in logic from that to the statement:

It is better to just use the short bug number.

Firstly, it is pretty unlikely that any update to an instance of bugzilla or any other system would break old URLs. Any backwards-incompatible changes would almost certainly include some extra redirect rules to prevent this from happening (and if they didn't, this could easily be fixed either per-instance or by submissions to the upstream project). The second scenario you point out - where the instance moves to a different domain - is much more likely, although in many cases one would hope that the old domain would still redirect to the new one. In fact, the https://www.w3.org/Provider/Style/URI page you linked to makes a very strong argument that URLs should never be allowed to change. The argument that they should follow a certain implementation-independent schema is just a sensible corollary of that.

Secondly, even if the URLs did break, what is the exact impact of that? A broken URL is by definition one which cannot be navigated to immediately, but neither can a shortened bug reference like bsc#1234567. The latter always requires some mechanism for expansion to a working URL - ideally automatic, but of course it could be done manually instead. And in any context where a short-form can be expanded, there would almost certainly also be mechanisms for rewriting URLs. So in practice, the shortened form requires more work than the long form:

| | shortened form | full URL |
|---|---|---|
| before any breakage | requires expansion | immediately clickable |
| after URLs break (fairly unlikely) | requires a different expansion | requires rewriting URL |
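
To make the "requires expansion" step concrete, here is a minimal sketch of what any consumer of the short form has to carry around. The `EXPANSIONS` table and the `expand` function are hypothetical names; only the bsc mapping is taken from this discussion, and a real table would need entries for every tracker in use.

```python
import re

# Assumed mapping from abbreviation to URL template. If the tracker moves,
# this table is the one place that has to be updated.
EXPANSIONS = {
    "bsc": "https://bugzilla.suse.com/show_bug.cgi?id={id}",
}

# Short references look like "<abbreviation>#<number>", e.g. bsc#1234567.
SHORT_REF = re.compile(r"\b([a-z]+)#([0-9]+)\b")


def expand(text):
    """Replace known short references in text with full URLs."""
    def replace(match):
        template = EXPANSIONS.get(match.group(1))
        if template is None:
            return match.group(0)  # unknown tracker: leave untouched
        return template.format(id=match.group(2))
    return SHORT_REF.sub(replace, text)


print(expand("- fix bug in component foo (bsc#1234567)"))
```

Every tool and every reader needs access to an equivalent of this table before a short reference becomes useful, whereas a full URL needs nothing until it actually breaks.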

Further arguments against the short form

The shortened form is by design user-hostile, since it forces users to have knowledge of how to expand it before they can retrieve any useful information from the reference. It also requires this mapping to be maintained by the openSUSE project, even though the project does not maintain most of the issue tracker tools which it refers to.

In contrast, URLs are a universally agreed and well-understood hyperlinking standard supported by countless pieces of software. That means that they have all kinds of useful properties which the short form doesn't, e.g. a namespace which is governed in a universally agreed manner designed to minimise conflicts and to delegate responsibility for governance in a scalable manner.

Benefits of the short form

Since I'm arguing against the short form, it's only fair to also ask what advantages the short form brings. The only thing I can think of is shorter change logs, but I don't really see that as a huge advantage. If full URLs bloating changelogs was a big problem in a certain context, it would be easy to apply the inverse mapping to condense full URLs to the short form for ease of reading. I'd be interested to hear from people why they felt this bloat would be a genuine issue, or if there are other advantages I missed.

In my experience, the supposition that it is "fairly unlikely" that the URLs will break is wishful thinking, with the actual value being somewhere between "likely" and "certain". The most common reasons I have seen are moving the bug software to a new host, and migrating to new bug tracking software (with appropriate data migration to be able to search for old bug numbers). Having a git repo full of commits with broken URLs is less desirable than having a simple bug number that I can copy/paste into the bug tracking software and know that I can find the bug.

But if in this ecosystem (which I am admittedly new to), there will be a priority placed on guaranteeing that old bug URLs still work for a long period of time, even when moved across hosts or technologies, then using a full URL is fine.

In my experience, the supposition that it is "fairly unlikely" that the URLs will break is wishful thinking, with the actual value being somewhere between "likely" and "certain". The most common reasons I have seen are moving the bug software to a new host, and migrating to new bug tracking software (with appropriate data migration to be able to search for old bug numbers).

I think we'll have to agree to disagree on that. At risk of repeating myself, it's trivial to set up CNAME entries and webserver redirect rules to ensure that the old URLs redirect to the new place, and again, finding an updated URL is no easier whether you're starting with a broken URL or an abbreviated form whose expansion has recently changed. Both are equally confusing and require googling. But only the broken URLs can be automatically fixed by introducing redirects. And frankly, any such migration project stupid enough to not put automatic redirects in place deserves the misery it will get ;-)

I completely agree with GarySmith that things in the universe tend to change (just think about the problem-free OpenStack upgrades that never require any change in running instances... ;-).

It's a fact that especially openSUSE's bugtracker is available under different URLs already. At the moment, I can use https://bugzilla.{novell,suse}.com or https://bugzilla.opensuse.org/ to get to "my" bug report. Just the layout changes. There were already discussions in the past about splitting out at least the openSUSE part into its own system - avoiding that people need to register themselves at Novell if they just want to report a bug against openSUSE...

And you might open a can of (security) worms here, as parsing URLs is not always easy and an attacker might use a URL in such a form to redirect tools and humans to his ugly stuff (and: people tend to click on URLs much quicker than on shortened forms). If the tools have the URL they refer to in their code, someone might still think about doing crazy things with the shortened form, but it's up to the parser (or human) to follow this stuff or not.

I also don't think that the people who wrote all the parsers will be happy with your approach. Just some examples: https://download.suse.com/ => enter a bug number in the "Keywords" field to see if a bug is referenced in a maintenance update; https://build.opensuse.org/search explicitly allows searching for bugs mentioned in changes files. So you need to convince the developers of (all) those systems to rewrite their .changes parsers...

Another question: what happens with line breaks? 72 characters are not much - and if the upstream URL is long, the editors will break your nice URL into pieces. This would neither help parsers nor humans, just produces nearly the same work as with the shortened URLs.

You will also not be able to change the thousands of old .changes entries, so you need to inform users about your changes. ...and I know some developers who already worry about the length of the .changes file entries. You can hopefully convince them to use your new, full-URL instead of the shortened form.

Do you need more arguments against your proposal?

Instead an alternative idea: why not patch tar_scm to shorten the URLs for you and bring them into a form that is usable for everyone?

@lrupp commented on 26 Jan 2018, 12:51 GMT:

I completely agree with GarySmith that things in the universe tend to change (just think about the problem-free OpenStack upgrades that never require any change in running instances... ;-).

Of course things in OpenStack and the rest of the universe tend to change, but that is not very relevant to this discussion about one very specific topic (issue tracker URLs).

It's a fact that especially openSUSE's bugtracker is available under different URLs already. At the moment, I can use https://bugzilla.{novell,suse}.com or https://bugzilla.opensuse.org/ to get to "my" bug report.

Right, and there are different abbreviations for these too: bsc, bnc, and boo.

Just the layout changes. There were already discussions in the past about splitting out at least the openSUSE part into its own system - avoiding that people need to register themselves at Novell if they just want to report a bug against openSUSE...

Sure, but what is your point here?

And you might open a can of (security) worms here, as parsing URLs is not always easy and an attacker might use a URL in such a form to redirect tools and humans to his ugly stuff (and: people tend to click on URLs much quicker than on shortened forms).

Sorry, please can you clarify a few things because I'm having trouble understanding your logic here.

Firstly, please can you clarify which components you envisage as needing to parse these URLs, and why? You seem to be using a generalisation ("parsing URLs is not always easy") and applying it to a very specific scenario. The generalisation in itself seems a bit dubious to me, since there are lots of libraries which make this very easy. But even if I conceded that point, I still don't understand how it could possibly be argued that parsing of URLs such as

https://bugzilla.suse.com/show_bug.cgi?id=1234567

is difficult.
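
For illustration, a minimal sketch using only the Python standard library follows. The `ALLOWED_HOSTS` allowlist and the `bug_id` function are assumptions made for this example, not anything which exists in tar_scm; the point is only that a tool can refuse any URL whose host it does not already trust.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed allowlist of tracker hosts; anything else is rejected outright.
ALLOWED_HOSTS = {"bugzilla.suse.com"}


def bug_id(url):
    """Return the numeric bug id from a trusted bugzilla URL, or None."""
    parts = urlparse(url)
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        return None
    if parts.path != "/show_bug.cgi":
        return None
    ids = parse_qs(parts.query).get("id", [])
    if len(ids) == 1 and re.fullmatch(r"[0-9]+", ids[0]):
        return int(ids[0])
    return None


print(bug_id("https://bugzilla.suse.com/show_bug.cgi?id=1234567"))
print(bug_id("https://bugzilla.suse.com@evil.example/show_bug.cgi?id=1"))
```

The second call returns None because the real host in that URL is evil.example, so a tool following this pattern never passes an untrusted URL on to anything else.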

If the tools have the URL they refer to in their code, someone might still think about doing crazy things with the shortened form, but it's up to the parser (or human) to follow this stuff or not.

Which tools, and which URL? Which someone, and which crazy things? Sorry if these sound like stupid questions, but I'm struggling to understand your point here without more context.

I also don't think that the people who wrote all the parsers will be happy with your approach.

Just some examples: https://download.suse.com/ => enter a bug number in the "Keywords" field to see if a bug is referenced in a maintenance update; https://build.opensuse.org/search explicitly allows searching for bugs mentioned in changes files.

So you need to convince the developers of (all) those systems to rewrite their .changes parsers...

Hmm, I'm afraid you didn't read my proposals very carefully. In "Proposed solution 1", the .changes format would not change at all, so it would be backwards-compatible and no rewrites would be required. In "Proposed solution 2", I wrote "some tooling may depend on the old, shortened form", so you are somewhat reiterating what I already wrote.

It may be true that my second proposal would require some of the existing .changes parsers out there to be extended so that they can understand a specific set of bug URLs (assuming that they don't already). That sounds like a relatively small set of simple engineering tasks to me; I imagine the main challenge would be getting everyone involved to agree to the change. Subsequently coordinating it so that nothing broke should be easy - we could simply wait until all the parsers added support for the new format, and then update tar_scm accordingly.

Another question: what happens with line breaks? 72 characters are not much - and if the upstream URL is long, the editors will break your nice URL into pieces. This would neither help parsers nor humans, just produces nearly the same work as with the shortened URLs.

You will also not be able to change the thousands of old .changes entries, so you need to inform users about your changes. ...and I know some developers who already worry about the length of the .changes file entries. You can hopefully convince them to use your new, full-URL instead of the shortened form.

Do you need more arguments against your proposal?

Against which proposal? I gave two, and as far as I can see you missed the first and only argued against the second. What I really need is more context / clarification so that I can understand your arguments and decide whether I agree with you.

Instead an alternative idea: why not patch tar_scm to shorten the URLs for you and bring them into a form that is usable for everyone?

Uhhh... that's not an alternative idea, it's exactly what I described in "Proposed solution 1". Quoting verbatim from above:

Make tar_scm recognise reference lines in git commit messages utilising the above format, and when auto-generating .changes entries, automatically convert them to the shortened form which complies with the current .changes packaging policy.
