Semver compliance

umarcor commented 3 years ago

Ref #33 #36

Currently, versions have four numbers/fields. I guess they represent relevance from left to right. So, changes in the first field imply breaking changes, the second implies enhancements or additions, the third one represents bugfixes and the last one is "hidden" work (added to the codebase but not enabled by default)?

Unfortunately, this format is not semver compliant. See https://semver.org/ and, more precisely, https://semver.org/#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string, whose regular expression is used in the following test:

import re

def testSemVer(version):
    print(f"{version} ", end='')
    rexp = r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
    semver = re.search(rexp, version)
    # Accept an optional leading 'v', as used in git tags such as v1.0.0.
    if semver is None and version.startswith('v'):
        semver = re.search(rexp, version[1:])
    if semver is None:
        print(f'! Could not get semver from {version}')
    elif semver.group('prerelease') is None:
        print("! Release")
    else:
        print("! Prerelease")

testSemVer("v1.0.0")
testSemVer("v1.0.0.0")
testSemVer("v1.0.0rc1")
testSemVer("v1.0.0-rc1")

# python testsemver.py 
v1.0.0 ! Release
v1.0.0.0 ! Could not get semver from v1.0.0.0
v1.0.0rc1 ! Could not get semver from v1.0.0rc1
v1.0.0-rc1 ! Prerelease

Being semver compliant would make it easier to reuse existing libraries for bumping, comparing, managing... versions. However, there are known projects which are not exactly compliant, or which use a double format. For instance, MSYS2 uses pacman (the same as Arch Linux) and vercmp: https://archlinux.org/pacman/vercmp.8.html. There, it's common to find versions such as 1.0.0.r257.g82665d42-1. As you might expect, those are created from a semver compliant tag 1.0.0, plus the number of commits since that tag r257, plus the commit SHA prepended with g. The last item, -1 is the pkgrel field, for forcing rebuilds.
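To make the pacman-style layout concrete, here is a minimal sketch that splits such a string into the four parts described above. The pattern and the name `split_pkgver` are made up for illustration; this is not how pacman or vercmp parse versions.

```python
import re

# Hypothetical pattern for the layout described above:
# <semver tag>.r<commits since tag>.g<abbreviated SHA>-<pkgrel>
PKGVER = re.compile(
    r"^(?P<tag>\d+\.\d+\.\d+)"
    r"\.r(?P<commits>\d+)"
    r"\.g(?P<sha>[0-9a-f]+)"
    r"-(?P<pkgrel>\d+)$"
)

def split_pkgver(version):
    """Return the four parts as a dict, or None if the string does not match."""
    match = PKGVER.match(version)
    if match is None:
        return None
    parts = match.groupdict()
    # The commit count and pkgrel are numeric; the tag and SHA stay strings.
    parts["commits"] = int(parts["commits"])
    parts["pkgrel"] = int(parts["pkgrel"])
    return parts

print(split_pkgver("1.0.0.r257.g82665d42-1"))
```

A plain 1.0.0 returns None here, because the sketch only accepts the long form.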

Since you are already tagging *.0 versions only, it would be easy to simply drop the last field from the tags. Then, the "long version" would be generated automatically, without hardcoding it in the sources. It's up to you to append r257 only, the SHA only, or both of them.

For reference:

# git describe --long --tags  | sed 's#\([^-]*-g\)#r\1#; s#-g#.g#g;'
v1.5.5.0-r89.ga18c153
# NOT semver compliant

So, changing from v1.5.5.0 to v1.5.5:

# git describe --long --tags
v1.5.5-89-ga18c153
# semver compliant

# git describe --long --tags  | sed 's#\([^-]*-g\)#r\1#;'
v1.5.5-r89-ga18c153
# semver compliant

# git describe --long --tags  | sed 's#\([^-]*-g\)#r\1#; s#-g#.g#g;'
v1.5.5-r89.ga18c153
# semver compliant
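If the conversion has to live in a Python script instead of a sed one-liner, it could look roughly like this. It is a sketch, not an exact port of the sed expressions: it assumes the input has the `<tag>-<commits>-g<sha>` shape that `git describe --long --tags` prints, and `long_version` is a hypothetical name.

```python
def long_version(describe):
    """Turn 'v1.5.5-89-ga18c153' into 'v1.5.5-r89.ga18c153'."""
    # Split from the right so that hyphens inside the tag itself are left alone.
    # Unpacking raises ValueError if there are fewer than three parts.
    tag, commits, sha = describe.rsplit("-", 2)
    if not commits.isdigit() or not sha.startswith("g"):
        raise ValueError(f"unexpected git describe output: {describe!r}")
    # Prefix the commit count with 'r' and join it to the SHA with a dot.
    return f"{tag}-r{commits}.{sha}"

print(long_version("v1.5.5-89-ga18c153"))
print(long_version("v1.5.5.0-89-ga18c153"))
```

The function only rearranges the string; whether the result is semver compliant still depends on the tag having three fields.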

I know that changing the versioning format can be painful. Yet, if it needs to be discussed, better to do it now than later. Once it is made explicit (#33), it will be harder to change.

stnolting commented 3 years ago

> Currently, versions have four numbers/fields. I guess they represent relevance from left to right. So, changes in the first field imply breaking changes, the second implies enhancements or additions, the third one represents bugfixes and the last one is "hidden" work (added to the codebase but not enabled by default)?

Sort of... 😆 So far, there is no real concept behind the version number, with the exception that the version number increments with each RTL modification.

So I would be happy to have a "real versioning concept" here (SemVer). 👍

stnolting commented 3 years ago

I am reading through the issues and PRs regarding the "versioning topic" right now and I think I am a little bit lost ("meaningful" versioning is quite new to me - unfortunately) 😅

Maybe it is a stupid question, but when does the version - let's call it number - get "increased"? This should be with every new git tag, right? 🤔

umarcor commented 3 years ago

> ("meaningful" versioning is quite new to me - unfortunately) 😅 Maybe it is a stupid question, but when does the version - let's call it number - get "increased"?

It's ok. That's absolutely normal... because there is no answer 🤣

See the first paragraph of https://semver.org/#introduction (bold added by me):

> In the world of software management there exists a dreaded place called “dependency hell.” The bigger your system grows and the more packages you integrate into your software, the more likely you are to find yourself, one day, in this pit of despair.

semver is for software packages. For an API, an endpoint, a CLI, a library... anything where you can tell what is exported and what is private and where you can have a defined set of "whatever" that are relevant to consumers/users.

So, what is THE API in NEORV32? There is not just one; there are lots. Potentially, each module is an API, and the software for each module is another API. Since only some users will consume certain components, you could version all of them independently. Right after doing that, you would go crazy, because you would spend more time managing that than doing actual development.

Therefore, you cannot solve the problem with a single version. You/we need to assume either that you will let everyone know about every change you make to any of the underlying APIs (currently you bump when hardware APIs are modified), or that you will be specific about the changes in the changelog (or in commit messages), so that the version itself is not so meaningful.

GHDL is tagged once a year, typically in February. Some repos in YosysHQ have not been tagged in years. Still, people use them and they are packaged. Precisely, packagers are the people who care about versions most. As a packager of a project which I don't watch every day, when I want to bump it, I need to review the state of the master/development branch and decide whether it's in a clean state, or whether to go back to an earlier commit. Therefore, it's very valuable if developers mark some commits as "safe to be packaged". That is communicated by tagging.

In this context, we use semver because it allows reusing existing tools for extracting each field, identifying whether it's a pre-release (to be tested but not distributed), etc. So, we care about syntax, not so much about the semantics. My vision as a packager is:

MAJOR: this might be hard to bump...
MINOR: let's see if they added some new dependency or dropped some deprecated/unnecessary one.
PATCH: have a quick look, but don't care.
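As a rough illustration of that triage, a packager's script could first find out which field changed between the packaged version and the new tag. This is only a sketch under the assumption that both strings are three-field versions with an optional leading v; `bumped_field` is a hypothetical helper, not part of any existing tool.

```python
def bumped_field(old, new):
    """Name the most significant field that differs between two versions."""
    def fields(version):
        # Drop an optional leading 'v' and any pre-release or build suffix.
        if version.startswith("v"):
            version = version[1:]
        core = version.split("-")[0].split("+")[0]
        major, minor, patch = (int(part) for part in core.split("."))
        return major, minor, patch

    names = ("MAJOR", "MINOR", "PATCH")
    for name, before, after in zip(names, fields(old), fields(new)):
        if before != after:
            return name
    # All three fields are equal (only a suffix differs, or nothing at all).
    return None

print(bumped_field("v1.5.5", "v1.5.6"))
print(bumped_field("v1.5.6", "v1.6.0"))
```

A four-field string such as v1.5.5.0 makes the unpacking fail with a ValueError, which is one practical reason to care about the syntax.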

As a user, it's different. I want to always check the changelog, because they might have changed some specific feature which is irrelevant for most users, but which I use.

> This should be with every new git tag, right? 🤔

You can tag pre-releases explicitly too. In fact, the idea of the identifiers generated in CI and used in the docs is that those can be parsed as pre-releases.

The point is that you should tag a commit whenever you want someone to take a closer look at that one than at any of the previous or upcoming states of the branch/repo. Depending on how important it is for them to review the content, you use pre-releases or releases.

NEORV32 is not packaged per se. And it does not make much sense, a priori, to have it available through apt, dnf, pacman... So, the question is: who are your consumers? How do they consume this project? Currently, the relevant outcomes of tags are:

Having "sections" in the changelog which people can read as digests/summaries.
Having "frozen" datasheets.

BTW, I just saw that we need to fix the CI and properly append the release name to the assets when a tagged commit is pushed. eine/tip did correctly upload them, but the workflow has nightly hardcoded when renaming them. Furthermore, revnumber should use the tag only, if that's a tagged commit. It used v1.5.6.0-r0-g2723525, which is correct but redundant.
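A minimal sketch of that revnumber rule, assuming the identifier has the `<tag>-r<commits>-g<sha>` shape shown above. The pattern and the name `short_revnumber` are hypothetical; the actual CI workflow may build the string differently.

```python
import re

# Assumed layout of the identifier built in CI: <tag>-r<commits>-g<sha>
REVNUMBER = re.compile(r"^(?P<tag>.+)-r(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$")

def short_revnumber(identifier):
    """Collapse '<tag>-r0-g<sha>' to '<tag>'; leave anything else unchanged."""
    match = REVNUMBER.match(identifier)
    # Zero commits since the tag means the commit itself is tagged.
    if match is not None and int(match.group("commits")) == 0:
        return match.group("tag")
    return identifier

print(short_revnumber("v1.5.6.0-r0-g2723525"))
print(short_revnumber("v1.5.6.0-r3-g2723525"))
```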

Moreover, v1.5.6.0 is not semver compliant. It needs to have three fields 😉. Since you are already tagging *.0 versions only, I suggest you drop that from the tag. You can keep using it in the changelog, so that you don't modify your workflow. I believe that is the easiest modification.
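If existing four-field names ever need to be mapped to their three-field form in a script, the mapping could be sketched as follows. It assumes, as stated above, that only *.0 versions are tagged, and it refuses anything else so that no information is dropped silently. `drop_fourth_field` is a made-up name.

```python
import re

def drop_fourth_field(tag):
    """Map 'v1.5.6.0' to 'v1.5.6'; three-field tags pass through unchanged."""
    match = re.fullmatch(r"(v?\d+\.\d+\.\d+)(?:\.(\d+))?", tag)
    if match is None:
        raise ValueError(f"not a three- or four-field tag: {tag!r}")
    core, fourth = match.groups()
    # Only a missing or zero fourth field can be removed without losing data.
    if fourth not in (None, "0"):
        raise ValueError(f"fourth field is not 0: {tag!r}")
    return core

print(drop_fourth_field("v1.5.6.0"))
```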

I'm not sure about your criteria for bumping the third (PATCH) field, but you've been tagging once a month more or less. I find that to be ok, and the changelog is really nice to read. I don't think you need to change anything.

stnolting commented 3 years ago

Holy cricket. I have absolutely no idea. There are so many aspects that are somehow relevant. But at the same time, many of those aspects might perfectly fit pure-software projects but are somewhat odd when applied to a mixed HW+SW project....

It would be OK for me to completely ignore the software part of this project for versioning and focus on the hardware part only. But as soon as we are talking about the low-level functions from the SW framework, the software part is back in the versioning game.

I think I am already lost in versioning hell so I am grateful for any kind of help. 🤯 We could just try a certain approach and see how it evolves. Regarding the current versioning scheme: We could also do a "hard reset" and use a completely different versioning scheme for the future. I'm open to any "straightforward" approach as I am starting to realize the importance of general versioning and also the flaws of the current versioning.

> I'm not sure about your criteria for bumping the third (PATCH) field, but you've been tagging once a month more or less. I find that to be ok, and the changelog is really nice to read. I don't think you need to change anything.

I am not sure about that either 😅

umarcor commented 3 years ago

You seem to be a well-organised and structured guy with regard to the development. Hence, I can understand your frustration with not being able to know exactly how to do it. Honestly, do not worry! As said, you are already doing a good job with communicating the changes! Just keep doing the same.

The only change is that you tag the next release (next month) v1.5.7, NOT v1.5.7.0. Interestingly, next month is July and that's the 7th month. So, you might decide to start incrementing the third field once a month, matching the month number. But, honestly, don't worry about it! It's not important!

The most important points are:

Use a semver compliant syntax (ignore semantics).
Do not "abuse" tags by creating one each day or each week, unless there is an important reason for that.

Other than that, again, you are already communicating the changes very nicely!

stnolting commented 3 years ago

> You seem to be a well-organised and structured guy with regard to the development.

Well, that's relative - but thank you anyway 😄

> The only change is that you tag the next release (next month) v1.5.7, NOT v1.5.7.0. Interestingly, next month is July and that's the 7th month. So, you might decide to start incrementing the third field once a month, matching the month number. But, honestly, don't worry about it! It's not important!

Ok, so put simply: That means we are only using the first 3 places (major, minor, patch) of the current "hardware version number", right? The least-significant decimal place (1.2.3.X) is just some kind of "serial number"?!?

Can you think of any guideline for when to upgrade the major/minor/patch number? I mean, for software frameworks this might be kind of straightforward... But in this case? Anyway, this might not be the most important thing right now.

I really need to get a grip on this 😅 Adding further "revisions" (-> CHANGELOG.md) and making new releases won't make things easier.

umarcor commented 3 years ago

> Can you think of any guideline for when to upgrade the major/minor/patch number?

I would suggest:

MAJOR: Do not change it unless you make some very fundamental modification to the conception of the project, such as a complete rewrite. Otherwise, keep v1 "forever". GHDL has been in v0 for almost 20 years.
MINOR: This is what you might want to use for communicating "hey, this last update is more important than other regular updates". So, bump it when you add some new peripheral/subsystem, when you support some new instruction, when you add a cache, when you make some "hardcoded" component generic, when you remove deprecated features, etc.
PATCH: This is mostly driven by time: you want to tag once a month or once every few months, so the changelog is not huge.
Fourth field: If you want, you can keep doing what you have done until now: bump whenever you change "sources", but not when you update docs, CI or other parts of the repo.

Alternatively, you might stop maintaining the last number manually. Instead, use something such as 1.2.3-r000. That r000 is something you can get from git (it's the number of commits between the latest tag and your current HEAD). It's precisely the format used in the docs at the moment (removing the commit SHA). So, you don't need to keep track of it. Whenever you want to add an entry to the changelog, you get that value from branch master. Moreover, this is "backwards compatible". That is, you can rewrite the "Version" column in the changelog to use this format, and it will be consistent. On top of that, commits have a date, so you don't need the "Date" column in the changelog; the date can be looked up from e.g. 1.2.3-r010 (the tenth commit ahead of tag 1.2.3 in branch master). I think you can use a script to apply this criterion to the existing changelog; do not even consider doing it manually.
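The string handling such a script needs is small. Below is a sketch under the assumption that the commit count is already known (for instance from `git rev-list --count <tag>..HEAD`); `revision_id` and `split_revision_id` are hypothetical names, and the three-digit padding just mirrors the r000/r010 examples above.

```python
def revision_id(tag, commits_ahead, width=3):
    """Build an identifier such as '1.2.3-r010' from a tag and a commit count."""
    return f"{tag}-r{commits_ahead:0{width}d}"

def split_revision_id(revision):
    """Split '1.2.3-r010' back into the tag and the number of commits ahead."""
    tag, separator, count = revision.rpartition("-r")
    if not separator or not count.isdigit():
        raise ValueError(f"not a <tag>-r<count> identifier: {revision!r}")
    return tag, int(count)

print(revision_id("1.2.3", 10))
print(split_revision_id("1.2.3-r010"))
```

The zero padding is harmless for the semver syntax, because r010 is an alphanumeric pre-release identifier and the leading-zero restriction only applies to purely numeric ones.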

stnolting commented 3 years ago

This can be closed now, right? 😉

stnolting / neorv32

Semver compliance #37