DavidFichtmueller closed this issue 5 years ago.
+1 for adding to docs. Exactly what I was looking for. Thank you!
It would be nice to have this in the official SemVer specification!
If I'm not mistaken, `\d*[a-zA-Z-][0-9a-zA-Z-]*` is the same as `[0-9a-zA-Z-]+` (note the plus).
No, the first one requires at least one character from the class `[a-zA-Z-]`, since that class has no quantifier. The second one does not require any character from that class, so the string `12345` fails the first but passes the second.
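To make the difference concrete, here is a small Python check (Python's `re` is an assumption here; the thread itself is engine-agnostic). `fullmatch` anchors each pattern to the whole string, and `re.ASCII` limits `\d` to `[0-9]`.

```python
import re

# The two patterns under discussion. fullmatch() anchors them to the whole
# string; re.ASCII keeps \d equal to [0-9].
needs_non_digit = re.compile(r"\d*[a-zA-Z-][0-9a-zA-Z-]*", re.ASCII)
any_identifier = re.compile(r"[0-9a-zA-Z-]+", re.ASCII)

for sample in ["12345", "alpha", "0a", "--"]:
    first = needs_non_digit.fullmatch(sample) is not None
    second = any_identifier.fullmatch(sample) is not None
    print(f"{sample!r}: first={first}, second={second}")
# '12345': first=False, second=True
# 'alpha': first=True, second=True
# '0a': first=True, second=True
# '--': first=True, second=True
```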
Right, but it is part of `0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*`, so "12345" will pass anyway.
Edit: if it is changed as in my previous comment, identifiers with leading zeros (such as `0123`) will pass. I'm sorry ;)
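A quick Python sketch of the leading-zero problem (the engine is an assumption): inside the full identifier alternation, swapping the last alternative for `[0-9a-zA-Z-]+` lets a purely numeric identifier with a leading zero through, which SemVer forbids for numeric pre-release identifiers.

```python
import re

# Pre-release identifier alternation as posted...
as_posted = re.compile(r"0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*", re.ASCII)
# ...and with the last alternative replaced by [0-9a-zA-Z-]+.
shortened = re.compile(r"0|[1-9]\d*|[0-9a-zA-Z-]+", re.ASCII)

for ident in ["12345", "0", "0123", "rc1"]:
    print(ident, as_posted.fullmatch(ident) is not None,
          shortened.fullmatch(ident) is not None)
# 12345 True True
# 0 True True
# 0123 False True   <- leading zero accepted by the shortened form
# rc1 True True
```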
If this isn't part of an actual implementation, it should use only (POSIX) standard features, like `[[:digit:]]` in place of `\d`.
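A related portability trap, shown here in Python as an assumed example engine: `\d` is not always the same as `[0-9]`. In Python 3, `\d` in a `str` pattern matches any Unicode decimal digit unless `re.ASCII` is set, so a version core built from `\d*` can accept non-ASCII digits.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a Unicode decimal digit (category Nd), not ASCII

# In Python 3, \d in a str pattern matches any Unicode decimal digit...
print(re.fullmatch(r"\d", ARABIC_INDIC_THREE) is not None)            # True
# ...unless re.ASCII is set, or the class is written out as [0-9].
print(re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII) is not None)  # False
print(re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE) is not None)         # False

# Consequence for the version-core part of the SemVer regex:
CORE = r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
print(re.fullmatch(CORE, "1" + ARABIC_INDIC_THREE + ".0.0") is not None)            # True
print(re.fullmatch(CORE, "1" + ARABIC_INDIC_THREE + ".0.0", re.ASCII) is not None)  # False
```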
Perhaps you should get in touch with gulp-bump and ask them to use the soon-to-be-established official regex:
https://github.com/stevelacy/bump-regex/blob/f70505708719e2f9b7df44e0521145907349b761/index.js#L28
and to export that regex for reuse, e.g. for scanning through files and replacing version occurrences within them.
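As a sketch of the reuse case described above (scanning text and replacing version occurrences), here is a hypothetical Python helper, `bump_patch`, that finds a package.json-style `"version"` field and increments its patch number. It uses the SemVer regex from this thread without its `^`/`$` anchors and relies on the surrounding quotes instead. Dropping pre-release and build metadata on a bump is a simplifying assumption, not necessarily what gulp-bump or any other tool does.

```python
import re

# SemVer regex from this thread without ^...$, so it can be embedded in a
# larger pattern. Groups: 1 major, 2 minor, 3 patch, 4 pre-release, 5 build.
SEMVER = (
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)

# Hypothetical target: a "version" field whose closing quote acts as the
# right-hand anchor. Overall groups: 1 prefix, 2-6 SemVer, 7 closing quote.
VERSION_FIELD = re.compile(r'("version"\s*:\s*")' + SEMVER + r'(")', re.ASCII)


def bump_patch(text: str) -> str:
    """Increment the patch number of every "version" field (simplification:
    pre-release and build metadata are dropped)."""
    def repl(m: re.Match) -> str:
        major, minor, patch = m.group(2), m.group(3), int(m.group(4))
        return f"{m.group(1)}{major}.{minor}.{patch + 1}{m.group(7)}"
    return VERSION_FIELD.sub(repl, text)


print(bump_patch('{"name": "demo", "version": "1.4.9"}'))
# {"name": "demo", "version": "1.4.10"}
```

Strings that are not valid SemVer, such as `"1.4"`, are left untouched because the pattern simply does not match them.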
There is also https://github.com/sindresorhus/semver-regex, perhaps you want to join efforts here.
And there is also https://github.com/semver/semver.org/issues/59. Get your gears straight.
@DavidFichtmueller, near as I can tell, if you want it included in or linked from the semver.org site, you really should contribute to semver/semver.org#59.
Please close this issue at your earliest possible convenience.
@DavidFichtmueller, I just ran your regex against the test data set we've been using over on semver/semver.org#59, and it correctly matches all the positive examples while rejecting all of the negative examples, including one particularly degenerate version string that causes excessive backtracking in many of the proposed regexes. If you could add named capture groups, that would be awesome.
Hi @DavidFichtmueller,
I just made a comparison of your definition against mine.
Yours gets through the full test set in 2345 steps, which is more than twice as fast as mine (5015 steps)! :)
Besides reducing the number of groups to the minimum (which has to be verified for all implementations, since some could require extra groups) and substituting `/[0-9]/` with `/\d/` (which not all implementations accept), you achieved a greater simplification by going from `/[0-9A-Za-z-]*[A-Za-z-][0-9A-Za-z-]*/` to `/\d*[a-zA-Z-][0-9a-zA-Z-]*/`.
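One way to convince yourself that the simplification is safe: both patterns describe "identifier characters containing at least one non-digit", and the short form just pins the first non-digit to the `[a-zA-Z-]` position. The Python sketch below (an assumption; any engine would do) compares the two exhaustively on every string up to length 4 over a small alphabet. Because `\d*` can only stop at the first non-digit, the short form has far fewer ways to split the input, which is plausibly where the step savings come from.

```python
import itertools
import re

# The long form and the simplified short form of an alphanumeric identifier.
long_form = re.compile(r"[0-9A-Za-z-]*[A-Za-z-][0-9A-Za-z-]*")
short_form = re.compile(r"\d*[a-zA-Z-][0-9a-zA-Z-]*", re.ASCII)

ALPHABET = "09aZ-"  # digits, a lower- and an upper-case letter, and the hyphen


def disagreements(max_len=4):
    """Return every string up to max_len characters on which the forms differ."""
    diffs = []
    for n in range(1, max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            s = "".join(chars)
            if (long_form.fullmatch(s) is None) != (short_form.fullmatch(s) is None):
                diffs.append(s)
    return diffs


print(disagreements())  # []
```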
Adding named groups and non-capturing groups costs a little performance, bringing it to 2501 steps.
See it here.
@gvlx thanks for the comparison, that is really interesting. Also, you were quicker than me in creating the version with the named groups (I also added the non-capturing groups and got pretty much the same result as you, 2501 steps).
@jwdonahue here is the version as requested with the named groups:
^(?<major>0|[1-9]\d*)\.(?<minor>0|[1-9]\d*)\.(?<patch>0|[1-9]\d*)(?:-(?<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
and here without the named groups (for systems that don't support them yet), but with the non-capturing groups (so cg1 = major, cg2 = minor, cg3 = patch, cg4 = prerelease and cg5 = buildmetadata):
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
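For anyone using these from Python (an assumption; the thread does not target a specific engine): Python's `re` spells named groups `(?P<name>...)`, so the named-group regex needs that one change. The sketch uses `fullmatch` because in Python `$` also matches just before a trailing newline, so `match` with `^...$` would accept `"1.0.0\n"`.

```python
import re

# The named-group regex above, with (?P<name>...) in place of (?<name>...).
SEMVER_RE = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,  # keep \d limited to [0-9]
)

m = SEMVER_RE.fullmatch("1.0.0-alpha.1+build.5")
print(m.groupdict())
# {'major': '1', 'minor': '0', 'patch': '0', 'prerelease': 'alpha.1', 'buildmetadata': 'build.5'}
```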
@DavidFichtmueller, do you have time to issue a pull request? I am thinking we should add an "Is there an efficient regex?" entry to the FAQ and include those in the answer. If you don't have time, I might get to it this weekend.
None of these regular expressions parse the version string strictly according to the SemVer spec. The regular expressions shown so far in this issue allow the following invalid version strings:
0.0.0-0.abc---+0.build-4.003-2
0.0.1-pre-release.tag+build.metadata
These are invalid because the version numbers `0.0.0` and `0.0.1` are invalid. The minimum version number allowed by the spec is `0.1.0`.
I had developed my own regex a while back, and the regex @DavidFichtmueller wrote above was very close to what I had. My regular expression introduces a zero-width positive look-behind group in the `(?<Minor>)` portion of the regular expression. This may make the regex less efficient, but it depends on what your goal is: speed vs. correctness. Here's that part, shown with extra spacing to help things stand out:
(?<Minor> (?<=[1-9]\d*\.) 0 | [1-9]\d* )
This says "Match a `0` if the `0` is preceded by a digit 1-9 followed by zero or more digits followed by a period; otherwise match a digit 1-9 followed by zero or more digits".
In addition, I added `(?n)` to the beginning of the regular expression so that only named groups capture, which means I don't have to use `(?: ... )` for non-captured groups. Finally, I also provided two extra named capture groups so you can capture the pre-release and build metadata tags both with and without the leading `-` and `+` (respectively), but those extra captures can easily be removed.
Here's a copy/paste friendly version of the regular expression in its entirety:
(?nx)^
(?<Major>0|[1-9]\d*)\.
(?<Minor>(?<=[1-9]\d*\.)0|[1-9]\d*)\.
(?<Patch>0|[1-9]\d*)
(?<PreReleaseTagWithSeparator>
-(?<PreReleaseTag>
((0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))(\.(0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))*
)
)?
(?<BuildMetadataTagWithSeparator>
\+(?<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]*)*)
)?$
And here's the one-liner version:
(?n)^(?<Major>0|[1-9]\d*)\.(?<Minor>(?<=[1-9]\d*\.)0|[1-9]\d*)\.(?<Patch>0|[1-9]\d*)(?<PreReleaseTagWithSeparator>-(?<PreReleaseTag>((0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))(\.(0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))*))?(?<BuildMetadataTagWithSeparator>\+(?<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]*)*))?$
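The `(?<=[1-9]\d*\.)` group is a variable-width look-behind, which .NET supports. As a sketch, assuming someone ports this to Python's `re` (not part of the original post): `re` only accepts fixed-width look-behind, so the same minor-version rule would have to be expressed another way, for example as a look-ahead `(?!0\.0\.)` at the start of the string. The rule itself is only an illustration of the technique here; as later comments in this thread establish, SemVer does not actually forbid `0.0.z`.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# The Minor group from the post, with Python's (?P<...>) named-group syntax.
# Its variable-width look-behind (?<=[1-9]\d*\.) is rejected by re.
print(compiles(r"(?P<Minor>(?<=[1-9]\d*\.)0|[1-9]\d*)"))  # False

# Illustration only: the same "minor may be 0 only if major is not 0" rule
# written as a fixed-width look-ahead that rejects strings starting "0.0.".
NO_ZERO_ZERO = re.compile(
    r"(?!0\.0\.)(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)", re.ASCII
)

for v in ["1.0.0", "0.1.0", "0.0.1"]:
    print(v, NO_ZERO_ZERO.fullmatch(v) is not None)
# 1.0.0 True
# 0.1.0 True
# 0.0.1 False
```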
Hi,
I went back to read the specification and couldn't find the constraint on using a `0.0.0` version. Are you sure of this?
@fourpastmidnight said:
These are invalid because the version numbers 0.0.0 and 0.0.1 are invalid. The minimum version number allowed by the spec is 0.1.0.
Incorrect. See item 4 of the spec. The relevant FAQ suggests that starting at 0.1.0 is a good idea, but it's not required. It's common practice to shift the semantic meaning of the fields one place to the right for the 0.y.z series, so that a bump of the minor field is a breaking change and bumps of the patch field can be bug fixes and new features.
Your regex fails to validate the compliant version string 0.0.4. Here's the test set agreed on so far:
Valid:
0.0.4 1.2.3 10.20.30 1.1.2-prerelease+meta 1.1.2+meta 1.1.2+meta-valid 1.0.0-alpha 1.0.0-beta 1.0.0-alpha.beta 1.0.0-alpha.beta.1 1.0.0-alpha.1 1.0.0-alpha0.valid 1.0.0-alpha.0valid 1.0.0-alpha-a.b-c-somethinglong+build.1-aef.1-its-okay 1.0.0-rc.1+build.1 2.0.0-rc.1+build.123 1.2.3-beta 10.2.3-DEV-SNAPSHOT 1.2.3-SNAPSHOT-123 1.0.0 2.0.0 1.1.7 2.0.0+build.1848 2.0.1-alpha.1227 1.0.0-alpha+beta 1.2.3----RC-SNAPSHOT.12.9.1--.12+788 1.2.3----R-S.12.9.1--.12+meta 1.2.3----RC-SNAPSHOT.12.9.1--.12 1.0.0+0.build.1-rc.10000aaa-kk-0.1 99999999999999999999999.999999999999999999.99999999999999999 1.0.0-0A.is.legal
Invalid:
1 1.2 1.2.3-0123 1.2.3-0123.0123 1.1.2+.123 +invalid -invalid -invalid+invalid -invalid.01 alpha alpha.beta alpha.beta.1 alpha.1 alpha+beta alpha_beta alpha. alpha.. beta\ 1.0.0-alpha_beta -alpha. 1.0.0-alpha.. 1.0.0-alpha..1 1.0.0-alpha...1 1.0.0-alpha....1 1.0.0-alpha.....1 1.0.0-alpha......1 1.0.0-alpha.......1 01.1.1 1.01.1 1.1.01 1.2 1.2.3.DEV 1.2-SNAPSHOT 1.2.31.2.3----RC-SNAPSHOT.12.09.1--..12+788 1.2-RC-SNAPSHOT -1.0.3-gamma+b7718 +justmeta 9.8.7+meta+meta 9.8.7-whatever+meta+meta 99999999999999999999999.999999999999999999.99999999999999999----RC-SNAPSHOT.12.09.1--------------------------------..12
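A minimal Python harness for this kind of oracle testing, run here against the numbered-group regex posted earlier in this thread (Python's `re` and the helper name `failures` are assumptions). The lists are a subset of the test set above, plus `0.0.0`; the last invalid string is the degenerate one that causes heavy backtracking in some candidate regexes.

```python
import re

# Numbered-group regex posted earlier in this thread; re.ASCII keeps \d to [0-9].
SEMVER_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)

VALID = [
    "0.0.0", "0.0.4", "1.2.3", "10.20.30", "1.1.2-prerelease+meta",
    "1.1.2+meta-valid", "1.0.0-alpha.beta.1", "1.0.0-alpha0.valid",
    "1.0.0-alpha.0valid", "1.0.0-rc.1+build.1",
    "1.2.3----RC-SNAPSHOT.12.9.1--.12+788",
    "1.0.0+0.build.1-rc.10000aaa-kk-0.1",
    "99999999999999999999999.999999999999999999.99999999999999999",
    "1.0.0-0A.is.legal",
]

INVALID = [
    "1", "1.2", "1.2.3-0123", "1.2.3-0123.0123", "1.1.2+.123", "+invalid",
    "-invalid", "alpha", "1.0.0-alpha_beta", "1.0.0-alpha..", "1.0.0-alpha..1",
    "01.1.1", "1.01.1", "1.1.01", "1.2.3.DEV", "1.2-SNAPSHOT",
    "-1.0.3-gamma+b7718", "9.8.7+meta+meta", "9.8.7-whatever+meta+meta",
    "99999999999999999999999.999999999999999999.99999999999999999"
    "----RC-SNAPSHOT.12.09.1--------------------------------..12",
]


def failures():
    """Return every test string the regex classifies incorrectly."""
    wrong = [v for v in VALID if SEMVER_RE.fullmatch(v) is None]
    wrong += [v for v in INVALID if SEMVER_RE.fullmatch(v) is not None]
    return wrong


print(failures())  # []
```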
@fourpastmidnight, I do like that your regex is fairly short and seems to be performant. I think the 0.0.0 version should be added to the top of the test data. It's valid, allows for initializing version objects and frankly, indicates the interface doesn't exist or the developer isn't sure yet whether there will be any "API".
You’re right! I totally inferred that rule from the suggestion in the FAQ, but section 4 specifies no such constraint. In which case, the regexes supplied here, especially the one from @DavidFichtmueller, are correct. My apologies.
From: Gerardo Lisboa Sent: Wednesday, October 17, 2018 18:31 To: semver/semver Cc: Craig E. Shea; Comment Subject: Re: [semver/semver] RegEx for validating SemVer-numbers (#232)
The only thing needed to pass the tests with the regular expressions above was to remove the zero-width positive look-behind group in the `(?<Minor> ...)` component of my regex, which I had added after inferring a non-existent constraint from a suggestion in the FAQ 😉.
So, here's my corrected regex, which, aside from the different capitalization of the group names and the extra groups that capture the pre-release and build metadata tags both with and without their leading `-` and `+`, is otherwise the same as @DavidFichtmueller's. So, great job by him!
(?nx)^
(?<Major>0|[1-9]\d*)\.
(?<Minor>0|[1-9]\d*)\.
(?<Patch>0|[1-9]\d*)
(?<PreReleaseTagWithSeparator>
-(?<PreReleaseTag>
((0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))(\.(0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))*
)
)?
(?<BuildMetadataTagWithSeparator>
\+(?<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]*)*)
)?$
And again, the one-liner:
(?n)^(?<Major>0|[1-9]\d*)\.(?<Minor>0|[1-9]\d*)\.(?<Patch>0|[1-9]\d*)(?<PreReleaseTagWithSeparator>-(?<PreReleaseTag>((0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))(\.(0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))*))?(?<BuildMetadataTagWithSeparator>\+(?<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]*)*))?$
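A Python translation of this regex, as a sketch under the assumption that you need it outside .NET: `(?P<...>)` replaces `(?<...>)`, explicit `(?:...)` groups replace the `(?n)` option, and `re.VERBOSE` replaces `(?x)`. Build metadata identifiers are quantified with `+` rather than `*`, so an empty identifier such as a trailing period is rejected.

```python
import re

SEMVER_RE = re.compile(
    r"""
    ^(?P<Major>0|[1-9]\d*)\.
    (?P<Minor>0|[1-9]\d*)\.
    (?P<Patch>0|[1-9]\d*)
    (?P<PreReleaseTagWithSeparator>
      -(?P<PreReleaseTag>
        (?:0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*)
        (?:\.(?:0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))*
      )
    )?
    (?P<BuildMetadataTagWithSeparator>
      \+(?P<BuildMetadataTag>[\dA-Za-z-]+(?:\.[\dA-Za-z-]+)*)
    )?$
    """,
    re.VERBOSE | re.ASCII,  # VERBOSE ignores the layout whitespace above
)

m = SEMVER_RE.fullmatch("1.0.0-rc.1+build.5")
print(m.group("PreReleaseTagWithSeparator"), m.group("PreReleaseTag"))        # -rc.1 rc.1
print(m.group("BuildMetadataTagWithSeparator"), m.group("BuildMetadataTag"))  # +build.5 build.5
```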
Thanks for pointing out my incorrect assertion @gvlx and @jwdonahue.
As an aside, I found my way here because I was testing a PowerShell function I wrote a few years back that parses a "version string". My original regex, in addition to handling SemVer-compliant strings, also loosely supports parsing NuGet legacy-style version strings. The one good thing about all of this is that, because I independently came up with the same solution as DavidFichtmueller, we can be pretty darn sure it works correctly (when the spec is actually followed 😉 lol).
The script I wrote returns a `PSCustomObject` to represent the version data, and you can inspect the individual components and get string representations of it in SemVer 2.0 or NuGet legacy formats. I even have a comparison script that can compare two versions and return -1, 0, or 1 to indicate whether one version is less than, equal to, or greater than another. I'll most likely post those to my GitHub account in a few days. I hadn't needed them again until recently and decided to write tests for them, which is what brought me here today. I'm still trying to wrap up the last of the unit tests.
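The PowerShell comparison script itself isn't shown here. As a rough Python equivalent of the same idea, here is a hypothetical `compare_semver` that returns -1, 0 or 1 by following the precedence rules in item 11 of the SemVer 2.0.0 spec: core numbers compare numerically, a pre-release ranks below the normal version, pre-release identifiers compare one by one (numeric by value, alphanumeric in ASCII order, numeric below alphanumeric, more identifiers wins on a tie), and build metadata is ignored.

```python
import re
from functools import cmp_to_key

# Numbered-group SemVer regex from this thread; group 4 is the pre-release.
SEMVER_RE = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?",
    re.ASCII,
)


def _sign(a, b) -> int:
    """Three-way comparison: -1, 0 or 1."""
    return (a > b) - (a < b)


def compare_semver(left: str, right: str) -> int:
    """Hypothetical helper: compare two versions by SemVer 2.0.0 precedence."""
    ml, mr = SEMVER_RE.fullmatch(left), SEMVER_RE.fullmatch(right)
    if ml is None or mr is None:
        raise ValueError("not a SemVer 2.0.0 version string")
    # Major, minor and patch compare numerically, in that order.
    core_l = tuple(int(ml.group(i)) for i in (1, 2, 3))
    core_r = tuple(int(mr.group(i)) for i in (1, 2, 3))
    if core_l != core_r:
        return _sign(core_l, core_r)
    pre_l, pre_r = ml.group(4), mr.group(4)
    # A normal version outranks any pre-release of the same core version.
    if pre_l is None or pre_r is None:
        return _sign(pre_l is None, pre_r is None)
    ids_l, ids_r = pre_l.split("."), pre_r.split(".")
    for a, b in zip(ids_l, ids_r):
        if a == b:
            continue
        a_num, b_num = a.isdigit(), b.isdigit()
        if a_num and b_num:
            return _sign(int(a), int(b))  # numeric identifiers: by value
        if a_num or b_num:
            return -1 if a_num else 1  # numeric < alphanumeric
        return _sign(a, b)  # alphanumeric identifiers: ASCII order
    # All shared identifiers are equal: the longer list outranks the shorter.
    # Build metadata (group 5) is never consulted.
    return _sign(len(ids_l), len(ids_r))


versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-alpha", "1.0.0-beta.11", "1.0.0-beta.2"]
print(sorted(versions, key=cmp_to_key(compare_semver)))
# ['1.0.0-alpha', '1.0.0-beta.2', '1.0.0-beta.11', '1.0.0-rc.1', '1.0.0']
```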
@fourpastmidnight, thank you for your efforts, I look forward to ripping your PowerShell scripts ;).
If you happen upon any interesting positive/negative oracles to test against, please add them to our list and post them here. I will eventually issue a PR to get them checked-in.
\+(?<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]*)*)
This allows build metadata to end with a period. To prevent this, change one `*` to `+`:
\+(?<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]+)*)
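A quick Python check of the suggested change (using `(?P<...>)`, Python's named-group syntax, as an assumption about the target engine): with `*`, both a trailing period and an empty identifier in the middle slip through; with `+`, neither does.

```python
import re

# Build metadata fragment before and after changing the inner * to +.
before = re.compile(r"\+(?P<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]*)*)", re.ASCII)
after = re.compile(r"\+(?P<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]+)*)", re.ASCII)

for tag in ["+build.1", "+build.", "+build..1"]:
    print(tag, before.fullmatch(tag) is not None, after.fullmatch(tag) is not None)
# +build.1 True True
# +build. True False
# +build..1 True False
```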
Good catch! Thanks.
From: Liam Morland Sent: Monday, April 1, 2019 11:08 To: semver/semver Cc: Craig E. Shea; Mention Subject: Re: [semver/semver] RegEx for validating SemVer-numbers (#232)
Hi @fourpastmidnight, please check your `PreReleaseTag`; I have the impression you are having the same issue as with the `BuildMetadataTag`.
Regex for validating alpha/beta pre-release versions:
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-(0|[1-9]\d*|(beta|alpha)[0-9a-zA-Z-]*)(\.(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)(\+[0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*)?$
based on
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)?(\+[0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*)?$
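A Python sketch of the alpha/beta regex above (the Python port is an assumption). Here the trailing build metadata group is quantified with `?`, so `+meta+meta` is still rejected. Note that the first-identifier alternation still admits purely numeric identifiers such as `1.0.0-1`, and `(beta|alpha)[0-9a-zA-Z-]*` also admits words like `betamax`; because the pre-release group is not optional, a plain `1.0.0` does not match.

```python
import re

# Alpha/beta-only regex, ported to Python. The build metadata group uses "?"
# so that at most one "+..." section is allowed.
ALPHA_BETA_RE = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(-(0|[1-9]\d*|(beta|alpha)[0-9a-zA-Z-]*)"
    r"(\.(0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)"
    r"(\+[0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*)?$",
    re.ASCII,
)

for v in ["1.0.0-beta.2", "1.0.0-alpha1+build.7", "1.0.0-rc.1", "1.0.0", "1.0.0-beta+meta+meta"]:
    print(v, ALPHA_BETA_RE.fullmatch(v) is not None)
# 1.0.0-beta.2 True
# 1.0.0-alpha1+build.7 True
# 1.0.0-rc.1 False
# 1.0.0 False
# 1.0.0-beta+meta+meta False
```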
It seems that PR #460 has been completed. We now have a well-vetted pair of regexes included in the spec.
Can we please close this discussion now?
@jwdonahue yes, you are right. This can be closed now.
Hi @fourpastmidnight, please check your `PreReleaseTag`; I have the impression you are having the same issue as with the `BuildMetadataTag`.
Sorry for the very late reply. I did check into this and there doesn't appear to be the same issue with the pre-release tag portion of my SemVer regex:
PS C:\> $semver = [regex]@'
(?nx)^
(?<Major>0|[1-9]\d*)\.
(?<Minor>0|[1-9]\d*)\.
(?<Patch>0|[1-9]\d*)
(?<PreReleaseTagWithSeparator>
-(?<PreReleaseTag>
((0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))(\.(0|[1-9]\d*|\d*[A-Za-z-][\dA-Za-z-]*))*
)
)?
(?<BuildMetadataTagWithSeparator>
\+(?<BuildMetadataTag>[\dA-Za-z-]+(\.[\dA-Za-z-]*)*)
)?$
'@
PS C:\> $semver.IsMatch('1.0.0-alpha1.')
False
PS C:\>
@fourpastmidnight, you cannot have empty identifiers in the prerelease and meta tags. In other words, the trailing dot on your "-alpha1." is not legal because it creates an empty identifier.
Hm, well, I found another error too while validating my regex: a prerelease tag of `-00` was considered valid, but it's not valid according to the spec. So I made some changes to the tags portion that align with the BNF grammar, and now everything works. I'll post my updated regex later tonight, just because, and I'll finally look to post that set of PowerShell scripts I talked about a year ago!
From: Joseph Donahue Sent: Saturday, October 26, 2019 14:59 To: semver/semver Cc: Craig E. Shea; Mention Subject: Re: [semver/semver] RegEx for validating SemVer-numbers (#232)
@jwdonahue:
Oh, I see now: my previous post confused you a bit. It was showing that my earlier regex did indeed NOT match `-alpha1.`, as shown by the `False` output from `$semver.IsMatch('1.0.0-alpha1.')`.
In any event, as stated above, I did find that my regex did allow for an invalid pre-release tag containing leading zeros. But thanks to the very well written BNF notation in the spec, I have been able to resolve that issue and along the way refine the regex a bit more.
I'll post it here later with timing results from Regex101.com.
OK, so here's my new, corrected, and improved regex for parsing/validating SemVer 2.0 version strings. It is up to 24% faster than the regexes listed on the homepage of SemVer.org.
In addition, while validating the regex below against the above list of valid/invalid SemVer versions, one of the invalid build metadata test strings, `1.1.2+.123`, was reported as valid during my initial remediation efforts (again due to using `\d*` instead of `\d+`; you'd think I'd have learned by now 😉). I then realized that the list above does not contain a similar invalid pre-release tag, `1.1.2-.123`. Luckily, that one was flagged as invalid. Additionally, I found another missing invalid case, `1.1.2+0.`, which was also matching during my remediation efforts (again due to a `\d*` instead of a `\d+`). I've added this to the list of invalid SemVer version strings.
The regex below matches the entire list (31 matches) in 2058 steps, which is 460 steps (18.27%) fewer than the regex on the SemVer.org homepage. (Note that Regex101 does not support the .NET `(?n)` modifier, so you need to replace all unnamed capture groups with atomic groups, `(?>...)`, to "exclude" them from capture and achieve the best performance; besides not capturing, atomic groups also prevent backtracking into the group. The regex shown below is the .NET version. Follow the link to Regex101 for a PCRE version.)
(?inx)
^
(?<Major>0|[1-9]\d*)\.
(?<Minor>0|[1-9]\d*)\.
(?<Patch>0|[1-9]\d*)
(?<PreReleaseTagWithSeparator>-(?<PreReleaseTag>([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*)(\.([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*))*))?
(?<BuildMetadataWithSeparator>\+(?<BuildMetadata>[\da-z-]+(\.[\da-z-]+)*))?
$
And here's the one-liner:
(?in)^(?<Major>0|[1-9]\d*)\.(?<Minor>0|[1-9]\d*)\.(?<Patch>0|[1-9]\d*)(?<PreReleaseTagWithSeparator>-(?<PreReleaseTag>([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*)(\.([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*))*))?(?<BuildMetadataWithSeparator>\+(?<BuildMetadata>[\da-z-]+(\.[\da-z-]+)*))?$
If you don't need the capture groups (i.e., you're only interested in validating a SemVer version string, or in capturing the entire string without breaking it into its parts), then you can use this better-performing version of the regex, which performs all the matches in 1905 steps: 613 steps (24.34%) fewer than the non-capturing regex on the SemVer.org homepage (which, oddly enough, takes the same number of steps as the capturing one). Again, this is the .NET version:
(?inx)
^
(0|[1-9]\d*)\.
(0|[1-9]\d*)\.
(0|[1-9]\d*)
(-([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*)(\.([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*))*)?
(\+[\da-z-]+(\.[\da-z-]+)*)?
$
And once again, the one-liner:
(?in)^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*)(\.([a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*))*)?(\+[\da-z-]+(\.[\da-z-]+)*)?$
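A Python translation of the capture-free variant, offered as a sketch (Python's `re` is assumed, and the step counts above were measured on Regex101, not with this code): `re.IGNORECASE` stands in for `(?i)`, `(?:...)` for `(?n)`, and `re.ASCII` keeps `\d` to `[0-9]` and also stops case-insensitive `[a-z]` from matching non-ASCII letters such as the Kelvin sign (U+212A). The first identifier alternative is written `[a-z-][\da-z-]*`, so a single-character identifier such as `1.0.0-a`, which SemVer allows, is accepted.

```python
import re

# One pre-release identifier: starts with a letter/hyphen, or has one after a
# leading run of identifier characters, or is a numeric identifier without
# leading zeros.
IDENT = r"(?:[a-z-][\da-z-]*|[\da-z-]+[a-z-][\da-z-]*|0|[1-9]\d*)"

SEMVER_RE = re.compile(
    r"^(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)\.(?:0|[1-9]\d*)"
    + r"(?:-" + IDENT + r"(?:\." + IDENT + r")*)?"
    + r"(?:\+[\da-z-]+(?:\.[\da-z-]+)*)?$",
    re.IGNORECASE | re.ASCII,
)

for v in ["1.0.0-a", "1.0.0-RC.1+Build.5", "1.2.3-00", "1.1.2+0.", "1.1.2-.123"]:
    print(v, SEMVER_RE.fullmatch(v) is not None)
# 1.0.0-a True
# 1.0.0-RC.1+Build.5 True
# 1.2.3-00 False
# 1.1.2+0. False
# 1.1.2-.123 False
```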
I recently came back to this because I was creating a cmdlet for adding a Channel Version Rule to an Octopus Deploy deployment process resource. Anyway, hopefully, I can post my PowerShell scripts soon, along with attendant tests.
How about a numeric-only regular expression based on the official suggestion? For example, `1.2.3` should match, but neither `1.2.3-rc1` nor `1.2.3-alpha1` should.
I am using the following numeric-only regular expressions:
^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)$
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$
Please correct me if I misunderstood the expressions.
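Both numeric-only patterns behave as intended; here is a Python check (`(?P<name>...)` is already Python syntax, and `re.ASCII` is an added assumption that keeps `\d` to `[0-9]`). `fullmatch` is used because `$` in Python also matches just before a trailing newline.

```python
import re

NAMED = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)$", re.ASCII
)
PLAIN = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$", re.ASCII)

for candidate in ["1.2.3", "1.2.3-rc1", "1.2.3-alpha1", "1.2.3\n"]:
    m = NAMED.fullmatch(candidate)
    print(repr(candidate), m.groupdict() if m else None)
# '1.2.3' {'major': '1', 'minor': '2', 'patch': '3'}
# '1.2.3-rc1' None
# '1.2.3-alpha1' None
# '1.2.3\n' None
```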
The regex mentioned above by @DavidFichtmueller failed for this case; you can use the regex below instead:
'^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-((0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)(\.(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?(\+([0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*))?$'
@slayer321: can you please specify which of my regexes fails for this case? I double-checked them and each one worked (the regex itself evolved over the course of this discussion). Most importantly, the case works with the two expressions on the website: https://semver.org/#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string (use the two regex101.com links for a test environment with the expected test cases; you can add "0.10.1" there to double-check).
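For the double-check suggested above, here is a Python sketch (the engine choice is an assumption) that runs the `\d` regex from earlier in the thread and the `[0-9]` variant posted by @slayer321 side by side. Both accept `0.10.1`, and they agree on every sample, which is expected since they encode the same grammar.

```python
import re

# The \d regex posted earlier in the thread...
WITH_D = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)
# ...and the [0-9] spelling of the same grammar.
WITH_RANGE = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(-((0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(\.(0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(\+([0-9a-zA-Z-]+(\.[0-9a-zA-Z-]+)*))?$"
)

SAMPLES = ["0.10.1", "1.0.0-alpha.1", "1.0.0+build", "01.0.0", "1.2.3-0123", "1.2"]

for s in SAMPLES:
    print(s, WITH_D.fullmatch(s) is not None, WITH_RANGE.fullmatch(s) is not None)
# 0.10.1 True True
# 1.0.0-alpha.1 True True
# 1.0.0+build True True
# 01.0.0 False False
# 1.2.3-0123 False False
# 1.2 False False
```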
I just created a RegEx to check if a version number is a valid Semantic Version Number according to the specification.
Maybe this is useful for some people out there or it could be added to the documentation.