repology / repology-updater

Repology backend service to update repository and package data
https://repology.org
GNU General Public License v3.0

[discussion] unified scheme for snapshot versions #345

Open AMDmi3 opened 6 years ago

AMDmi3 commented 6 years ago

TL;DR: see Summary below

So, we have support for normal versions, and we now also have special support for prerelease versions. However, we still have to ignore a lot of packages, most of which are snapshots. Sometimes snapshots are a necessary evil and cannot be avoided, for example when the release has a fatal bug, or when upstream is dead but there are useful commits in the master branch.

Now I wonder if repology can improve the situation by suggesting some kind of unified unambiguous snapshot format, so snapshot versions from different repos COULD be comparable.

The ideas on format:

Well, I don't see many choices for a format here: it's obviously 1.2.3somethingYYYYMMDD (or somethingYYYYMMDD when there's no past version). From Repology's point of view, it's the same as 1.2.3.somethingYYYYMMDD, so distros may use an additional dot at their discretion.

So, we have to decide what to use as something.

So, either we have to invent a new keyword which has an apparent "post" meaning and is not used upstream, or we could use one of post or patch, ignoring their use upstream (which is not that wide). Inventing a keyword seems to be preferable. So, any ideas?

Additional thoughts:

Summary

When packaging snapshots, let's

The version of snapshot which comes after official 4.7 version may thus look like

See how it's better than:

Note that this schema is not something synthetic and new; it's just a refinement of the widely used VERSIONwordDATE schema, and it provides explicit and unambiguous information on which snapshot was packaged. As a side effect, it makes it possible for Repology to compare these snapshots.
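
A minimal sketch of how a VERSIONwordDATE string could be split into its parts, assuming the word is a lowercase keyword and the date is YYYYMMDD. The pattern and the parse_snapshot name are hypothetical illustrations, not Repology or libversion code, and the keyword itself is still undecided in this discussion.

```python
import re
from datetime import date

# Hypothetical pattern for VERSION<word>YYYYMMDD; an optional dot before
# the word is accepted, since the dotted form is treated as equivalent.
SNAPSHOT_RE = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)\.?(?P<word>[a-z]+)(?P<date>\d{8})$"
)


def parse_snapshot(version):
    """Return (base, word, date) for a snapshot version, or None."""
    m = SNAPSHOT_RE.match(version)
    if m is None:
        return None
    d = m.group("date")
    try:
        day = date(int(d[:4]), int(d[4:6]), int(d[6:]))
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return m.group("base"), m.group("word"), day
```

With this sketch, parse_snapshot("4.7post20170928") yields the base version 4.7, the word post and the date 2017-09-28, while a plain release such as 4.7 yields None.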

blshkv commented 6 years ago

Nice write up, seems like a proper solution.

I have one comment regarding "guessing" the next version. Often, 4.8git20170928 is "guessed" from the source code, where the author has already bumped the version from the last release 4.7, and that is what the --version parameter reports or what is displayed when you run the program. I agree that there is still no guarantee that the next version will be called 4.8, but there is hope that it will at least not be below that. 4.7post20170928 is a more universal and straightforward solution to this problem, although the "official" version might be higher.

AMDmi3 commented 6 years ago

Well, having the next version explicitly defined in the upstream code/documentation justifies using 'pre' somewhat, but there is still no guarantee that another version will not be released instead, messing everything up. The "post" way is bulletproof though.

AMDmi3 commented 6 years ago

I've just run into the post suffix used in an actual official version:

https://pypi.python.org/pypi/flake8-builtins/1.0.post0

Which makes me think that the only option is a really verbose unique suffix such as V.V.VpostsnapshotYYYYMMDD

blshkv commented 6 years ago

I think you should take any standard version scheme and normalise all software to it. Software authors have way too many different creative ideas for how to name their releases.

AMDmi3 commented 6 years ago

It is not possible.

davidak commented 6 years ago

What about using YYYY-MM-DD as a more human readable date format? Just 2 more characters, but way more readable.

or somethingYYYYMMDD when there's no past version

Do we really need something in that case?

We in nixpkgs often use just YYYY-MM-DD. We should use something like post when there is a past version, but is there any problem for snapshot versions?

What if you want to package a second change on the same day? Maybe YYYY-MM-DD-1?

@AMDmi3 do you plan to create a page with suggestions for package maintainers like this?

I think this is a great initiative to align versioning in software repositories! Have you invited the packaging communities of the major repositories? Some more opinions and ideas might be helpful, and they might be more willing to adopt this if they had a chance to participate in the discussion.

blshkv commented 6 years ago

I disagree with "-" ideas. For long file names it might break to a second line in some WM and it will become unreadable. Also, these two extra chars do not bring any value.

davidak commented 6 years ago

these two extra chars do not bring any value

It brings the value that it is more readable to humans #accessibility

Also, i just randomly found this XKCD comic about ISO 8601 again.

blshkv commented 6 years ago

Well yeah, but I feel like you didn't read my reasons. We are talking about version numbers, not about date standards.

AMDmi3 commented 6 years ago

What about using YYYY-MM-DD as a more human readable date format? Just 2 more characters, but way more readable.

It doesn't have to be readable (though I don't see any readability problems with YYYYMMDD); it must be simple, unambiguous and close to schemes which are already widely used.

repology=> select count(*) from packages where version ~ '20[0-9]{2}-[0-9]{2}-[0-9]{2}';
 count
-------
  1859
(1 row)

repology=> select count(*) from packages where version ~ '20[0-9]{6}';
 count
-------
 66379
(1 row)

Also, some repositories do not support dashes in versions.
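
For readers without access to the database, the same two patterns can be tried on any list of version strings. This is only a sketch: the sample list below is made up for illustration, and re.search is used on the assumption that it mirrors the unanchored matching of the ~ operator in the queries above.

```python
import re

# The two patterns from the queries above, unchanged.
DASHED = re.compile(r"20[0-9]{2}-[0-9]{2}-[0-9]{2}")
COMPACT = re.compile(r"20[0-9]{6}")


def count_matching(versions, pattern):
    """Count versions containing a match anywhere in the string."""
    return sum(1 for v in versions if pattern.search(v))


# Hypothetical sample, not real Repology data.
sample = ["1.2.3.20190101", "2017-09-28", "0.185", "4.7post20170928"]
dashed_count = count_matching(sample, DASHED)
compact_count = count_matching(sample, COMPACT)
```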

or somethingYYYYMMDD when there's no past version

Do we really need something in that case?

Yes, both because when the actual version is released it would automatically be ordered after somethingYYYYMMDD but not after a bare YYYYMMDD, and for the sake of uniformity.

Actually, I've just found out that from the libversion perspective something1 is less than 0something1, while I'd expect them to be equal. May be related to repology/libversion#14, but anyway we may want to require 0something to make it miscomparison-proof and less ambiguous. Or not, depending on how we and others do/want to handle versions like alpha1 (see below).

We in nixpkgs often use just YYYY-MM-DD. We should use something like post when there is a past version, but is there any problem for snapshot versions?

These cases should not be separated, as the proposed snapshot scheme must coexist with past and future versions. Any scheme without a prefix will break as soon as the first official version is released. So the proposal is to treat all snapshots as based on some upstream version, using 0 if there isn't one. Actually even that will break if upstream releases e.g. alpha1, unless something is treated very specially (everywhere), which I'd like to avoid, to make the scheme usable with any generic version comparison algorithm, even one not as elaborate as libversion.
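
The ordering problem with a bare date can be illustrated with a deliberately naive comparison. The numeric_key helper below is a hypothetical stand-in for "any generic version comparison algorithm": it only compares the integer runs in a version and ignores letters entirely, so it is not how libversion works.

```python
import re


def numeric_key(version):
    """Naive sort key: the tuple of all integer runs in the version."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


# A bare date snapshot sorts after the first official release ...
bare_date_after_release = numeric_key("20170928") > numeric_key("1.0")
# ... while a snapshot based on version 0 sorts before it.
zero_based_before_release = numeric_key("0post20170928") < numeric_key("1.0")
```

Both flags come out true here: the leading component 20170928 is far greater than 1, whereas the leading 0 keeps the snapshot below any later release.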

What if you want to package a second change at one day? Maybe YYYY-MM-DD-1?

That's a very good question. The naive answer would be YYYYMMDD.1, but that can no longer be compared across different repositories. It seems to me that it can't be solved within the scheme at all, as any local suffix will break cross-repository comparison, and complicating the scheme by adding more time resolution would hinder its adoption.

Actually, most repositories have local package revisions which could be used for this purpose. I guess the scheme should suggest using revisions, while libversion could handle snapshots specially and ignore everything past the date. This is OK, since the special handling would only be required in libversion, all local algorithms will still be OK with handling suffixes normally.
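
A sketch of what "ignore everything past the date" could look like, assuming the snapshot part has the form wordYYYYMMDD. The cross_repo_key function and its regular expression are hypothetical and are not taken from libversion.

```python
import re

# Hypothetical: base version, a keyword, an 8-digit date, then any
# repository-local suffix that should not take part in comparison.
_SNAPSHOT = re.compile(
    r"^(?P<base>.*?)(?P<word>[A-Za-z]+)(?P<date>\d{8})(?P<local>.*)$"
)


def cross_repo_key(version):
    """Drop whatever follows the snapshot date; leave other versions alone."""
    m = _SNAPSHOT.match(version)
    if m is None:
        return version
    return m.group("base") + m.group("word") + m.group("date")
```

Under these assumptions, 4.7post20170928.1 and 4.7post20170928 reduce to the same key, so two repositories packaging the same snapshot with different local suffixes would still compare as equal.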

@AMDmi3 do you plan to create a page with suggestions for package maintainers like this?

I think this is a great initiative to align versioning in software repositories! Have you invited the packaging communities of the major repositories? Some more opinions and ideas might be helpful, and they might be more willing to adopt this if they had a chance to participate in the discussion.

Not yet. I'm sure this topic will come up when repology is used by more people.

AMDmi3 commented 5 years ago

Returning to this, an alternative solution would be for individual repositories to explicitly indicate that they are packaging a snapshot. As soon as we have this flag and a snapshot date, we could compare snapshots specially by comparing dates instead of versions.

It could be further improved:

After repology/repology-rules#20 is done (not even started yet), we'll have all snapshots which use date version marked up, so we can extract this information from them. However if any repository wishes to convey this information directly, it's most welcome and could be used right away.

Repology would need, roughly,

There are multiple ways to convey this data. The simplest one would be to just use a date suffix on the version (1.2.3.20190101), like most repositories already do (however, it needs to be used consistently), and introduce a snapshot flag. This would be enough for Repology to handle snapshots consistently.
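
As a rough illustration of comparing snapshots by date rather than by version string, here is a sketch with a hypothetical record type. The PackageVersion class and its snapshot_date field are assumptions made for this example and do not describe Repology's actual data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PackageVersion:
    version: str
    # Hypothetical field: set only when the repository flags a snapshot.
    snapshot_date: Optional[date] = None


def newer_snapshot(a, b):
    """Pick the later of two flagged snapshots by date, not by version."""
    if a.snapshot_date is None or b.snapshot_date is None:
        raise ValueError("both packages must be flagged as snapshots")
    return a if a.snapshot_date >= b.snapshot_date else b
```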

blshkv commented 5 years ago

Gentoo has a very clear policy: https://wiki.gentoo.org/wiki/Project:ComRel/Developer_Handbook/Ebuild_policy

foo-x.y_preYYYYMMDD.ebuild
foo-x.y_pYYYYMMDD.ebuild

BUT ;-) there is an exception when the upstream did not release any version and x.y is not specified. In this case, the foo-YYYYMMDD.ebuild is used. I could not find any place for the "snapshot" flag.

So as a generic rule, you can search for the suffix YYYYMMDD.ebuild

AMDmi3 commented 5 years ago

Gentoo's policy is no better than other repositories using random suffixes: it mixes up upstream versions that use p with snapshots, it allows pre with nonexistent upstream versions, and YYYYMMDD is indistinguishable from upstream versions that look the same way.

mikhailnov commented 4 years ago

I disagree with "-" ideas. For long file names it might break to a second line in some WM and it will become unreadable. Also, these two extra chars do not bring any value.

Also, - is a separator equivalent to . in RPM, so it would split the version in unwanted places and lead to the date components and whatever follows them being compared part by part, instead of the date being treated as a whole.

ldv-alt commented 4 years ago

FYI, in ALT we promote the following versioning scheme of git snapshots which is based on the idea implemented in https://git.savannah.gnu.org/cgit/gnulib.git/plain/build-aux/git-version-gen (which in turn is used in many projects): If "git describe --abbrev=1" of the upstream commit is VERSION-NUMBER-gHASH, then the package version has to be VERSION.0.NUMBER.HASH . Simples!
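
The mapping described above can be sketched in a few lines. This follows only the sentence in this comment, not the actual git-version-gen script, and the alt_snapshot_version name is hypothetical.

```python
def alt_snapshot_version(describe):
    """Map VERSION-NUMBER-gHASH to VERSION.0.NUMBER.HASH."""
    # Split from the right so dashes inside VERSION are left untouched.
    version, number, ghash = describe.rsplit("-", 2)
    if not number.isdigit() or not ghash.startswith("g"):
        raise ValueError(f"not in VERSION-NUMBER-gHASH form: {describe!r}")
    return f"{version}.0.{number}.{ghash[1:]}"
```

For instance, a describe string of 0.185-54-gb561 would map to 0.185.0.54.b561.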

AndersonTorres commented 3 years ago

Apologies for necro-bumping!

I will mark myself here, because we at Nixpkgs are struggling with a similar problem. The format I am using is something like x.y.z+unstable=YYYY-MM-DD, however it is still in the "brainstorm phase".

(late edits to reflect the current state - thanks @davidak for the reminder)

AMDmi3 commented 3 years ago

@ldv-alt there's a rule back from 2018 which marks that specific scheme as incorrect. Thankfully that scheme hasn't gained wide adoption, as it's horrible in all aspects: it does not separate upstream and snapshot parts, it is needlessly long, and it uses commit hashes. Also violates RPM version policy. Actually, the whole sisyphus is currently pessimized for providing an intolerable number of fake versions (apart from snapshots, for which there's also nothing close to a single format).

@AndersonTorres that's good, but as far as I can see, YYYY-MM-DD scheme is still prevalent.

davidak commented 3 years ago

I think there is still no decision which format should be used in NixOS. It would be great if this issue results in a recommendation.

AMDmi3 commented 3 years ago

The recommendation is in the issue body.

ldv-alt commented 3 years ago

Also violates RPM version policy.

@AMDmi3 Please elaborate.

mikhailnov commented 3 years ago

09.09.2021 01:41, Dmitry V. Levin wrote:

Also violates RPM version policy.

@AMDmi3 Please elaborate.

+1

AMDmi3 commented 3 years ago

@mikhailnov @ldv-alt

It is mentioned in ALT own docs: https://www.altlinux.org/Spec#Промежуточные_upstream-релизы

It was mentioned in Fedora packaging guidelines, but it turns out it's now thankfully deprecated. https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_traditional_versioning_with_part_of_the_upstream_version_information_in_the_release_field https://web.archive.org/web/20181211075036/https://fedoraproject.org/wiki/Packaging:Versioning#Prerelease_versions

ldv-alt commented 3 years ago

@mikhailnov @ldv-alt

It is mentioned in ALT own docs: https://www.altlinux.org/Spec#Промежуточные_upstream-релизы

I'm sorry to correct you, but the wiki page you're referencing is not a policy, let alone an RPM policy.

It was mentioned in Fedora packaging guidelines, but it turns out it's now thankfully deprecated. https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_traditional_versioning_with_part_of_the_upstream_version_information_in_the_release_field https://web.archive.org/web/20181211075036/https://fedoraproject.org/wiki/Packaging:Versioning#Prerelease_versions

I'm sorry to correct you, but the Fedora document you're referencing is not an RPM policy.

Anyway, RPM permits the kind of versioning I recommend for use in case of git snapshots, and ALT packaging policies have nothing against it.

Like it or not, the versioning scheme I recommend for git snapshots has its benefits and its users. Your opposition to this scheme is clear, but I respectfully disagree. Anyway, it's up to distros to choose their packaging policies, and ALT has chosen the scheme you don't like. Let's agree to disagree on this subject.

AMDmi3 commented 3 years ago

Well, all I'm going to say is that this scheme will never be honored by Repology because it cannot be meaningfully compared to upstream, to other repositories, or to other sources such as vulnerability databases.

AndersonTorres commented 3 years ago

@AndersonTorres that's good, but as far as I can see, YYYY-MM-DD scheme is still prevalent.

I am formulating an RFC to the NixOS community/organization. Until then, the mess will be there.

ldv-alt commented 3 years ago

Well, all I'm going to say is that this scheme will never be honored by Repology because it cannot be meaningfully compared to upstream, to other repositories, or to other sources such as vulnerability databases.

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes, I do not agree that they cannot be meaningfully compared with upstream versions, and you do not compare different snapshots between each other anyway.

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ? Is it the result of "the whole sisyphus is currently pessimized"?

mikhailnov commented 3 years ago

Adding YYYY-MM-DD to the version requires manual work.

Here is an example of how git snapshot can be packaged: https://abf.io/import/gimagereader/blob/a83f21be3b/gimagereader.spec

%define commit d3cdd00b3e848867d95db28354afc41814d5dd0c
%define commit_short %(echo %{commit} | head -c 5)
Version:    3.3.1
Release:    2.git%{commit_short}.3
Source0:    https://github.com/manisandro/gImageReader/archive/%{commit}.tar.gz?/gImageReader-%{commit}.tar.gz

Release tag consists of 3 parts. When upgrading to a new git snapshot, the first number is increased, when rebuilding an existing snapshot, the last number is increased.
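
A sketch of how such a three-part Release value could be taken apart, assuming it always has the form N.gitHASH.M as in the spec excerpt above. The parse_release helper is hypothetical.

```python
import re

# Snapshot counter, short commit hash, rebuild counter.
RELEASE_RE = re.compile(r"^(\d+)\.git([0-9a-f]+)\.(\d+)$")


def parse_release(release):
    """Return (snapshot_number, short_hash, rebuild_number) or None."""
    m = RELEASE_RE.match(release)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))


# The spec above takes the first 5 characters of the full commit hash.
commit = "d3cdd00b3e848867d95db28354afc41814d5dd0c"
release = f"2.git{commit[:5]}.3"
```

Here release expands to 2.gitd3cdd.3, which parses into snapshot number 2, short hash d3cdd and rebuild number 3.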

As a package maintainer, I just go to GitHub or another place, study the commit history, copy the commit hash, change it in the spec file, then run spectool -g *.spec && rm -fv .abf.yml && abf put, and that's all. I have neither the wish nor the time to maintain a correct date of the git commit from which the snapshot was built. I would probably maintain it, but it would not actually help either users or projects like repology (or am I wrong, will it help?).

I think other maintainers have a similar way of thinking, and that is why I would not expect wide adoption of naming schemes which require additional useless work like tracking the date.

AMDmi3 commented 3 years ago

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes

No, they are not. Unlike any other snapshot schemes I've seen, they are completely indistinguishable. There is not a single property which can be reliably used to tell them from official versions.

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ? Is it the result of "the whole sisyphus is currently pessimized"?

Yes.

Adding YYYY-MM-DD to the version requires manual work.

I've never required adding YYYY-MM-DD to the version.

Here is an example of how git snapshot can be packaged:

There is no problem with this specific case at all, as a) it's based upon an official version and b) it's clearly distinguishable as a snapshot (by the presence of git in Release).

For instance, Repology can (and does) safely treat it as the unmodified version, which won't generate a nonexistent release, will be marked newest/outdated correctly and can be compared to NVD with release granularity. The lack of a date prevents it from being compared with higher granularity, but we don't do that anyway, and I don't think we should or will.

However, it is still pessimized in the sense that this version will not be treated as new if it only comes from an RPM distro. Because the above-mentioned "not policies" are widely used, there's no telling whether the snapshot is based upon a real release or upon a fake "next" release as the not policies suggest. There's no way to tell that by Release starting with 0 either, because these are not policies.

mikhailnov commented 3 years ago

Ah, thanks, I think I understood, so if ALT's version-release was VERSION.0.NUMBER.gitHASH instead of VERSION.0.NUMBER.HASH, it would be recognizable as a git snapshot.

AMDmi3 commented 3 years ago

While the other problems with it remain, yes, at least it would be possible to reliably tell that it's not an upstream version. It won't allow telling it apart from snapshots which can be compared to upstream, though.

ldv-alt commented 3 years ago

On Fri, Sep 10, 2021 at 04:24:13AM -0700, Dmitry Marakasov wrote:

Since versions produced by this versioning scheme are as easy to recognize as versions produced by other versioning schemes

No, they are not. Unlike any other snapshot schemes I've seen, they are completely indistinguishable. There is not a single property which can be reliably used to tell them from official versions.

These versions are upstream versions followed by .0.distance.digest suffix where distance is a decimal number and digest consists of at least 4 hexadecimal digits, so they are clearly recognizable.

For example:

$ rpmquery elfutils
elfutils-0.185.0.54.b561-alt1.x86_64
$ rpmquery --qf '%{version}\n' elfutils \
  | sed -E 's/^(.+).0.([[:digit:]]+).([[:xdigit:]]{4,})$/\1\t\2\t\3/'
0.185 54 b561

BTW, how can you explain the following: https://repology.org/project/hasher-priv/versions ? Is it the result of "the whole sisyphus is currently pessimized"?

Yes.

Unfortunately, such a blanket ban approach makes the whole repology.org untrustworthy.

AMDmi3 commented 3 years ago

These versions are upstream versions followed by .0.distance.digest suffix where distance is a decimal number and digest consists of at least 4 hexadecimal digits, so they are clearly recognizable.

No they are not. As can be seen by the link already given above, the probability is quite high for these hexadecimal digits to only consist of decimal digits, making a snapshot indistinguishable from a legal dot-separated numeric version:

1.18.0.27.0405
4.8.0.0.10.1157
2.6.4.0.88.9801
1.0.1.0.8.5087
2.13.0.5.8107
0.12.0.3.4174

Versions like these are used in the wild, in case you wonder.

In some other cases, even if a hash contains [a-f], it's still indistinguishable from legal prerelease or letter-suffixed version:

0.185.0.54.b561
4.06.0.7.100b
4.8.0.7.b352
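
The ambiguity can be checked mechanically. The sketch below is an assumption-laden Python rendering of the .0.distance.digest pattern from the sed expression quoted earlier (with the dots escaped, which the original expression does not do), next to a check for a plain dot-separated numeric version. Both helper names are hypothetical.

```python
import re

# VERSION.0.DISTANCE.DIGEST, with DIGEST being at least 4 hex digits.
ALT_RE = re.compile(r"^(.+)\.0\.(\d+)\.([0-9a-fA-F]{4,})$")
NUMERIC_RE = re.compile(r"\d+(\.\d+)*")


def looks_like_alt_snapshot(version):
    return ALT_RE.match(version) is not None


def is_plain_numeric(version):
    return NUMERIC_RE.fullmatch(version) is not None


# Versions taken from the lists above.
all_digit_hash = "1.18.0.27.0405"
letter_hash = "0.185.0.54.b561"
ambiguous = looks_like_alt_snapshot(all_digit_hash) and is_plain_numeric(all_digit_hash)
```

With these definitions, 1.18.0.27.0405 satisfies both checks at once, so the pattern alone cannot say whether it is a snapshot or an ordinary numeric version; 0.185.0.54.b561 fails only the plain numeric check.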

Unfortunately, such a blanket ban approach makes the whole repology.org untrustworthy.

The very first thing Repology must do to be trustworthy is to prevent garbage from a misbehaving repository from being reported as a new upstream version to all other maintainers, and that is what we do. As I've already mentioned though, the discussed scheme is neither the only nor the main cause for the ban; the amount of random made-up versions from Sisyphus is. The repository is the worst by the number of ignore rules I've had to add and maintain

% grep -R sisyphus repology-rules/900.version-fixes | wc -l
     301

by the number of known incorrect versions

repology=> select repo, count(distinct effname) from packages where versionclass = INCORRECT() group by repo order by count desc limit 10;
       repo       | count 
------------------+-------
 alt_p9           |    90
 alt_p10          |    86
 altsisyphus      |    83
 funtoo_1.4       |    69
 nix_unstable     |    68
 raspbian_testing |    67
 nix_stable       |    67
 gentoo           |    66
 raspbian_stable  |    65
 debian_unstable  |    61
(10 rows)

and by the number of complaints, i.e. cases which actually affect users:

repology=> select count(*) from reports where comment ilike '%sisyphus%';
 count 
-------
    47
(1 row)

So please don't mention untrustworthiness.