Open timvisee opened 5 years ago
This is a good summary, thank you for putting it together. I'd like to add comments, but the github issue format here feels cumbersome. Do you mind moving it to a Google Doc or similar to facilitate making (suggested) edits? I have started one at https://docs.google.com/document/d/1HlroZ02HcXzyIFGcQcLGs9QkGyQ5945Sr0GCHX7S6SU/edit?usp=sharing and added my comments there. I'm happy to move to any other medium if you prefer, but preferably after we settle my comments in that Google Doc, just to avoid needing to transfer them somehow.
PS: I'm happy to give you edit rights on that doc, or reassign it to you. I just don't know your email address and don't want to post a link-share edit thing.
Thanks a lot for the thorough look. I've commented on most of your comments in Docs.
If there's only one struct, how would you differentiate between results that use different schemes?
I usually like to keep things as generic as possible, if it doesn't hurt flexibility too much. Things like this are hard to comprehend, but can be super beneficial in the long run when done properly, when implementing a lot of schemes.
I'll try to explain what 'perfect' (in my opinion) situation I'm currently thinking about (in undefined points and order). I've changed my thoughts on design slightly since my last post, but here it is:
Version
struct. Use a default scheme, but allow selecting a custom scheme such as one for PEP440/Conda.Version
structs would probably remember what scheme it uses, but maybe that isn't even important.Version
struct would contain a list all parsed version components.Number
and Literal
, maybe Epoch
and a few others as well even if unused in some version schemes.Epoch(1) > Number(2)
. Not sure if this is useful though. Maybe something similar can be achieved by reordering components while parsing.Version
s having a different schematic, having components that are not compatible (comparability wise), the crate would probably just return an 'unknown'/'undefined'/'not compatible' result.Now for some scheme specific cases:
dev
and post
suffixes for Conda will probably get their own specific component, as their comparison logic differs from a regular literal (being alphabetical). Comparison logic between the special Conda literal and the regular literal could be implemented (so across different component types in this case), to always favor the special type as newer (not sure what the Conda version spec specifies at this moment).LeadingZeros(num)
. This component would implement comparison logic the way GNU specifies for numbers.Epoch
component even though their separation token is different. (I'm assuming here they work in a similar way).VersionManifest
, if a runtime configurable scheme is ever desired, I think this could be implemented as well. Imagine that your configurable scheme allows setting how literals are ordered. A new component could be introduced for these literals, having their ordering setting attached, or having a reference to the configured scheme that will in the end be used by attached comparison logic. All literals from a version string would then be parsed as this special literal component.You get the point, all schemes have a lot of interesting edge cases with scheme-specific special components. Using this implementation, component types would probably define their significance, order and relation against other components.
Currently, in version-compare
, when comparing a version number it walks through the components (parts) of both (zip, but smarter) versions at the same time, comparing each other. If one is larger than the other, that version is newer. If components are the same the system continues to the next component in both versions. Comparing to no component (in 1.2
and 1.2.3
, comparing the 3
to no component from the other) would always be newer. Comparing a number or no component to a literal would always be newer. I like this idea a lot but it currently isn't extensible (nor doesn't suffice for different schemes) because all comparison logic is implemented in a single function. Attaching this the actual component types itself might solve this. I really hope work with and extend this core idea, because it seems like such a good idea to me. This is also why I'm constantly referring to a generic Version
for all schemes by the way.
The question is, whether a new architecture like this would support all version schemes we can currently come up with with the constraints you might set on components. Maybe comparing different component types is enough. Maybe components should have an significance value. Maybe components should return what their relation is to other component types, I don't know.
So, to summarize:
A parsed version would be represented as list of components, consisting of components. The set of components is extendible. These have comparison logic attached to them. The create compares from left to right, component by component.
This idea attempts to heavily generalize all version numbers into something that fits all while assuming the 'Design facts' as mentioned before. It will never be super strict, but could 'just work' in many unthought of situations with weird version string formats.
Now, this rough idea definitely won't translate into Rust right now. I'm not even sure if a system similar to this idea is properly and efficiently implementable (if assuming this design would indeed work for all imaginable version schemes we'd like to support).
I'm currently thinking really hard to figure out whether a system like this could work. I haven't really thought of any blockade yet (a version scheme choice) that would immediately invalidate this generic system and make it useless. But I might just be missing something big here.
Perfectly comparing anything imaginable will always remain an impossible task, and this will always be a best effort approach. Schemes that differ so much from everything else are probably better of with a full custom comparison implementation anyway. Sadly, schemes and given versions are inconsistent.
What are your thoughts on the rough idea I've explained here? Could it hold up for everything we're trying to achieve here? Is it too much?
Would implementing every scheme separately (live you've suggested) be beneficial? Could we prevent lots of duplicate code with such an implementation? Wouldn't it be better for an approach like this to separate every scheme in a different tailored crates anyway as reimplementing a lot of similar stuff is probably required?
Sorry if what I'm saying here is unclear, and sorry for putting down 1.2k words. I always find it really hard to abstract/summarize my ideas in clear understandable terms. I hope the examples I gave help clarifying. Please correct me if I'm saying stupid things :)
I'm fine with discussing this through Docs, my Google address is my GitHub username with @gmail.com
attached.
I've updated the first post with the changes you made in the linked document.
Thanks again for your time, thoughts and comments!
I hadn't thought of the use of custom components (and the creation of those components by the specific scheme) as the way to achieve custom comparison. I like it a lot! I think you're onto a very nice, clean way to do things. Perhaps the usage patterns could be made nicer by currying the Version struct instantiation with a given scheme, such that Conda could hide the need to pass the scheme explicitly each time. Maybe that happens in version-compare, or maybe it happens on the external (conda) side. Probably the latter.
I think the addition of scheme-specific info like epochs to the data structure for all schemes is not great, but it's also not horrible. In my opinion, things like epoch must be compared separately from the other version stuff. I don't like comparing 1!1.0 with 2.0 and say that 1:1.0 is higher on the basis of Epoch(1) trumps Version(2). Rather, Epoch(1) trumps the implicit Epoch(0). The vector comparison for other numbers otherwise holds, and then I think the end pre/post/git tag/whatever release may again need special treatment, but perhaps not, since Number(1) will trump Alpha(x) or Date(x).
Now, this rough idea definitely won't translate into Rust right now.
Why not? What specific parts do you think are missing? Is this at a point where some pseudocode and rough placement of data structures is appropriate?
... currying ...
Currying is a good idea, though that doesn't work too well in Rust at the moment. This could be in schemes, in the form of MyScheme::parse(ver) -> Version::parse(ver, scheme)
. Implementing this externally would be a nice alternative, and is up to the end user of course.
I think the addition of scheme-specific info like epochs to the data structure for all schemes is not great, but it's also not horrible.
I thought this would be common enough, and of course, it does not matter if schemes don't use such component.
In my opinion, things like epoch must be compared separately from the other version stuff. I don't like comparing 1!1.0 with 2.0 and say that 1:1.0 is higher on the basis of Epoch(1) trumps Version(2). Rather, Epoch(1) trumps the implicit Epoch(0). The vector comparison for other numbers otherwise holds, and then I think the end pre/post/git tag/whatever release may again need special treatment, but perhaps not, since Number(1) will trump Alpha(x) or Date(x).
I think you're suggesting to do version checking in various passes with a collection of components. So, first a pass for epochs, then a pass for numbers, then a pass for suffixes. Is that right? Thinking of it that way sounds like a good idea.
Now, this rough idea definitely won't translate into Rust right now.
Why not? What specific parts do you think are missing?
For example, for comparison you would normally implement Ord
on structs (or components in this case). This works well when using the same type. But if comparison across different component types is needed, just for some combinations, this won't suffice. It's not a good idea to check at runtime whether a component has such trait implemented (I think, if even possible). I don't see a way yet to statically (typed) compile this.
And a list of generic components in Version
would probably require a list of Box<Component>
(where Component
is a trait). Putting it on the heap this way isn't too great for performance either. using an enum
(the usual alternative) probably isn't really an option because an enum is not extendable.
Is this at a point where some pseudocode and rough placement of data structures is appropriate?
Probably, yes.
Currying is a good idea, though that doesn't work too well in Rust at the moment. This could be in schemes, in the form of MyScheme::parse(ver) -> Version::parse(ver, scheme). Implementing this externally would be a nice alternative, and is up to the end user of course.
Agreed that the scheme is the right place to put this. Thanks!
I think you're suggesting to do version checking in various passes with a collection of components. So, first a pass for epochs, then a pass for numbers, then a pass for suffixes. Is that right? Thinking of it that way sounds like a good idea.
I thought about this more last night. I'm really torn on whether this is actually a good idea. Making it multi-pass loses a lot of generality, and I'm not entirely sure it's necessary, as long as the type definitions (including interactions between types) really can cover all comparisons that might happen. Comparing types 1:1 is not strictly necessary, and probably not even helpful aside from perhaps reducing some code necessary for defining comparison relationships between types. I'd like to try to avoid this multiple pass approach, but keep an eye out for reasons why it might be necessary.
Regarding your points about Rust's language features, I'm just not experienced enough to understand much here. You've given me good things to read about and tinker with, though. I'll try to come up with some toy code soon - today if time allows, but perhaps this weekend.
Here I describe some ideas, thoughts, resources and possible design decisions I gathered during a brainstorm session to help developing a new crate design for
version-compare
in the future. It lists things I like to have, some facts we can use, a simple list of existing version schemes and so on. I hope this helps to shed light on all desired features in a new design, by distilling what is used currently.Some things may not be implementable, might be incorrect and I'm probably missing a lot. That's something to look into later.
Like to have
Here are some ideas I'd like to have:
Version
&Ver
to represent comparable version numbersLike
String
andstr
, owned and referencedVersion
andVer
instancesImplement comparison traits, and useful helper functions, similar easy-to-use API to
version-compare 0.0.10
Version
&Ver
to compare on, do not use different structs for all kinds of different schemesAllow users to externally define their own version schemes, to be a framework to compare different version styles as well. Good implementations could be merged into this crate later on. Also allow minimal configurability on some existing schemes, such as the generic/default scheme used for comparison to tweak sorting.
Could speed things up when comparing large version strings
Allow checking whether a version string is valid given a version scheme.
1! > 2
:Epoch(1) > Major(2)
Testing operators such as
>
,>=
,!
,*
,^
,~
>=1.2
,^1
,~1.2.3
,1.2.*
Design facts
Here are some design facts/choices that I think will be assumed/used for the redesign of this crate. This could be used to build the core
Version
andVer
structs upon, allowing support for a large spectrum of version numbers:Not compatible with everything in ..., but compromises must be made:
https://xenoterracide.com/post/falsehoods-programmers-believe-about-versions/
Existing version schemes
Some version schemes that were mentioned and used right now. Could be used to source mechanics from to validate a new crate design is sufficient, and version schemes could be implemented for it:
1:
epochs-debian-revision
: suffix5:5.0.3-4ubuntu0.1
1:
epochs1!
: EpochsCore version components
Component types that might be used globally in the core of the crate:
dev
: always lower than other releasespost
: always higher than otherwise similar releasesSpecial version components
Component types that might be a version scheme specific implementation:
1!
: conda1:
: RPMExamples
Some version numbers I'd like to support, in different styles:
1
1.2.3
1.2.alpha.3
2.0.0-rc.2
: semver1:1.2.3-debian2
: Debian1!1.2.3-post
: PEP440/conda1.0a0
: conda: the a0 is separate from the 1 and 0 parts. Pre-release indicator.1.0.2b12
: used by Applev1.2.3
: common in tags19999.00071
YYYY-mm-dd
: datesYYYYmmdd
: dates0.8.1+git.20100810t184654.ab580f4-0ubuntu2
: dpkg package for git on UbuntuMyApp 3.2.0 / build 0932
1.0.2d
: openssl, which has releases with & without the letter, but will gain sanity soon.And some components as well:
1!
: epoch in Conda1:
: epoch in RPM~suffix
:~
is lowest when sorting in Debian.suffix
-suffix
1a
: letters in components-1.2.3
: negative components1.-2.3
: negative componentsOther resources
Some other useful resources, not really covered yet:
Edit: added suggestions from https://github.com/timvisee/version-compare/issues/20#issuecomment-537302014