timvisee / version-compare

:left_right_arrow: Rust library to easily compare version strings. Mirror from https://gitlab.com/timvisee/version-compare
https://crates.io/crates/version-compare
MIT License
43 stars 8 forks source link

Ideas & resources for new crate design #20

Open timvisee opened 5 years ago

timvisee commented 5 years ago

Here I describe some ideas, thoughts, resources and possible design decisions I gathered during a brainstorm session to help developing a new crate design for version-compare in the future. It lists things I like to have, some facts we can use, a simple list of existing version schemes and so on. I hope this helps to shed light on all desired features in a new design, by distilling what is used currently.

Some things may not be implementable, might be incorrect and I'm probably missing a lot. That's something to look into later.

Like to have

Here are some ideas I'd like to have:

Design facts

Here are some design facts/choices that I think will be assumed/used for the redesign of this crate. This could be used to build the core Version and Ver structs upon, allowing support for a large spectrum of version numbers:

Not compatible with everything in ..., but compromises must be made:
https://xenoterracide.com/post/falsehoods-programmers-believe-about-versions/

Existing version schemes

Some version schemes that were mentioned and used right now. Could be used to source mechanics from to validate a new crate design is sufficient, and version schemes could be implemented for it:

Core version components

Component types that might be used globally in the core of the crate:

Special version components

Component types that might be a version scheme specific implementation:

Examples

Some version numbers I'd like to support, in different styles:

And some components as well:

Other resources

Some other useful resources, not really covered yet:

Edit: added suggestions from https://github.com/timvisee/version-compare/issues/20#issuecomment-537302014

msarahan commented 5 years ago

This is a good summary, thank you for putting it together. I'd like to add comments, but the github issue format here feels cumbersome. Do you mind moving it to a Google Doc or similar to facilitate making (suggested) edits? I have started one at https://docs.google.com/document/d/1HlroZ02HcXzyIFGcQcLGs9QkGyQ5945Sr0GCHX7S6SU/edit?usp=sharing and added my comments there. I'm happy to move to any other medium if you prefer, but preferably after we settle my comments in that Google Doc, just to avoid needing to transfer them somehow.

msarahan commented 5 years ago

PS: I'm happy to give you edit rights on that doc, or reassign it to you. I just don't know your email address and don't want to post a link-share edit thing.

timvisee commented 5 years ago

Thanks a lot for the thorough look. I've commented on most of your comments in Docs.


If there's only one struct, how would you differentiate between results that use different schemes?

I usually like to keep things as generic as possible, if it doesn't hurt flexibility too much. Things like this are hard to comprehend, but can be super beneficial in the long run when done properly, when implementing a lot of schemes.

I'll try to explain what 'perfect' (in my opinion) situation I'm currently thinking about (in undefined points and order). I've changed my thoughts on design slightly since my last post, but here it is:

Now for some scheme specific cases:

You get the point, all schemes have a lot of interesting edge cases with scheme-specific special components. Using this implementation, component types would probably define their significance, order and relation against other components.

Currently, in version-compare, when comparing a version number it walks through the components (parts) of both (zip, but smarter) versions at the same time, comparing each other. If one is larger than the other, that version is newer. If components are the same the system continues to the next component in both versions. Comparing to no component (in 1.2 and 1.2.3, comparing the 3 to no component from the other) would always be newer. Comparing a number or no component to a literal would always be newer. I like this idea a lot but it currently isn't extensible (nor doesn't suffice for different schemes) because all comparison logic is implemented in a single function. Attaching this the actual component types itself might solve this. I really hope work with and extend this core idea, because it seems like such a good idea to me. This is also why I'm constantly referring to a generic Version for all schemes by the way.

The question is, whether a new architecture like this would support all version schemes we can currently come up with with the constraints you might set on components. Maybe comparing different component types is enough. Maybe components should have an significance value. Maybe components should return what their relation is to other component types, I don't know.

So, to summarize:
A parsed version would be represented as list of components, consisting of components. The set of components is extendible. These have comparison logic attached to them. The create compares from left to right, component by component.

This idea attempts to heavily generalize all version numbers into something that fits all while assuming the 'Design facts' as mentioned before. It will never be super strict, but could 'just work' in many unthought of situations with weird version string formats.

Now, this rough idea definitely won't translate into Rust right now. I'm not even sure if a system similar to this idea is properly and efficiently implementable (if assuming this design would indeed work for all imaginable version schemes we'd like to support).

I'm currently thinking really hard to figure out whether a system like this could work. I haven't really thought of any blockade yet (a version scheme choice) that would immediately invalidate this generic system and make it useless. But I might just be missing something big here.

Perfectly comparing anything imaginable will always remain an impossible task, and this will always be a best effort approach. Schemes that differ so much from everything else are probably better of with a full custom comparison implementation anyway. Sadly, schemes and given versions are inconsistent.


What are your thoughts on the rough idea I've explained here? Could it hold up for everything we're trying to achieve here? Is it too much?

Would implementing every scheme separately (live you've suggested) be beneficial? Could we prevent lots of duplicate code with such an implementation? Wouldn't it be better for an approach like this to separate every scheme in a different tailored crates anyway as reimplementing a lot of similar stuff is probably required?


Sorry if what I'm saying here is unclear, and sorry for putting down 1.2k words. I always find it really hard to abstract/summarize my ideas in clear understandable terms. I hope the examples I gave help clarifying. Please correct me if I'm saying stupid things :)

I'm fine with discussing this through Docs, my Google address is my GitHub username with @gmail.com attached.

I've updated the first post with the changes you made in the linked document.

Thanks again for your time, thoughts and comments!

msarahan commented 5 years ago

I hadn't thought of the use of custom components (and the creation of those components by the specific scheme) as the way to achieve custom comparison. I like it a lot! I think you're onto a very nice, clean way to do things. Perhaps the usage patterns could be made nicer by currying the Version struct instantiation with a given scheme, such that Conda could hide the need to pass the scheme explicitly each time. Maybe that happens in version-compare, or maybe it happens on the external (conda) side. Probably the latter.

I think the addition of scheme-specific info like epochs to the data structure for all schemes is not great, but it's also not horrible. In my opinion, things like epoch must be compared separately from the other version stuff. I don't like comparing 1!1.0 with 2.0 and say that 1:1.0 is higher on the basis of Epoch(1) trumps Version(2). Rather, Epoch(1) trumps the implicit Epoch(0). The vector comparison for other numbers otherwise holds, and then I think the end pre/post/git tag/whatever release may again need special treatment, but perhaps not, since Number(1) will trump Alpha(x) or Date(x).

Now, this rough idea definitely won't translate into Rust right now.

Why not? What specific parts do you think are missing? Is this at a point where some pseudocode and rough placement of data structures is appropriate?

timvisee commented 5 years ago

... currying ...

Currying is a good idea, though that doesn't work too well in Rust at the moment. This could be in schemes, in the form of MyScheme::parse(ver) -> Version::parse(ver, scheme). Implementing this externally would be a nice alternative, and is up to the end user of course.

I think the addition of scheme-specific info like epochs to the data structure for all schemes is not great, but it's also not horrible.

I thought this would be common enough, and of course, it does not matter if schemes don't use such component.

In my opinion, things like epoch must be compared separately from the other version stuff. I don't like comparing 1!1.0 with 2.0 and say that 1:1.0 is higher on the basis of Epoch(1) trumps Version(2). Rather, Epoch(1) trumps the implicit Epoch(0). The vector comparison for other numbers otherwise holds, and then I think the end pre/post/git tag/whatever release may again need special treatment, but perhaps not, since Number(1) will trump Alpha(x) or Date(x).

I think you're suggesting to do version checking in various passes with a collection of components. So, first a pass for epochs, then a pass for numbers, then a pass for suffixes. Is that right? Thinking of it that way sounds like a good idea.

Now, this rough idea definitely won't translate into Rust right now.

Why not? What specific parts do you think are missing?

For example, for comparison you would normally implement Ord on structs (or components in this case). This works well when using the same type. But if comparison across different component types is needed, just for some combinations, this won't suffice. It's not a good idea to check at runtime whether a component has such trait implemented (I think, if even possible). I don't see a way yet to statically (typed) compile this.

And a list of generic components in Version would probably require a list of Box<Component> (where Component is a trait). Putting it on the heap this way isn't too great for performance either. using an enum (the usual alternative) probably isn't really an option because an enum is not extendable.

Is this at a point where some pseudocode and rough placement of data structures is appropriate?

Probably, yes.

msarahan commented 5 years ago

Currying is a good idea, though that doesn't work too well in Rust at the moment. This could be in schemes, in the form of MyScheme::parse(ver) -> Version::parse(ver, scheme). Implementing this externally would be a nice alternative, and is up to the end user of course.

Agreed that the scheme is the right place to put this. Thanks!

I think you're suggesting to do version checking in various passes with a collection of components. So, first a pass for epochs, then a pass for numbers, then a pass for suffixes. Is that right? Thinking of it that way sounds like a good idea.

I thought about this more last night. I'm really torn on whether this is actually a good idea. Making it multi-pass loses a lot of generality, and I'm not entirely sure it's necessary, as long as the type definitions (including interactions between types) really can cover all comparisons that might happen. Comparing types 1:1 is not strictly necessary, and probably not even helpful aside from perhaps reducing some code necessary for defining comparison relationships between types. I'd like to try to avoid this multiple pass approach, but keep an eye out for reasons why it might be necessary.

Regarding your points about Rust's language features, I'm just not experienced enough to understand much here. You've given me good things to read about and tinker with, though. I'll try to come up with some toy code soon - today if time allows, but perhaps this weekend.