mtkennerly / dunamai

Dynamic versioning library and CLI
https://dunamai.readthedocs.io/en/latest
MIT License
312 stars 24 forks source link

Added a more aggressive sorting mechanism for git tags #9

Closed mariusvniekerk closed 3 years ago

mariusvniekerk commented 3 years ago

This should ensure that tags that are more recent topologically will always take precedence. Additionally we prefer using the timestamp from the tag (if annotated), which ensures that annotated tags created more recently for the same commit are selected.

This should prevent edge cases arising from multiple timezones being used to commit to the same git repository.

mtkennerly commented 3 years ago

Thanks for this! I like the fix for the v0.2.0b1 vs v0.2.0 matching, and the time zone comment sounds promising, but I'd like to make sure I understand exactly how this improvement works and maybe if we can simplify some of it.

Dates

In my testing, the creatordate and taggerdate were equivalent for annotated tags, and taggerdate is undefined for lightweight tags, so I'm not sure why you added a check for taggerdate. Is there some case where it helps?

$ git init
$ echo foo > foo.txt
$ git add .
$ git commit -m "init"
$ git tag lightweight
$ git tag annotated -m hi
$ git for-each-ref "refs/tags/*" --merged HEAD --format "%(refname)  |  %(creatordate:iso-strict)  |  %(*committerdate:iso-strict)  |  %(taggerdate:iso-strict)"
refs/tags/annotated  |  2020-12-08T21:27:20-05:00  |  2020-12-08T21:26:55-05:00 |  2020-12-08T21:27:20-05:00
refs/tags/lightweight  |  2020-12-08T21:26:55-05:00  |    |

Time zones

I thought the original implementation would have also handled different time zones because of the %z in _parse_git_timestamp_iso_strict. Where/how were time zones being handled incorrectly before?

Test with v0.2.0b1

I verified that adding v0.2.0b1 made test__version__from_git__with_annotated_tags fail with the original implementation, but I can't tell the exact difference in your changes that fixes it. Could you help explain the key differences that solve it? I thought it might be related to --topo-order, but I don't currently understand why the existing implementation's timestamp check was insufficient.

mariusvniekerk commented 3 years ago

So there are a two parts going on here

dates

The new priority of dates results in

  1. @{%(taggerdate:iso-strict) explicitly returns the creation time for an annotated tag, empty otherwise.
  2. %(*committerdate:iso-strict) dereferences to the underlying commit so you get the timestamp that the commit was created not the tag's date.
    • i've kept this behavior, but it is probably not what we actually want._
    • odds are we can just remove that.
  3. @{%(creatordate:iso-strict) returns the creation time for an annotated tag, and the creation time for a commit for a lightweight tag.

topo oder

The --topo-order thing is separate from the timestamps and useful for preventing you from detecting tags that have been created in the past

Say we have the following commits

c1----c2-----c3----c4
       |            |
       v1.0.0       HEAD

We'd detect HEAD correctly as 1.0.0 with a distance of 2

if c1 now gets tagged as v0.9.9 and we relied purely on timestamps we would now incorrectly detect the version as 0.9.9 with a distance of 3. The --topo-order changes gets us to behaving in the same way that git --describe does.

The topology changes means we'd always prefer tags closer to the current HEAD, and the timestamp logic is used mostly to resolve ties where multiple tags exist on a single commit.

mtkennerly commented 3 years ago

Ah, I think you're using --topo-order how I meant to use *committerdate, but I just realized where I went wrong. I sorted by either *committerdate or creatordate depending on which was set, whereas your approach works because you're essentially combining two sort keys.

This minimal change also fixes the v0.2.0b1 test:

diff --git a/dunamai/__init__.py b/dunamai/__init__.py
index 0cd245f..714fff3 100644
--- a/dunamai/__init__.py
+++ b/dunamai/__init__.py
@@ -393,16 +393,17 @@ class Version:
         detailed_tags = []
         for line in msg.strip().splitlines():
             parts = line.split("@{")
+            creator_date = _parse_git_timestamp_iso_strict(parts[1])
             detailed_tags.append(
                 (
                     parts[0].replace("refs/tags/", "", 1),
-                    _parse_git_timestamp_iso_strict(parts[1]),
-                    None if parts[2] == "" else _parse_git_timestamp_iso_strict(parts[2]),
+                    creator_date,
+                    creator_date if parts[2] == "" else _parse_git_timestamp_iso_strict(parts[2]),
                 )
             )
         tags = [
             t[0]
-            for t in reversed(sorted(detailed_tags, key=lambda x: x[1] if x[2] is None else x[2]))
+            for t in reversed(sorted(detailed_tags, key=lambda x: (x[2], x[1])))
         ]
         tag, base, stage, unmatched, tagged_metadata = _match_version_pattern(
             pattern, tags, latest_tag

Can you think of any cases where the above would behave differently than your proposed changes?

mariusvniekerk commented 3 years ago

Since you can specify and amend both committer and author dates for git commits it could lead to some strange cases if you just ignore the graph structure.

mariusvniekerk commented 3 years ago

Additionally any repo that makes use of git-am or other similar patch based workflow would not have commits in chronological order.

Rebases can also easily do this.

mtkennerly commented 3 years ago

Got it; that's a great point. Thanks for the explanation :+1: