kr / hk

Fast Heroku client
https://hk.heroku.com/
77 stars 6 forks source link

hkgen misses generating some patches #70

Closed bgentry closed 11 years ago

bgentry commented 11 years ago

The logic in gen.go in genPatches only generates n^2/2 patches, instead of all n^2, for any given platform+command.

It should be fixed to generate all patches.

Original description follows.


I thought this was a bug with the new S3 path stuff, but it looks like it's a different issue. hkdist gen fetches a list of releases from releases.json, then proceeds to iterate through the list to make sure that patches exist.

However, it does this iteration in the order of the list, not in the order of the release versions. So today was the first time I came up with this edge case:

2013-09-24T23:39:08.952415+00:00 app[gen.1]: gen.go:55: genPatch {darwin-amd64 hk 20130924.1 [23 187 231 111 167 66 168 97 48 58 109 248 252 245 250 212 12 18 131 237 143 190 220 108 15 238 139 182 217 154 64 34]} {darwin-amd64 hk 20130924 [43 49 126 252 49 52 182 150 86 45 21 240 185 215 89 104 214 198 71 220 234 242 59 80 161 23 247 197 8 2 189 250]}

It's generating patches from 20130924.1 to 20130924.

I'm guessing the solution here is that it needs to sort the release list from the API based on the version string.

bgentry commented 11 years ago

Since we have to process the list to sort it, we may also want to group by platform first.

Optionally we could solve this on the hkdist API side too.

kr commented 11 years ago

This is designed to avoid ever having to compare version strings. That is gross and inevitably becomes an ever-growing thicket. I think it's important never to sort or otherwise compare version strings.

So we just have an unordered bag of releases and mark one of them as "current". Choosing which one to mark is left as an external decision, either automatically when we cut a new release or manually to roll back or something.

For patches, hk gen just generates all n^2 patches from every version to every other version. Except it looks like you found a bug, and it's only generating half of them, determined arbitrarily by the order of the list in release.json.

(It used to only generate patches to the "current" release, which could still generate some "backwards" patches if the current version is older than some other release. But it seems simpler to just generate all of them.)

kr commented 11 years ago

I edited the description to reflect the design intent.

kr commented 11 years ago

And by "compare" I meant comparing for order, heh. Comparing for equality is of course fine.

bgentry commented 11 years ago

@kr so one concern about that is that it already takes quite a long time to generate patches for the small number of releases we have (given that there are many platforms). Doubling that work to go in both directions sounds awful, especially when there is no benefit to doing so other than simpler code :(

Ultimately, the build process needs to be sped up considerably from where it's at right now, and preferably have some kind of an upper bound on the number of patches that need to be generated.

Also, the method we're currently using to tell if a patch has been generated (HEAD to S3 URL) is simple, but wasteful. Again, this will grow without bound as we cut more releases.

I think it might be worth considering a more efficient approach, ideally one that would let us parallelize the diff computations and avoid a lot of the unnecessary work we're doing now.

There's some low-hanging fruit that could be reached if we simply compute multiple diffs in different goroutines in the same gen dyno. Maybe that will be enough for now. But we should at least think about other options so that it doesn't take many minutes to make diffs available after a new release.

What if releases include a UTC timestamp for when they were created? While it's not foolproof, I think that would be sufficient to allow for easy sorting & directionality on both the API side and the hkdist gen side.

kr commented 11 years ago

It would be nice to generate fewer diffs, and do it faster, and more in parallel, but all that is just optimization. It doesn't affect how soon the new release is available. Clients will be downloading the full release as soon as it's posted. When the diff is posted they'll switch to downloading the diff. Of course it would be nice if fewer clients had to download a full release, but it doesn't seem like a big deal right now. It'll be easy for us to change that any time in the future, since it won't impact the interface.

bgentry commented 11 years ago

Closing as #80 fixes this.