Closed warpfork closed 3 years ago
(The test that's currently failing is for murmur3. Input needed on what we're going to do with that thing.)
is this ready for a round of review? are there any loose things that still need to happen?
Okay! This is finally green.
I believe all comments are addressed.
Also note that I updated a bunch of the transitive dependencies. It was forced (see https://github.com/multiformats/go-multihash/pull/136/commits/877240bb0cbf040292354310a86c5a876c26fe6d / https://travis-ci.com/github/multiformats/go-multihash/builds/219591152 ). But none of our fixtures reported effects, so I presume this to be safe and joyously uninteresting.
Last call for any more reviews?
I've attempted to push this downstream through go-cid, and here's what that might look like: https://github.com/ipfs/go-cid/commit/26b387af822073023f9c026a4ea68a6349abf20e#diff-88c2c4b0892fb0172322544e3937e937bb18beccecb1b24a1b5028758a43685bR346-R375
There's some questions worth a quick review there. Namely: it's entirely possible to make all the code work without a need for DefaultLengths
, but there are some tests which conflict with it. It's not clear to me if those tests are testing something important and relied upon or if they happen to assert what they do incidentally.
Everything else looks fine -- although we do again end up seeing the go-multihash/register/all
import now needing to appear in test files, which should probably not be a surprise.
There's some questions worth a quick review there. Namely: it's entirely possible to make all the code work without a need for DefaultLengths, but there are some tests which conflict with it. It's not clear to me if those tests are testing something important and relied upon or if they happen to assert what they do incidentally.
We do need DefaultLengths, we use it in quite a few places. You can find these places by:
Let's make this PR minimally breaking so we can merge it ASAP and move on.
DefaultLengths
is back in.
Here's what propagating through go-cid looks like now, with all tests passing: https://github.com/ipfs/go-cid/pull/118
I guess the final check here is: do we want to keep github.com/multiformats/go-multihash
, the root package of this repo, still dragging in all transitive dependencies, to really really minimize change? E.g. effectively have it import register/all
and register/miniosha256
?
https://github.com/multiformats/go-multihash/issues/131 did originally describe doing the registration rework in a new package, and keeping the go-multihash root unchanged.
As it stands now, the go-multihash symbols exported are pretty much unchanged, but the code does go all-in on making the dependencies optional, so the behavior does shift. If we're scared of this and the consequent need to add the registration imports in downstream applications, and we'd rather make the dependency minimization an opt-in action that has to be done separately from just updating the library by putting it in a deeper package, we can do that.
... heck, I'll just draft that now. If we introduce a new package, who wants to name it? go-multihash/core
? or go-multihash/lite
? or just put it in go-mulithash/register
?
Yeah, let's minimize breakage This library is a bit popular: https://pkg.go.dev/github.com/multiformats/go-multihash?tab=importedby. A "core" package would work (I'd call it that).
I guess the question is: what do you need. This appears to be blocking other work, so I want to make sure we merge that need ASAP.
If you don't really need to reduce dependencies (I'm pretty sure you don't, at least right now) then I'd:
Basically, we need unblock this ASAP and we need to avoid causing a meltdown. Let's just go with the easiest approach and introduce a 2.0 (we'll get there eventually regardless).
Alright. go-multihash/core
is now a package, it does the thing. go-mulithash
still brings on all the deps unconditionally.
I think I've run out of changes to avoid. go-cid
can now be updated to this last commit with literally no code changes.
I'll also bow out on version tagging. Whatever numbers you like, fine by me. (I'm wary of major numbers and the Different Behavior they prompt from go mod
, but... what could go wrong, really)
I've tested with go-ipfs and the only broken test is the one that tries to use murmur (to make sure we don't allow it).
@warpfork could you cut a minor release when ready?
This is now available within the v0.0.15
tag.
This diff revisits the registry system to use more standard interfaces for hashers, and puts it to work to reduce the transitive dependencies of go-multihash.
Dependencies:
Previously, depending on the go-mulithash package would create direct dependencies to several other modules for their various hash function implementations. This meant that instead of go-multihash being a lightweight, easy-to-accept dependency itself, it became something which would noticably increase the size of your go.mod file, your package graph, your download sizes during development, and most concerningly, your compile output size in final products.
Now, there is a registry system (see the
Register
function), and the main go-multihash package only populates the registry with hashes that are available from the golang standard library by default. This means you gain no transitive dependencies on other libraries by importing the go-multihash package, and your binaries will not be bloated by hashers you don't use. (Your go.mod file may still show more repos; but they don't end up in your builds unless you actually refer to them).There are now several new packages in
go-multihash/register/*
. These can be imported to register the hashes in those packages. If you want all the hashes that were previously available, just make sure to import "go-mulithash/register/all" somewhere in your program. (You can register hashes, too, without making PRs to this library; these packages are just here for convenience and easy use.)This is a breaking change if you used hashes not found in the golang stdlib, such as blake2 and sha3. However, to update, all you need to do is ensure the relevant
go-multihash/register/*
package is imported anywhere in your program -- an easy change. Ago-multihash/register/all
package can be imported to get a hasher registered for all of the same multihash codes as before (but will correspondingly add the dependency weight back too, of course).Standard hash.Hash:
Previously, go-multihash had its own definition of a
HashFunc
interface, and only exposed hashing through themultihash.Sum
method. The problem with this was these definitions did not support streaming: one had to have an entire chunk of memory loaded at once, in a single contiguous byte slice, in order to hash it.(A second, admittedly much more minor, problem with this was that one often had to write glue code to turn a
hash.Hash
into amultihash.HashFunc
, and since most of golang uses the standard libhash.Hash
definition already, this was generally avoidable friction.)Now, the Register function operates in terms of standard
hash.Hash
, and there is aGetHasher
function which can get you ahash.Hash
. (Okay, to be more precise, these functions take and return a factory function for ahash.Hash
. You get the idea.)Since the standard
hash.Hash
interface can operate streamingly, now it's easy to use go-multihash in a streaming way.The
multihash.Sum
method works the same as always.Be a data carrier:
Previously, go-multihash contained checks that any multihash indicator codes being handled were required to have a hash function registered for them. This made it very difficult to use go-multihash in a "forward compatible" way (and it also made a lot of practical bumps for this dependency-extraction refactor).
Now, go-multihash is willing to carry data, even if it doesn't know what kind of hash function would be associated with an indicator code. (Methods that you'd expect to parse things do still parse the varints, making sure they're sanely formatted. They just don't inspect and whitelist the actual integers anymore.)
I removed the
ValidCode
predicate entirely. It doesn't seem to serve any good purpose anymore.Other:
I have not touched the
Codes
andNames
maps in this diff. I think we should probably review (and probably remove) these, and instead direct people to use the go-multicodec package instead, which has the two advantages of decoupling registration of an implementation versus simply having a description, and also being automatically generated from the multiformats table. However, I wanted to check on feelings about this before doing the work (especially because they're somewhat entangled with a bunch of the tests in this package, making their removal somewhat nontrivial).Most of the test files are now
package multihash_test
. This makes for some colorful diffs, but is not otherwise interesting. The reason for this is because the dependency separation process now requires the tests to import thoseregister/*
packages, and to avoid a cycle, that means, well,package mulithash_test
.I think there's probably more work to be done in making this library really shine. For example, in reviewing the
Encode
function, I see some allocations that look very likely to be avoidable... if the function was redesigned to be more aware of how it's likely to be used. However, I took no action on this, in part because this diff is big enough already, and in part because I think it might be reasonable to re-examine the relationship of this code to go-cid at the same time.I dropped
TestSmallerLengthHashID
. It appeared to be testing an API that wasn't actually exported... and the nearest API that is exported (Sum
) has a general contract of truncating a hash upon short length, so it was overall unclear what this test should be checking. Review might be needed on this.The situation for murmur3 is still in need of resolution. It's commented out entirely for now. Questions are noted in the diff.
There's a 'register/miniosha256' package which sets the sha2-256 implementation to a non-stdlib one. If you don't import this package, you still get a sha2-256; it's just the stdlib one. I did not include this in the 'register/all' group. (Maybe it's faster; maybe it's not; but it's definitely not required, and I'm getting some reports it also shows weird on profiles, so I tend to think maybe one should really have to explicitly ask for this one.)
There are a couple other review questions marked with "// REVIEW" in the diff.