ipfs / go-cid

Content ID v1 implemented in go
MIT License
156 stars 47 forks source link

CodecToStr and Codecs in v0.3.1 could cause data corruption #144

Closed lidel closed 2 years ago

lidel commented 2 years ago

v0.2.0 we made a conscious decision to make a breaking change to FORCE people to refactor their code, and not be surprised by code changes. Rationale in https://github.com/ipfs/go-cid/pull/137.

In golang major bump is essentially a different package – bumping major does not help people who are already using INVALID mappings. That is why we did it in v0.2.

Recently merged PR https://github.com/ipfs/go-cid/pull/142 removed that safety and could cause unexpected code change if someone updates from v1.0 to v3.0. IIUC we now have this:

Upgrading v0.1.x to v0.3.1 can now produce DAGs with different codec and get NO warning about this BREAKING CHANGE.

@Jorropo @rvagg If you reallly want to keep CodecToStr and Codecs, you should make sure they return hard error when someone tries to use them for impacted mappings above. Third-party apps use this library. People don't read release notes. Silent data corruption is not acceptable.

Jorropo commented 2 years ago

Ah hum ... mb Ill revert, open pr for 0.3.2 and fix Kubo's dep

rvagg commented 2 years ago

Well, as I said in #142, the main break is experienced via CodecToStr, I don't really care that much about Codecs as I haven't seen any difficulty with that. The main source of difficulty is in the way go-libp2p performed some major upgrades at the same time as adapting to this change. So to pull an updated go-cid, a package that's also using go-libp2p has to go through the full upgrade there as well just to get a latest version. Because go-cid is so core to multiformats and ipld, you can't get an updated go-ipld-prime or related packages without going through all of these hoops.

So, in hindsight, I still think we made the wrong call in #137 and should have gone through a longer deprecation cycle, using the deprecation tools afforded by Go—which aren't awesome, but they exist.

More discussion in #142 would have been nice to work through this a bit further.

lidel commented 2 years ago

Just to be clear (and reiterate #137), both Codecs AND CodecToStr had invalid entries.

In my mind, this is not something we can "silently fix" like you did in #142. Silently changing mappings in #142 introduced silent breaking changes in disguise of "deprecation".

My position is:

Digression about work in Q1 and potential shortcuts:

I am deeply aware how core go-cid is, and how painful bubbling this up is :crying_cat_face: Kubo team went over this in IPFS in Q1: had to open multiple surgical PRs to fix this mess in Kubo 0.13.0 and go-libp2p, and change many commands in a way that does not silently break existing users (see summary of BREAKING CHANGES in release notes).

Note the strategy around https://github.com/ipfs/go-cid/pull/137 was to replace use of Codecs and CodecToStr in places that are not impacted by invalid mappings WITHOUT bumping go-cid (e.g., https://github.com/libp2p/go-libp2p-core/pull/240). The actual go.mod bump happened in go-libp2p 0.21, two months later.

Initially, we only bumped go-cid to v0.2.0 in Kubo to force refactor of impacted code paths and redesign of end user commands. Sure, it is a lot of tedious work, but the end result in Kubo 0.13.0 avoided exactly the type of silent data corruption described in my comment at the top (and briefly re-introduced in v0.3.1).

This was done months ago. Since then, unrelated refactors of go-libp2p landed (such as repo consolidation), and now you have even more technical debt to pay off.

You could re-introduce Codecs AND CodecToStr and then manually censor tables and remove entries with invalid mappings. This will ensure go-cid won't produce invalid data, and save you from bubbling up refactors, but will make also make some apps error randomly on cbor/protobuf/dag-pb.

I did not go that route because it felt like dealing with the fallout will be more work than doing things properly and refactoring away from go-cid.