ipfs / js-ipfs

IPFS implementation in JavaScript
https://js.ipfs.tech
Other
7.43k stars 1.25k forks source link

Migrate from cids to multiformats/cid #3443

Closed Gozala closed 1 year ago

Gozala commented 3 years ago

Recent changes to cids library introduced regressions to js-ipfs which then we had to go around fixing. There are also list of issues we wanted to address that were already addressed in multiformats/cid at this point delaying migration to it and updating cids seems like wasting a lot of time in attempt to buy us some time.

I would like to propose that we embark on this endeavor to stop maintenance of two diverging libraries. Here is roughly how I propose to approach it:

  1. Release cids@2.0.0 that just (re)exports multiformats/cid and also logs deprecation warning telling to migrate to multiformats/cid instead. - Intention here is to eventually deprecate cids and provide a guidance on migration path.
  2. Release cid@1.nextMinor that:
    • Makes new CID(cidv2) turn into CIDv1 - Intention here is to keep existing code semi-compatible with new CIDs. In our code base it is fairly common to do this type casting, so it is reasonable to assume that is likely in other code bases as well.
    • Make CID.isCID(cidv2) return false - Intention here is to trigger code paths that do type casting instead of failing later. If nothing else it will cause existing code to fail earlier than later.
    • Deprecate (via @deprecate annotation) all the things that are gone in new CID. - This will enable us and other users to identify things that need updating using linters and type checkers.
  3. Update this repo to replace all uses of 'cids' with multiformats/cid.
    • It will be a breaking change & I'm not confident that code (e.g. libp2p) that expects old cid's still wont fail despite things placed in cid@1.nextMinor. But hopefully it will reduce amount of coordination required.
  4. It may even make sense to rename things in new CID that exist in older one so that code assuming old CID will break early on.

@achingbrain @hugomrdias @mikeal @vmx what do you think about this proposal ? If this sounds good I think I can drive the effort once I'm done with remote pinning services.

carsonfarmer commented 3 years ago

I like the thoughts in this proposal. I've recently gone through the "pain" of trying to support both CID libs in our codebase, and its really quite tricky. The "rip off the bandaid" approach is probably best, but this appears to cover a lot of the sharper edges. I'm not sure "Make CID.isCID to always return false is a "safe" fix? I get the intention, but I'm trying to think of my own cases where I might end up in some weird state in that case? There are a few "hacks" to deal with some of this in the past in https://github.com/ipfs/js-dag-service/blob/default/src/core/blockservice.ts#L66 which is a bit of a hack.

vmx commented 3 years ago

I have little experience with such a massive breaking change. It sounds good to me, but I'd hope that others with more experience can say more about it.

achingbrain commented 3 years ago

It's important to note that the issues were caused by breaking changes being released as patches. This is a problem as old as the hills and despite our frustration at having to fix this sort of breakage, switching one dep out for another makes no guarantee it won't happen again.

Moving to the comparatively untested multiformats/* modules will likely uncover lots of new and interesting failure modes in that codebase so should not be thought of as a fix for this problem.

That said we cannot use the existing libraries forever as the IPLD team have to waste time merging and releasing changes to a stack they've largely abandoned - that time would be better spent helping us do the upgrade. Our Q1 OKR planning may present an opportunity to begin this migration.

Intention here is to keep existing code semi-compatible with new CIDs. I'm not confident that code (e.g. libp2p) that expects old cids still wont fail

I don't think this is a realistic expectation as the concern over libp2p shows. Multiple versions of cids in the dep tree has broken things badly in the past which is why CID.isCID was introduced in the first place, to paper over the instanceof operator not working when the CID classes are loaded from different modules - arguably this is the bandaid that needs ripping off, it doesn't guarantee compatibility, just that some instance properties are the same. We've talked about CID.asCID returning an instance of a CID at the version you expect, but if you allow arbitrary versions in the dep tree you cannot predict the changes our future selves will make so there's no guarantee it'll be able to do the conversion. Better to just stop trying to make magic and instead be more ruthless in pruning the dep tree of duplicate deps with incompatible versions.

  1. Release cids@2.0.0
  2. Release cid@1.nextMinor
  3. Update this repo to replace all uses of 'cids' with multiformats/cid.

I don't think we can do this sort of top-down approach with the ipfs codebase, there are too many modules that share dependencies and the complexity of trying to manage different versions between them is non-trivial.

We will most likely end up upgrading to the entire new multiformats stack, otherwise we'll have to upgrade the existing IPLD formats to the new CID class/etc which seems like a wasted effort if we want to use the newer IPLD formats in the longer term.

We should instead follow the model we established with things like callbacks -> promises, when ipld-dag-pb was standardised as ipld-format, the async/await migration, Buffers in the browser, Uint8Arrays instead of Buffers and all the other times we've rewritten great big chunks of the codebase.

We divide up the codebase, upgrade modules from the bottom up and resist any and all temptation to make any other 'improvements' along the way.

If this sounds good I think I can drive the effort once I'm done with remote pinning services.

I think this task is larger than one person, or even one team - it will need coordination across IPFS, libp2p and IPLD.

We should create an awesome endeavour-style issue that lists the modules that need upgrading and the order in which it needs to happen, then factor doing the actual work into our Q1 OKR planning.

Gozala commented 3 years ago

@achingbrain Just to clarify I was not proposing we have older CID in some places and new CID in other, I agree that it would be best to update all the libraries and release them in concert. That said, I do think that doing proposed extra steps will enable users of these libraries to have a bit more breathing room in rolling out updates to their own code base as opposed to spin their own coordinated efforts to make this lockstep changes.

More broadly I think we should consider what happens when code expecting new CID receives the old one and other way round, which in practice means e.g. deciding what the return value is for OLDCID.isCID(newCID) and decision should probably be based on the pros & cons.

Doing the extra work trying to reduce the gap between new and old CIDs also means that users have smaller gaps to fill themself.

whizzzkid commented 1 year ago

js-ipfs is being deprecated in favor of Helia. You can follow the migration plan here https://github.com/ipfs/js-ipfs/issues/4336 and read the migration guide.

This issue has been resolved in Helia! if this does not address your concern please let us know by reopening this issue before 2023-06-05!