multiformats / js-multiformats

Multiformats interface (multihash, multicodec, multibase and CID)
Other
235 stars 54 forks source link

Create a CID from a string or Uint8Array or CID-like object #92

Open achingbrain opened 3 years ago

achingbrain commented 3 years ago

This module has a number of ways of converting things into CID instances - CID.parse for strings, CID.decode for Uint8Arrays, CID.asCID for things CID-shaped.

.parse and .decode will throw, .asCID will not. We generally want to validate what we've been passed as a CID, so a common pattern seems to be forming as:

function toCID (hash) {
  if (typeof hash === 'string') {
    return CID.parse(hash)
  } else if (hash instanceof Uint8Array) {
    return CID.decode(hash)
  }

  const cid = CID.asCID(hash)

  if (!cid) {
    throw new Error('Invalid CID')
  }

  return cid
}

If this seems familiar it may be because this is quite similar to how the constructor from the js-cids class behaves.

It is quite useful though, any objections to making this a utility method on the CID class?

rvagg commented 3 years ago

I'm going to call on @Gozala for this one as he has the strongest opinions about these particular interfaces. asCID is the evolution of isCID that's safer. I don't see why toCID being a more generic thing that throws wouldn't break the pattern, except that it's yet more API surface area. But this is a reasonable pattern for external interfaces.

Gozala commented 3 years ago

I think that jQuery style APIs that take anything leads to poorly designed interfaces and leaky abstractions. Everything ends up a parser trying to infer what the input is. Furthermore it hides some important details that I think are worth emphasizing instead (e.g. CID.parse can only parse base32 and base58btc encoded input for any other encoding you need to pass a codec). In contrast not having such a generic API forces user to think about what is the right thing to pass / hold-on to.

That is why I omitted CID.from(input:string|Uint8Array|CIDLikeObject) which was there in one of the revisions. I still think that places where we feel need for CID.toCID are the places that need to be simplified by restricting the input to a single type.

If we really feel that having such a strict interface would be too restrictive, we can facade js-ipfs or some very high level library to do the conversion and then keep things aligned internally across the board. I would also argue that js-ipfs should not try to be a high level library, but rather lean towards delivering it’s core value via simple API.

I feel like having CID.toCID would make it too easy not to do the right thing (carefully consider a right interface for the component) and just accepting various things in a hope that toCID does the right thing.

Gozala commented 3 years ago

I think not having CID.toCID pushes to a more disciplined approach:

This forces component in each category to do the conversation (or fail ASAP) and leaves core interface free of possible failures due to invalid input.

vmx commented 3 years ago

I'm with @Gozala here. JavaScript is moving more and more into a typed world due to TypeScript. I think our core libraries should be type system friendly. To me the whole move to a new CID library was to have a clear break from "take whatever input and figure it out" interfaces.

I can see that not having it makes moving existing code that is using cids to js-multiformats painful. Though I think that's the problem of the libraries. On the top level, you might have such a "catch-all" method, but once converted to a CID, all lower levels should just take a CID and not really bother about string representations. As this is mostly wishful thinking, I suggest having it implemented it implemented internally in libraries (e.g. in js-ipfs), but not having it here in js-multiformats.

rvagg commented 3 years ago

On balance I'm on that side too. Although I do worry a lot about what TS is doing to our dependency trees in general, we're now just in a position of having to have exactly the right version of js-multiformats in all our code just to get the right CID. @Gozala has been toying with a simple byte interface for CID so that it can just be a plain Uint8Array with a nice view around it when you need it, that would be a solution to this problem of dependencies but it seems to me that the need to do this exposes one of the major problems with layering TS over all our libs - our dependency versions have become so strict that we have no flexibility.

Currently CID is at the heart of IPFS (and IPLD obviously), it's the thing that everything needs. So decoupling that would be nice. But I see a future when we have a much more holistic IPLD stack that is at the heart of IPFS and covers a broad range of functionality around the Data Model that is essential for most of what we do (ipld-prime style), so this problem doesn't go away if we make CID something you don't need a library just to reference.

Gozala commented 3 years ago

On balance I'm on that side too. Although I do worry a lot about what TS is doing to our dependency trees in general, we're now just in a position of having to have exactly the right version of js-multiformats in all our code just to get the right CID. @Gozala has been toying with a simple byte interface for CID so that it can just be a plain Uint8Array with a nice view around it when you need it, that would be a solution to this problem of dependencies but it seems to me that the need to do this exposes one of the major problems with layering TS over all our libs - our dependency versions have become so strict that we have no flexibility.

I would claim that it is not the problem with TS, but with the way we are misusing it. The whole point of types and interfaces is so that you can define the API of a thing and pass it around. We do however tie those to specific classes (which is a mistake), it defeats the whole point of sepacating interface from the implementation.

We could use TS to a great effect to reduce amount of dependencies, but that requires changing our coding style as opposed to fitting TS into existing style.

Gozala commented 3 years ago

Currently CID is at the heart of IPFS (and IPLD obviously), it's the thing that everything needs. So decoupling that would be nice. But I see a future when we have a much more holistic IPLD stack that is at the heart of IPFS and covers a broad range of functionality around the Data Model that is essential for most of what we do (ipld-prime style), so this problem doesn't go away if we make CID something you don't need a library just to reference.

While I was commenting on a specific instance, it is a more general as in lets build composable components that compose well as that can reduce complexity as opposed to plug-able parsers that grow in complexity as you put more of them together.