ipld / go-ipld-prime

Golang interfaces for the IPLD Data Model, with core Codecs included, IPLD Schemas support, and some handy functional transforms tools.
MIT License
132 stars 50 forks source link

datamodel Length() not usable with lazy nodes (e.g. ADLs) #531

Open aschmahmann opened 1 year ago

aschmahmann commented 1 year ago

Noticed while discovering https://github.com/ipfs/go-unixfsnode/issues/54

The contract on node.Length() is https://github.com/ipld/go-ipld-prime/blob/65bfa53512f2328d19273e471ce4fd6d964055a2/datamodel/node.go#L131-L133

This means that if there is any sort of lazy loading or computation involved in working with a node that is definitely a list or map that there is nothing valid to return.

  1. Could return -1 but that's a contract violation
  2. Could return some number (e.g. go-unixfsnode's HAMT returns the largest number of entries it can confirm https://github.com/ipfs/go-unixfsnode/blob/364a5496925b291900d67c35b8b898e1a30b7f0a/hamt/shardeddir.go#L262-L289), but that's also a contract violation since the user expects "the number of entries in a map" not "some number <= the number of entries in a map" and the application has no indication that anything has gone wrong.

I suspect the options here are:

  1. Change Length to return an error
  2. Change the contract to allow -1 to mean any type of error
  3. Add a new function (name TBD) that does return an error and deprecate Length

While 2 seems the easiest, it has the ability to break a bunch of code without people noticing. Simple compile-time checked fixes like 1 seem a safer option.

Whether 1 or 3 is preferred seems like a matter of taste.


Note: the other functions on the Node interface which could potentially run into a similar issue are Kind(), IsAbsent(), and IsNull() However, it seems like these are less likely to be lazily constructed than Length and therefore less likely to be a problem.

Similarly, the map and list iterators can return errors once they're in use (and they can be checked in advance with Kind()) and so are also less problematic.

rvagg commented 1 year ago

I agree that this is a problem, and it would be good to resolve it without too much of a hack.

I know that during development there was an attempt to reign in the verbosity of the API—it's already very verbose but it could be a lot more by allowing everything to error .. but maybe they should in a world where we're building multi-layered abstractions over these things with complex behaviour underneath.

In terms of options:

  1. Change Length to return an error

I'd prefer this, but it's going to be very painful breaking that contract, with everyone needing to make the jump together. Are we prepared to deal with that pain?

  1. Change the contract to allow -1 to mean any type of error

This could end up being more dangerous; a length is often used for iteration. Silently changing contract on this could leave users exposed to surprising, or damaging, behaviour. It might be best to be noisy about it.

  1. Add a new function (name TBD) that does return an error and deprecate Length

Does this buy us anything that option 1 doesn't give us? It breaks the interface so the same dependency upgrade pain will exist, we just get to wriggle out of having to check for err except that you get deprecation warnings when you don't! So it seems like a worse version of 1.

Another option exists, to ignore the problem and keep on papering over it. The unixfs-preload one is pretty annoying because one way to handle problems like this is to hold on to an error and return it on a later call to an error returning API call, but with unixfs-preload, no more calls to the Node are made so it gets lost entirely!

rvagg commented 1 year ago

https://github.com/ipld/go-ipld-prime/pull/532, just to see what that would look like internally