ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Validate sharded directory structure in `ipfs ls` #8196

Status: Open. Stebalien opened this issue 3 years ago.

Stebalien commented 3 years ago

Currently, when `ipfs ls` lists a sharded directory, it naively walks the DAG without actually validating the directory's internal structure. This means `ipfs ls /ipfs/QmFoo` might list a file named "bar", and `ipfs get /ipfs/QmFoo/bar` might then fail because the directory is malformed.

We should be validating this structure as we traverse it.

schomatis commented 2 years ago

@Stebalien

it naively walks the dag without actually validating the internal structure

(From https://github.com/ipfs/go-ipfs/issues/8072)

The real bug here is that we don't verify the HAMT structure when listing, we just blindly walk the DAG.

I'm having trouble identifying in the code what exactly these statements mean and what verifying the directory would entail. We normally call EnumLinksAsync on the directory, which would seem to imply knowing (and validating?) how the HAMT operates. The only scenario I can find for this issue (but please point me to a concrete example if there is one) is a HAMT directory with an incorrect UnixFS format, for example:

  1. A HAMT directory incorrectly tagged as a Basic one. This would indeed list all links in the root DAG node (actual directory entries, but also intermediate HAMT shard nodes), possibly producing the incorrect behavior from the cited issue. ((*BasicDirectory).EnumLinksAsync() just lists all links and performs no validation since, AFAIK, there is no invalid format for a basic directory link.)
  2. A HAMT directory incorrectly tagged as not a directory at all, which would likewise list the DAG links blindly.
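To illustrate case 1, the sketch below contrasts a naive listing, which reports every DAG link name as an entry, with a HAMT-aware listing that strips the fixed-width bucket prefix and descends into child shards. The `link` type, both functions, and the sample prefixes are hypothetical (the prefixes are not derived from any real hash), and neither function validates anything.

```go
package main

import "fmt"

// link is a hypothetical, simplified DAG link. In a HAMT shard node, a name
// that is exactly prefixWidth characters points to a child shard, and a
// longer name is the bucket prefix followed by the real entry name.
type link struct {
	Name  string
	Child []link // non-nil only for links that point at a child shard
}

// naiveList mimics treating the node as a basic directory: every DAG link
// name is reported as if it were a directory entry.
func naiveList(links []link) []string {
	names := make([]string, 0, len(links))
	for _, l := range links {
		names = append(names, l.Name)
	}
	return names
}

// shardList interprets the same links as a HAMT shard: it strips the bucket
// prefix from entry links and descends into child shards.
func shardList(links []link, prefixWidth int) []string {
	var names []string
	for _, l := range links {
		switch {
		case len(l.Name) < prefixWidth:
			continue // malformed name; a validating walker would report an error
		case len(l.Name) == prefixWidth:
			names = append(names, shardList(l.Child, prefixWidth)...)
		default:
			names = append(names, l.Name[prefixWidth:])
		}
	}
	return names
}

func main() {
	// Hypothetical root of a fanout-256 shard (2-char hex prefixes): one entry
	// stored directly and one child shard holding two more entries.
	root := []link{
		{Name: "1Fbar"},
		{Name: "A3", Child: []link{{Name: "07baz"}, {Name: "C2qux"}}},
	}
	fmt.Println(naiveList(root))    // [1Fbar A3]
	fmt.Println(shardList(root, 2)) // [bar baz qux]
}
```

The naive output leaks both a prefixed name and an intermediate shard node as if they were files.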

In both cases I'm not sure how to recognize an incorrectly tagged (possibly corrupted) HAMT directory as such and avoid the behavior above, but maybe I'm misunderstanding the issue and need a concrete example. I haven't been able to fetch the example directory /ipfs/QmUygZRt3uF4gco8Ff3qmRa9xpYZsodhijPPVD2XmubBLr/ from the original issue (requests fail with 504 timeouts).

BigLep commented 2 years ago

2022-03-11 conversation: we're less keen on working on items like this because the work would need to be done twice (go-ipld-prime and the legacy UnixFS code).