Closed Jorropo closed 1 year ago
Personally that a WONTFIX (except maybe some docs), having to use --cid-version=0
and then ipfs cid base32
if you care about backward compat doesn't look too bad.
I belive we just have to be carefull that this behaviour doesn't happen with #4143.
@Jorropo this isn't a bug - the version of a cid is part of the payload. A block with links in cidv0 is different byte-content ( thus different CID ) as opposed to a block with the very same links in cidv1.
@ribasushi I know, that why the:
Personally that a WONTFIX
It's just that go-ipfs attempt to have some hash coherency with future versions. (adding the same data multiple times with the same options in future versions should generally give you the same CID back)
We have to choose whether this qualify or not. And whether doing ipfs cid base32 $(ipfs add -n --cid-version=0)
is a valid alternative.
As well as document this alternative.
You are probably thinking of --upgrade-cidv0-in-output=true
. Note that this was raised ( and rejected ) in https://github.com/ipfs/go-ipfs/issues/4143#issuecomment-811050725
You are probably thinking of
--upgrade-cidv0-in-output=true
.
I'm essentially proposing the same thing but doing it explicitly with ipfs cid base32
(which already exists and works), I think the fix for this issue is just gonna be some more lines of docs somewhere.
Always happy to get more docs but is this coherency expected by the user somewhere? (And what exactly is the meaning of it?) It seems reliant on very specific and internal knowledge on how we construct these (raw leaves, chunker, etc.).
is this coherency expected by the user somewhere?
I have talked with a few people that assume it. We also have tried to somewhat maintain it.
And what exactly is the meaning of it?
Adding the same file multiple times yields the same CID (or at least a compatible one, so really it's only the same multihash).
It seems reliant on very specific and internal knowledge on how we construct these (raw leaves, chunker, etc.).
Yeah
Pushing some more on your definition to try to better understand it and see how we can fit it in the docs:
Adding the same file multiple times yields the same CID
You're changing the arguments, is the user that changes these still expect the same CID?
a compatible one
Then what does compatible mean? Same multihash? Is the user aware of it, and if it is, it expects it to go unchanged when it changes the CID or any other argument?
If anything this seems more of an issue with the UnixFS layer and explaining to the user that their file/directory/etc representation in IPFS is opaque and they shouldn't rely on the CID/multihash. We attempt to change this as little as possible but there are (AFAIK) no guarantees. For example, we now automatically change the directory implementation (from basic to HAMT) based (roughly) on its number of entries, when they reach a (variably-adjusted) threshold. There is no single "file" entity in IPFS as a continuous string of bits as (say) in the user's hard drive. I'd push for more documentation in that direction.
(Changing to a docs issues unless there is in fact a guarantee of CID/multihash stability in UnixFS.)
I have talked with a few people that assume it.
IMO there is in no way whatsoever a guarantee that ipfs-vX add foobar
generates the same DAG as ipfs-vX+1 add foobar
.
From ipfs add --help
If that wording is insufficient to get the point across we can modify that as well.
We also have tried to somewhat maintain it.
Correct, we don't like changing behavior because no matter what you tell people they'll start relying on things anyway unless it evolves too quickly for people to incorrectly assume it'll never change. This is part of why the defaults for ipfs add
have not been changed to a superior DAG structure that uses raw leaves (e.g. ipfs add --cid-version=1 --raw-leaves=true
(yes, I know --cid-version=1
implies raw leaves).
When the change to CIDv1 by default happens there will likely need to be a set of comms reminding people that defaults can change and how to generate CIDv0s if they really need to.
@Jorropo unless the idea is you want to add more text to ipfs add
this should be closed in favor of an issue in https://github.com/ipfs/ipfs-docs/issues
If anything this seems more of an issue with the UnixFS layer and explaining to the user that their file/directory/etc representation in IPFS is opaque and they shouldn't rely on the CID/multihash. We attempt to change this as little as possible but there are (AFAIK) no guarantees. For example, we now automatically change the directory implementation (from basic to HAMT) based (roughly) on its number of entries, when they reach a (variably-adjusted) threshold. There is no single "file" entity in IPFS as a continuous string of bits as (say) in the user's hard drive. I'd push for more documentation in that direction.
I agree with all of this.
I think the issue is getting side tracked.
The issue here is that it is surprising that the CID version change the hash. It seems like a non trivial amount of people (me including before I implemented some raw CID code) think that CID version is just an encoding trick, but the binary data wouldn't change, just how it gets made into text.
unless the idea is you want to add more text to
ipfs add
I want to add a
This also affects CID's binary representation inside blocks and could change your hashes.
similar thing to --cid-version
's help text.
I think the implicit assumption that is causing some confusion here is we're talking about a single block (strings of bits) of data.
In general changing CID version doesn't change the hash of the data (because we keep the same hashing algorithm, at least in v1).
But a file is (usually) not a single block of data but a DAG of them, and a DAG is not just the concatenation of the chunks of the original file but something more opaque that includes other metadata to structure it. (That metadata is in fact leaked in the CID when the user might not be actually interested in it.)
The bold text is not obvious to a new user (I had the same confusion for a lot of time) and some more docs could be added on that.
I had discussions since then, we ideally want to sunset CIDv0, this will never happen but we are still gonna try.
We want to default to encoding CIDv1 inside the binary blocks even when a CIDv0 could be used because it makes it anoying to write various implementations and have weird edge cases in some of them.
Checklist
Installation method
built from source
Version
Config
Description
When using
--cid-version=1
ipfs will use CIDv1 CIDs in the root blocks of the file, creating different hashes if the file gets chunked into multiple leaves.Reproduction