ipfs / boxo

A set of reference libraries for building IPFS applications and implementations in Go.
https://github.com/ipfs/boxo#readme
Other
216 stars 99 forks source link

Allow putting non-path data as an IPNS value #475

Closed master255 closed 1 year ago

master255 commented 1 year ago

Why does value have the Path type? And not []byte as before?

This takes more computation and makes it difficult to store any other data other than Path in IPNS.

welcome[bot] commented 1 year ago

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

Finally, remember to use https://discuss.ipfs.io if you just need general support.

aschmahmann commented 1 year ago

@master255 what are you putting in the IPNS record field? The spec https://specs.ipfs.tech/ipns/ipns-record/#ipns-record indicates that the records should be paths.

master255 commented 1 year ago

@aschmahmann How do I save my data then? I need to store some hash in the DHT, for example Sha1. And that's the saving format I have: sequence_number:hash Because if an attacker intercepts my old entry, they can send it to everyone and it will cause spam, which will make it impossible to publish the real value. How do you get out of these situations?

Jorropo commented 1 year ago

@master255 have you tried making it a CID ? Theses are self describing hashes with a codec number pointing in the multicodec table, it also include a multihash which allows to not hardcode the hash function used: https://github.com/multiformats/multicodec

master255 commented 1 year ago

Only a signed sequence number gives the ability to resist a spam attack.

master255 commented 1 year ago

@Jorropo I don't know what you're talking about.

master255 commented 1 year ago

@aschmahmann @Jorropo Okay. I looked that the sequence number is also encrypted when creating the Cbor. That's good. But it is still necessary for me to store different types of hashes. Why can't I do that? Maybe it is necessary to store an object (serialized) to create a decentralized database.

Jorropo commented 1 year ago

But it is still necessary for me to store different types of hashes. Why can't I do that? Maybe it is necessary to store an object (serialized) to create a decentralized database.

You can do that using cids. CIDs let you change the encoding, hash length and hash function used. I think you should take a look at:

Then you would create a path using dag-cbor CID and announce that.

master255 commented 1 year ago

@Jorropo But it is not necessary for me to have a CID. The CID is a reference to the content, and I want to store the content itself. If I store a link to the content - and get that link, and then search for that content, it will be very long. It's much faster to store the content and retrieve it immediately. The size of my content is very small. Up to 1000 bytes of information.

master255 commented 1 year ago

@Jorropo And my second argument - path then turns into []byte anyway https://github.com/ipfs/boxo/blob/c34473b4a186f1c1bf3c5430b04bafd7a37b338b/ipns/record.go#L309C10-L309C10 Why make it Path then? Path->[]byte conversion takes CPU, which can be avoided.

Second time: https://github.com/ipfs/boxo/blob/c34473b4a186f1c1bf3c5430b04bafd7a37b338b/ipns/record.go#L262

Now it's also in String. This is a programming masterpiece :) https://github.com/ipfs/boxo/blob/8660c625624d226bce2ca1ffccbb2b21c7e4132d/ipns/record.go#L309

Let's make the value []byte and it will be easier for everyone :)

lidel commented 1 year ago

Why does value have the Path type? And not []byte as before?

The intention behind boxo change here is to unify behavior across implementations to produce normalized records that follow specification AND to incentivize people who don't use content-paths to use Data field instead.

The Value was always intended to be a pointer to a content path (like (/ipfs/cid or /ipns/id), but some implementations put arbitrary bytes with CID as string, or a raw CID there or arbitrary strings, due to the lack of type safety.

Old IPNS libraries used in GO and JS "happened" to correctly parse CIDs and uplift them to a path just because of lax implementation, rather than following the semantic meaning from the spec. We've cleaned that up in the latest js-ipns and boxo/ipns releases to normalize CID into content path before signing a fresh record, so the new records are always spec-compliant.

pb.Value = []byte(value) Now it's also in String. This is a programming masterpiece :)

It may be "good" for GO, which uses bytes and strings interchangeably, but causes implementation bugs and spec compliance issues in other languages, where bytes and strings are not 1:1 swappable (e.g. JavaScript engines use UTF-16 transformations).

We've seen bugs in other languages because there was no spec, and people tried to do what GO does.

:point_right: The spec aims to describe things that are easy to implement in ANY language, no just GO, and mixing strings with binary data is a huge implementation footgun we want to avoid.

And that's the saving format I have: sequence_number:hash Path->[]byte conversion takes CPU, which can be avoided.

It sounds like you are not using IPNS for content-addressing. You have some arbitrary byte data that you want to attach to IPNS record.

Good news is that is why we have the extensible Data DAG-CBOR field – anyone can add additional key-value pairs there, and they will be signed automatically.

Bad news, is that there is no API to do that yet (cc @hacdias)

:point_right: @master255 I am trying to understand what is the most important for you:

master255 commented 1 year ago

@lidel

I am trying to understand what is the most important for you:

  • Is the ask here to expose (in boxo/ipns) the ability to add key-values to the extensible Data field? (e.g. by adding constructor option WithExtraData)
  • Or are you asking for spec change (IPIP) and allow perf optimization by compacting /ipfs/cid string paths to a raw CID in binary form? (somehow sensible: implementations already allow that for backward-compatibility, but it slightly complicates the spec for new implementers)

If you are talking about adding a WithExtraData ([]byte) method, I am talking about that yes. This will allow storing arbitrary bytes in IpnsEntry. And with this create decentralized databases. Perhaps it is necessary to have a second method with Value of type []byte (instead of Path)?

The second point I don't really understand. But I am in favor of code optimization. And my artificial intelligence writes that conversion still takes CPU time and extra memory. And since in DHT all data is stored in []byte , it would be logical to use this type. Developers can store Path objects in []byte and then we will get a double conversion []byte->Path->[]byte, which is obviously not rational. image

master255 commented 1 year ago

In my DHT program (based on boxo) - I store and use all DHT data (keys, entry) in bytes and only when it is necessary to display it to the user I convert it to String. Since I use this data inside the program more often than I display it on the screen.

master255 commented 1 year ago

@Jorropo Now I'm using boxo 0.10 version to use []byte instead of Path. Maybe it is not necessary to do anything? And I can always use boxo 0.10 version?

aschmahmann commented 1 year ago

@master255 there seem to be two different issues that you're complaining about here (although I can't tell what's more important to you):

  1. You don't see the value in having typed data vs []byte because the conversions cost CPU
    • Generally speaking unless code is in the application critical path and there's a demonstrable speedup it's a better move to have strongly typed data where the compiler can protect the user from errors than not. Programming languages, frameworks, and libraries make these tradeoffs regularly.
  2. You would prefer the IPNS spec handle arbitrary bytes rather than a path with a CID in it, despite storing a hash
    • There are options above like just adding the extra bytes to go from hash -> path with CID
    • Putting the data as a CBOR field (if you'd like to add an API +tests to make writing this data easier that seems reasonable), although using a blank path and a CBOR field is not standard
    • Propose a spec modification to allow things closer to how you'd like and discuss changes there

Going forward you have a few options:

  1. Conform to the IPNS spec
    • Use one of the recommended approaches with boxo
    • Make/copy-paste your own parser that only works with bytes. Libraries like go-libp2p-kad-dht work with bytes since they handle arbitrary record types so it's really just a small library you'd need to make. IMO forking to save a negligible amount of CPU isn't worth it but 🤷.
  2. Propose a spec change taking into account the context from https://github.com/ipfs/boxo/issues/475#issuecomment-1766434938
  3. Don't conform to the IPNS spec
    • If you're hoping to use resources like the Amino DHT (formerly referred to as the IPFS Public DHT) to distribute your records then this working long-term is not guaranteed since behavior not covered by specs is undefined and unsupported.

And I can always use boxo 0.10 version?

No one is deleting the v0.10.0 tag, but if you're looking for bug/security fixes, new features, etc. then sticking with v0.10.0 is not a good idea. Additionally, if your bytes do not conform to the IPNS spec then you have the same caveats as described above.

master255 commented 1 year ago

@aschmahmann I don't need any corrections/fixes. I just need two methods to work: put and get.

I think a reasonable compromise would be to add a method that accepts []byte rather than Path. So that everyone could use both Path and []byte.

aschmahmann commented 1 year ago

@master255 using your example if you called DHTPutValue(publicKeyBytes, privateKeyBytes, value) where value was []byte("this-is-not-a-path") against the Amino DHT as an IPNS record it would not be guaranteed to succeed because it is not supported at the spec level as already documented above. Adding in functions that enable users to create non-spec compliant IPNS records isn't something there's interest to do here.

If you'd like to take advantage of unspecified behavior that only works with some implementations you can certainly copy-paste the ipns package from here into your own repo and modify it with the unspecified behavior, although it's not something I would recommend as worthwhile. I've given a number of alternatives above if this approach is not what you'd like to do.

master255 commented 1 year ago

@aschmahmann Why aren't you interested? It will allow to create a decentralized database. And there are a lot of good programs based on it. Are you not interested in creating good programs?

aschmahmann commented 1 year ago

Why aren't you interested?

@master255 I feel like I've outlined my points above, but mostly it's because giving users an API for creating brittle and non-spec compliant IPNS records doesn't seem like a great idea since users may end up relying on behavior that will later break. Sometimes users end up needing more flexibility, but just as Go does with unsafe you tend to not expose those to users unless there's a really pressing demand which IMO you haven't demonstrated (i.e. it's not worth adding and maintaining a new ipns.NewUnsafeRecord function without pressing need).

Note: You're claiming tremendous waste (CPU, bytes, etc.) but there's no data that this is true.

It will allow to create a decentralized database. And there are a lot of good programs based on it. Are you not interested in creating good programs?

For the relatively low cost of a few bytes you can take any pile of bytes and create a new path as:

mh, err := multihash.Encode(myTotallyCustomData, multihash.IDENTITY)
if err != nil { return err }
validValue := path.FromCid(cid.NewCidV1(cid.Raw, mh))

and the problem is solved.

Standards aren't particularly useful if people aren't using them which is why we evolve them together so that we end up with interoperable software. If there are parts of the standard you don't like they can be explored (as mentioned above), but they will reasonably want to take into account more software than just yours and how that change will impact them.

master255 commented 1 year ago

@aschmahmann No, no. This has nothing to do with users. In my application, users do not have access to IPNS records. That's handled by the application. Maybe you don't know how DHT works. Let me tell you :) There are many different ways to use DHT. My application uses the "DHT over DHT" scheme. There are bootstrap node users (the application runs 24/7) and there are regular users. A new user - at the first startup - gets the contents of the bootstrap node DHT cells using a public key (which is embedded in each application). In these cells he gets the CID (instructions on how to download the database) of the database. He then download the database. The database contains the public keys of all users who host the database. Each user stores the database and adds himself to it when he downloads it. Then, he publishes a CID under the public and private (not secret) key, which are stored in the current cell of the DHT. The contents of the current cell and the bootstrap cell are the same. All users check the contents of the current cell and the new cell (whose public and private key is stored in the current cell) once every 5-10 minutes. If the current cell changes or disappears, all users republish it. That is, the current cell is static and cannot change. If there is data in the new cell, all users download the new database (with fraud protection).

When users turn off the application and turn it on again, they already have a database with the public keys of the database keepers. Users randomly retrieve the values of 10 keeper cells and calculate the average CID. This average CID is downloaded and its private and public key is used to publish new updates. If there is any data on the new key, a switch is made to it and so on....

This is how a decentralized database works. And if you forbid publishing []byte, it becomes impossible.

aschmahmann commented 1 year ago

Unfortunately I don't think this discussion is going anywhere. The specific request that you have (violate the IPNS spec and allow using []byte("not-a-path") as the value in the record is currently a WONT-FIX and none of the alternative mentioned have been of any interest for you to pursue.

As a result I'm closing this issue. I'm also changing the title of the issue to indicate the request that is being closed (i.e. Allow putting non-path data as an IPNS value). I understand this may be disappointing, but I've tried to outline the reasons and options available in the various posts above.


You may want to check out the community forums or chat channels https://docs.ipfs.tech/community/#get-involved to see if others are interested and able to help you out in the design and approach of your application, or some more of the technical/code aspects.

Given you're really set on storing arbitrary bytes in the IPNS records I'd recommend trying the approach I mentioned (with code) above https://github.com/ipfs/boxo/issues/475#issuecomment-1780292393. It should be enough for your use case and if after trying it it turns out that you see a reproducible huge CPU hotspot or something that would be an interesting result.

master255 commented 1 year ago

@aschmahmann What if it is necessary for me to store other hashes in the DHT? For example, sha1. Most content can now be downloaded using sha1 only. And calculating the other hash is very difficult (impossible).

Jorropo commented 1 year ago

@master255 I would suggest you go through https://proto.school/anatomy-of-a-cid. The path uses a CID which have a multihash field, this allows to specify which hash function should be used, so you can just use a sha1 multihash.

You can also use the identity hash function which actually is not a hash function but let you embed data directly inside the CID.

master255 commented 4 months ago

@Jorropo As a result, nothing works anymore. You have destroyed the ability to store byte[] in DHT records.