unlock-protocol / unlock

Ʉnlock is a protocol for memberships built on a blockchain.
https://unlock-protocol.com
MIT License
821 stars 238 forks source link

Key Data #831

Closed julien51 closed 5 years ago

julien51 commented 5 years ago

Each key has an optional data field. This is important because it lets the customer (owner of the key) communicate something to the creator (lock owner). Use cases for this are for example: encrypted email addresses, encrypted names, images... etc

Our current implementation stores this on chain (the data field is actually bytes which is not limited. According to @nfurfaro 's research, a challenge introduced by this unbounded field means that it is very hard to quickly paginate keys. We would need a fixed size field to quickly be able to retrieve X keys at once.

Another (more obvious) challenge is obviously that this can be very costly (more on this old article).

Now this problem is in no way unique to Unlock. Many crypto projects are actually facing similar challenges. The general approach seems to be to store the data "off chain" and only store a pointer to it on chain.

There are multiple choice for the platform to pick, including IPFS, Dat and others.

nfurfaro commented 5 years ago

I now think using the ERC721-metadata spec could be considered the "standard " approach. It has the benefit of being storage provider agnostic . https://github.com/ethereum/EIPs/blob/master/EIPS/eip-721.md

julien51 commented 5 years ago

But wait this would be lock-level metadata, right?

nfurfaro commented 5 years ago

This should be key (NFT)-level metadata, that we're currently using the Key.data field for.

Edit: This is actually Lock-level as Julien said above. My mistake.

julien51 commented 5 years ago

Ha! I see you, mean the tokenURI stuff? I am not a huge fan of that approach because that obviously introduce a pretty serious failure point (the URL). What do you think?

julien51 commented 5 years ago

However i guess we could use the name stuff (cc @akeem)

nfurfaro commented 5 years ago

@julien51 Regarding the tokenURI: It is allowed to be mutable, and so could be updated if the storage location changes. Are your concerns more about ensuring the integrity of the data at the storage location?

nfurfaro commented 5 years ago

"Metadata Choices (metadata extension)

We have required name and symbol functions in the metadata extension. Every token EIP and draft we reviewed (ERC-20, ERC-223, ERC-677, ERC-777, ERC-827) included these functions.

We remind implementation authors that the empty string is a valid response to name and symbol if you protest to the usage of this mechanism. We also remind everyone that any smart contract can use the same name and symbol as your contract. How a client may determine which ERC-721 smart contracts are well-known (canonical) is outside the scope of this standard.

A mechanism is provided to associate NFTs with URIs. We expect that many implementations will take advantage of this to provide metadata for each NFT. The image size recommendation is taken from Instagram, they probably know much about image usability. The URI MAY be mutable (i.e. it changes from time to time). We considered an NFT representing ownership of a house, in this case metadata about the house (image, occupants, etc.) can naturally change.

Metadata is returned as a string value. Currently this is only usable as calling from web3, not from other contracts. This is acceptable because we have not considered a use case where an on-blockchain application would query such information.

Alternatives considered: put all metadata for each asset on the blockchain (too expensive), use URL templates to query metadata parts (URL templates do not work with all URL schemes, especially P2P URLs), multiaddr network address (not mature enough)"

From: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-721.md

chiro-hiro commented 5 years ago

At previous time, we were try to launch a decentralized lottery chain. We've faced the same issue, we deceived to store merkle-root of metadata on main-net and store ticket's data on private-net. It could be stupid but it worked ;)

akeem commented 5 years ago

TokenURI potentially being mutable does provide us some flexibility here, we can begin to expose this information via a consistent URI using Locksmith.

Generally I think if we can provide lock creators/implementors with the ability to "migrate" the hosting of key metadata elsewhere we will be in a good place. The answer to this should also answer our own question of how can we move this data to avoid the aforementioned category of failures

julien51 commented 5 years ago

The problem with the URI approach is that key owners cannot update their data unless we actually implement some kind of app on our end to do that (and store said data).

cellog commented 5 years ago

Could a separate smart contract be used to store variable-length data? Then we just need an address to get the data

julien51 commented 5 years ago

It would still be as expensive (actually even more!) and probably does not provide any benefit compared to IPFS.

nfurfaro commented 5 years ago

Part of the difficulty here is that in the ERC721 metadata spec, the metadata is specific to the token, whereas in the Unlock sense, we're trying to use the key's data field to store owner-specific data. I think that we should at least consider removing any connection or reference to a specific owner at the key level, and instead track ownership at the lock level. This is not a necessity, as there's nothing in the erc721 spec that prevents us from mutating the metadata, or using it to record owner-specific details. It's just that the original intent for the metadata field was to store token specific details, such as an image of the asset represented.

akeem commented 5 years ago

definitely spitballing here, this may solve some problems and introduce others. We should be able to update the lock contract/factory contract, allowing key holders with the ability to update the tokenuri for their key(s).

This doesn't eliminate us or anyone else from having to host the data or create an application enforcing consistency but it should allow the information to be updatable by the keyholder

nfurfaro commented 5 years ago

It would be fairly routine to implement a setData() function on the smart contract end, but as @julien51 commented above, the question becomes "where does the ui for this live?" It doesn't belong in the creator dashboard, so another app/ui becomes necessary.

cellog commented 5 years ago

I could see this living in a web app or native phone app, for managing all your keys, the consumer version of creator dashboard, in addition to an optional app on the page that is being unlocked, i.e. "update your subscription info" link provided by the lock owner

julien51 commented 5 years ago

Part of the difficulty here is that in the ERC721 metadata spec, the metadata is specific to the token, whereas in the Unlock sense, we're trying to use the key's data field to store owner-specific data. I think that we should at least consider removing any connection or reference to a specific owner At the Key level, and instead track ownership at the lock level.

Exactly.

It would be fairly routine to implement a setData() function on the smart contract end, by as @julien51 commented above, the question becomes where does the ui for this live? It doesn't belong in the creator dashboard, so another app/ui becomes necessary.

This will live on the specific applications. the paywall probably has no use for the data field, but the unlock.email app flow does actually implement that (the email address is stored in the data field.)

I could see this living in a web app or native phone app, for managing all your keys, the consumer version of creator dashboard, in addition to an optional app on the page that is being unlocked, i.e. "update your subscription info" link provided by the lock owner

And yes! At some point, probably in the context of v2 i want us to build a "keychain" application which lets someone list all their keys, see/set their data fields (when applicable) and of course, trade them!

nfurfaro commented 5 years ago

@julien51 @akeem I’m just wondering where we stand with the lock metadata discussion (although it's not my current focus)? It looks like @Akeem’s PR #920 added the ability to store lock metadata. Would it be safe to say that we're going to move foreward with implementing the optional "metadata extension" on top of the basic erc721 spec?

For reference:

/// @title ERC-721 Non-Fungible Token Standard, optional metadata extension
/// @dev See https://github.com/ethereum/EIPs/blob/master/EIPS/eip-721.md
///  Note: the ERC-165 identifier for this interface is 0x5b5e139f.
interface ERC721Metadata /* is ERC721 */ {
    /// @notice A descriptive name for a collection of NFTs in this contract
    function name() external view returns (string _name);

    /// @notice An abbreviated name for NFTs in this contract
    function symbol() external view returns (string _symbol);

    /// @notice A distinct Uniform Resource Identifier (URI) for a given asset.
    /// @dev Throws if `_tokenId` is not a valid NFT. URIs are defined in RFC
    ///  3986. The URI may point to a JSON file that conforms to the "ERC721
    ///  Metadata JSON Schema".
    function tokenURI(uint256 _tokenId) external view returns (string);
}

This is the "ERC721 Metadata JSON Schema" referenced above.

{
    "title": "Asset Metadata",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "Identifies the asset to which this NFT represents"
        },
        "description": {
            "type": "string",
            "description": "Describes the asset to which this NFT represents"
        },
        "image": {
            "type": "string",
            "description": "A URI pointing to a resource with mime type image/* representing the asset to which this NFT represents. Consider making any images at a width between 320 and 1080 pixels and aspect ratio between 1.91:1 and 4:5 inclusive."
        }
    }
}

cc @hardlydifficult

akeem commented 5 years ago

@nfurfaro I think we still need some alignment with @julien51. Locksmith should be able to accommodate this, but I am not sure if we want to take this approach based on the valid points raised around the availability of v2 related applications allowing for updates

nfurfaro commented 5 years ago

@akeem I'm just wondering... If we choose to use locksmith to store our key's metadata, what type of "pointer" would we have to the data that we could store in the key on-chain? Would it be possible to restrict the size to 32 bytes/ 32 ASCII-character string?

Also, While implementation of either one will be closely linked to the other, I'm not considering metadata updateability here, only metadata storage location/format.

nfurfaro commented 5 years ago

Considering updateability:

If key metadata contains owner-specific information, && keys are transferrable to a new owner, then metaData must be updatable.

akeem commented 5 years ago

@akeem I'm just wondering... If we choose to use locksmith to store our key's metadata, what type of "pointer" would we have to the data that we could store in the key on-chain? Would it be possible to restrict the size to 32 bytes/ 32 ASCII-character string?

Also, While implementation of either one will be closely linked to the other, I'm not considering metadata updateability here, only metadata storage location/format.

I'm not sure if the net effect of this would be materially different than utilizing something content addressable, but I think it should be able to hash the the address of the key's parent lock? and it's token id within 32 bytes (could definitely be incorrect about this)

HardlyDifficult commented 5 years ago

Summarizing my understanding of this issue, the options, and a proposed path forward. Let me know what you think - I've missed a lot of the background discussion on this one.

User stories - the key owner would like to share information with the lock owner, such as:

By publishing this information to the contract, Lock owners do not need to maintain parallel infrastructure to collect data from key owners. Additionally in terms of distribution this aids in scaling and possibly improves availability. It also ensures the data is from the key owner themselves and cannot be tampered with.

Data is user specific - so it is cleared when a Key is transferred.

Proposal, use IPFS:

Current implementation, arbitrary data:

Alternatives (in no particular order):

Proposed solution: continue with arbitrary data in the contract, but limit the length to x (TBD after some testing of gas impact). If the content needs to exceed this then the data published can be a pointer to the content on IPFS, allowing the front-end to make a follow up request to IPFS as appropriate. Continue allowing the user to publish the data field on purchase, but also add the ability for the Key owner to change it in the future.

Use a uint field to define data type which is set on the Lock (not each key) -- or maybe a short string. The front-end would use this to switch to the correct flow. This is very flexible, even allowing a Lock owner to build a custom solution (e.g. maybe store in GunDB). The front-end could do input validation depending on the expected datatype, however it will always be possible for the user to override that and the contract cannot validate the data published without limiting what types of content we support and potentially increasing gas costs significantly.

BTW will this content ever be surfaced by the Lock owner, and if so do we need admin functions for the Lock owner? For example if the data is images for user avatars and the Lock owner shows these on their website, how would the Lock owner deal with offensive images?

HardlyDifficult commented 5 years ago

Another user story, which changes the requirements a bit:

To support this use case we need to:

Note that the Lock owner could offer a second Lock which offers a key to all content with a single purchase.

nfurfaro commented 5 years ago

I think this is a good summary of the issue to date. On the whole, I support this approach, though I'd like to hear from @julien51 on it as well.

I agree that IPFS seems to be somewhat inefficient for small data sets. Where its focus is on file storage, we may want to consider a true decentralised database (like Bluezelle) instead, which is more optimised for storing arbitrary data rather than files.

As for your proposed solution:

I like it. It should be easy enough to map uints => data types and record it (similar to what you've done for revert error codes recorded in an undeployed contract)

This may be a good compromise in that it keeps the data on-chain for what may end up being a large portion of locks/keys. However, if we will support larger sets of data then the IPFS (or alternative) functionality needs to be added regardless of whether or not most keys need it... It may make sense to cap the data and simply not allow going over the cap unless there's a future demand for that feature. That said, the most "convenient" place to cap the size would be at bytes32, but again, not sure that's enough to work with here. And as you mentioned, if we continue using type bytes we'll need to implement pagination (with respect to the data field) differently, but not unlike what would be required if using IPFS.

I think this is important (assuming that the data is key-owner specific). When a key is transferred, the data needs to be re-set.

  • "Use one Lock to support per-content purchases"

I see the potential use-case here, but it seems to introduce a couple requirements which are at odds with the direction we've been heading, such as ensuring the user cannot change the data (I believe this is actually an ability we need to ADD to support transferrable keys properly) and allowing users to own multiple keys.

HardlyDifficult commented 5 years ago

RE the uint datatype: maybe we use a hash of a descriptive name. eg. keccak('ipfs') or versioned like keccak('ipfs-v0.0.1')

HardlyDifficult commented 5 years ago

I moved the per-content use case to a new issue: #1682

nfurfaro commented 5 years ago

RE "maybe we use a hash of a descriptive name": This would not really allow one to retrieve the datatype though...only to confirm that it is indeed keccak('ipfs') or whatever. Whereas something like uint public datatype = 1, and then a mapping (off-chain) such as 1 = 'IPFS' (or versioned, 1 = 'ipfs-v0.0.1' allows discovery of the datatype, not just confirmation.

HardlyDifficult commented 5 years ago

The downside with the numbering system is if someone wanted to add a new use case, such as GunDB, it's not clear we could allow that without encountering collisions. By using the hash, we'll never collide. Yes you can only confirm a match, but that means the front-end only known how data is stored if it's in a format it understands... otherwise it does not matter since the front-end can't leverage it.

nfurfaro commented 5 years ago

hmm. This is true. I suppose storing a string would give the benefits of both (with potentially higher gas cost than a bytes32 hash, but it's only 1 string per lock and can't really be abused as it's only the owner who would set it...), preventing collision and allowing for discovery. As you said, maybe the discovery here isn't at all critical though.

nfurfaro commented 5 years ago

So, the decision between using bytes32 or bytes for the data field still needs to be made with Julien's input. I think everyone agrees that some sort of datatype or dataLocation field is needed here( at the lock-level, not at the individual key level), and needs to be implemented regardless of how we appraoch the data field (unless an unbounded bytes is used and all data is stored on-chain). Also, it seems likely(though not certain) that IPFS will play some part in this (at least as a first iteration) but I think most of the work there will be frontend/backend integration, not so much in the smart contracts.

HardlyDifficult commented 5 years ago

Requirements under discussion here: https://docs.google.com/document/d/1sA2vvQOi_ovCjt9UM43utOQm61Fj9ceIWiEC67yaaNk/edit?ts=5c781096

julien51 commented 5 years ago

Since we do not have a onchain data field anymore, this can be closed!