Key Data - Githubissues

julien51 commented 5 years ago

Each key has an optional data field. This is important because it lets the customer (owner of the key) communicate something to the creator (lock owner). Use cases for this are for example: encrypted email addresses, encrypted names, images... etc

Our current implementation stores this on chain (the data field is actually bytes which is not limited. According to @nfurfaro 's research, a challenge introduced by this unbounded field means that it is very hard to quickly paginate keys. We would need a fixed size field to quickly be able to retrieve X keys at once.

Another (more obvious) challenge is obviously that this can be very costly (more on this old article).

Now this problem is in no way unique to Unlock. Many crypto projects are actually facing similar challenges. The general approach seems to be to store the data "off chain" and only store a pointer to it on chain.

There are multiple choice for the platform to pick, including IPFS, Dat and others.

nfurfaro commented 5 years ago

I now think using the ERC721-metadata spec could be considered the "standard " approach. It has the benefit of being storage provider agnostic . https://github.com/ethereum/EIPs/blob/master/EIPS/eip-721.md

julien51 commented 5 years ago

But wait this would be lock-level metadata, right?

nfurfaro commented 5 years ago

This should be key (NFT)-level metadata, that we're currently using the Key.data field for.

Edit: This is actually Lock-level as Julien said above. My mistake.

julien51 commented 5 years ago

Ha! I see you, mean the tokenURI stuff? I am not a huge fan of that approach because that obviously introduce a pretty serious failure point (the URL). What do you think?

julien51 commented 5 years ago

However i guess we could use the name stuff (cc @akeem)

nfurfaro commented 5 years ago

@julien51 Regarding the tokenURI: It is allowed to be mutable, and so could be updated if the storage location changes. Are your concerns more about ensuring the integrity of the data at the storage location?

nfurfaro commented 5 years ago

"Metadata Choices (metadata extension)

We have required name and symbol functions in the metadata extension. Every token EIP and draft we reviewed (ERC-20, ERC-223, ERC-677, ERC-777, ERC-827) included these functions.

We remind implementation authors that the empty string is a valid response to name and symbol if you protest to the usage of this mechanism. We also remind everyone that any smart contract can use the same name and symbol as your contract. How a client may determine which ERC-721 smart contracts are well-known (canonical) is outside the scope of this standard.

A mechanism is provided to associate NFTs with URIs. We expect that many implementations will take advantage of this to provide metadata for each NFT. The image size recommendation is taken from Instagram, they probably know much about image usability. The URI MAY be mutable (i.e. it changes from time to time). We considered an NFT representing ownership of a house, in this case metadata about the house (image, occupants, etc.) can naturally change.

Metadata is returned as a string value. Currently this is only usable as calling from web3, not from other contracts. This is acceptable because we have not considered a use case where an on-blockchain application would query such information.

Alternatives considered: put all metadata for each asset on the blockchain (too expensive), use URL templates to query metadata parts (URL templates do not work with all URL schemes, especially P2P URLs), multiaddr network address (not mature enough)"

From: https://github.com/ethereum/EIPs/blob/master/EIPS/eip-721.md

chiro-hiro commented 5 years ago

At previous time, we were try to launch a decentralized lottery chain. We've faced the same issue, we deceived to store merkle-root of metadata on main-net and store ticket's data on private-net. It could be stupid but it worked ;)

akeem commented 5 years ago

TokenURI potentially being mutable does provide us some flexibility here, we can begin to expose this information via a consistent URI using Locksmith.

Generally I think if we can provide lock creators/implementors with the ability to "migrate" the hosting of key metadata elsewhere we will be in a good place. The answer to this should also answer our own question of how can we move this data to avoid the aforementioned category of failures

julien51 commented 5 years ago

The problem with the URI approach is that key owners cannot update their data unless we actually implement some kind of app on our end to do that (and store said data).

cellog commented 5 years ago

Could a separate smart contract be used to store variable-length data? Then we just need an address to get the data

julien51 commented 5 years ago

It would still be as expensive (actually even more!) and probably does not provide any benefit compared to IPFS.

nfurfaro commented 5 years ago

Part of the difficulty here is that in the ERC721 metadata spec, the metadata is specific to the token, whereas in the Unlock sense, we're trying to use the key's data field to store owner-specific data. I think that we should at least consider removing any connection or reference to a specific owner at the key level, and instead track ownership at the lock level. This is not a necessity, as there's nothing in the erc721 spec that prevents us from mutating the metadata, or using it to record owner-specific details. It's just that the original intent for the metadata field was to store token specific details, such as an image of the asset represented.

akeem commented 5 years ago

definitely spitballing here, this may solve some problems and introduce others. We should be able to update the lock contract/factory contract, allowing key holders with the ability to update the tokenuri for their key(s).

This doesn't eliminate us or anyone else from having to host the data or create an application enforcing consistency but it should allow the information to be updatable by the keyholder

nfurfaro commented 5 years ago

It would be fairly routine to implement a setData() function on the smart contract end, but as @julien51 commented above, the question becomes "where does the ui for this live?" It doesn't belong in the creator dashboard, so another app/ui becomes necessary.

cellog commented 5 years ago

I could see this living in a web app or native phone app, for managing all your keys, the consumer version of creator dashboard, in addition to an optional app on the page that is being unlocked, i.e. "update your subscription info" link provided by the lock owner

julien51 commented 5 years ago

Part of the difficulty here is that in the ERC721 metadata spec, the metadata is specific to the token, whereas in the Unlock sense, we're trying to use the key's data field to store owner-specific data. I think that we should at least consider removing any connection or reference to a specific owner At the Key level, and instead track ownership at the lock level.

Exactly.

It would be fairly routine to implement a setData() function on the smart contract end, by as @julien51 commented above, the question becomes where does the ui for this live? It doesn't belong in the creator dashboard, so another app/ui becomes necessary.

This will live on the specific applications. the paywall probably has no use for the data field, but the unlock.email app flow does actually implement that (the email address is stored in the data field.)

I could see this living in a web app or native phone app, for managing all your keys, the consumer version of creator dashboard, in addition to an optional app on the page that is being unlocked, i.e. "update your subscription info" link provided by the lock owner

And yes! At some point, probably in the context of v2 i want us to build a "keychain" application which lets someone list all their keys, see/set their data fields (when applicable) and of course, trade them!

nfurfaro commented 5 years ago

@julien51 @akeem I’m just wondering where we stand with the lock metadata discussion (although it's not my current focus)? It looks like @Akeem’s PR #920 added the ability to store lock metadata. Would it be safe to say that we're going to move foreward with implementing the optional "metadata extension" on top of the basic erc721 spec?

For reference:

/// @title ERC-721 Non-Fungible Token Standard, optional metadata extension
/// @dev See https://github.com/ethereum/EIPs/blob/master/EIPS/eip-721.md
///  Note: the ERC-165 identifier for this interface is 0x5b5e139f.
interface ERC721Metadata /* is ERC721 */ {
    /// @notice A descriptive name for a collection of NFTs in this contract
    function name() external view returns (string _name);

    /// @notice An abbreviated name for NFTs in this contract
    function symbol() external view returns (string _symbol);

    /// @notice A distinct Uniform Resource Identifier (URI) for a given asset.
    /// @dev Throws if `_tokenId` is not a valid NFT. URIs are defined in RFC
    ///  3986. The URI may point to a JSON file that conforms to the "ERC721
    ///  Metadata JSON Schema".
    function tokenURI(uint256 _tokenId) external view returns (string);
}

This is the "ERC721 Metadata JSON Schema" referenced above.

{
    "title": "Asset Metadata",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "Identifies the asset to which this NFT represents"
        },
        "description": {
            "type": "string",
            "description": "Describes the asset to which this NFT represents"
        },
        "image": {
            "type": "string",
            "description": "A URI pointing to a resource with mime type image/* representing the asset to which this NFT represents. Consider making any images at a width between 320 and 1080 pixels and aspect ratio between 1.91:1 and 4:5 inclusive."
        }
    }
}

cc @hardlydifficult

akeem commented 5 years ago

@nfurfaro I think we still need some alignment with @julien51. Locksmith should be able to accommodate this, but I am not sure if we want to take this approach based on the valid points raised around the availability of v2 related applications allowing for updates

nfurfaro commented 5 years ago

@akeem I'm just wondering... If we choose to use locksmith to store our key's metadata, what type of "pointer" would we have to the data that we could store in the key on-chain? Would it be possible to restrict the size to 32 bytes/ 32 ASCII-character string?

Also, While implementation of either one will be closely linked to the other, I'm not considering metadata updateability here, only metadata storage location/format.

nfurfaro commented 5 years ago

Considering updateability:

If key metadata contains owner-specific information, && keys are transferrable to a new owner, then metaData must be updatable.

Unless the keys will no longer store owner-specific information, it's safe to say that we will eventually need to implement updateability, whatever that entails.
If the keys will NOT store owner-specific information, this begs the question "What are we storing here?" It could simply be the data specified above, but then we're back to deciding where/how to store the owner's email address, for example.

akeem commented 5 years ago

@akeem I'm just wondering... If we choose to use locksmith to store our key's metadata, what type of "pointer" would we have to the data that we could store in the key on-chain? Would it be possible to restrict the size to 32 bytes/ 32 ASCII-character string?

Also, While implementation of either one will be closely linked to the other, I'm not considering metadata updateability here, only metadata storage location/format.

I'm not sure if the net effect of this would be materially different than utilizing something content addressable, but I think it should be able to hash the the address of the key's parent lock? and it's token id within 32 bytes (could definitely be incorrect about this)

HardlyDifficult commented 5 years ago

Summarizing my understanding of this issue, the options, and a proposed path forward. Let me know what you think - I've missed a lot of the background discussion on this one.

User stories - the key owner would like to share information with the lock owner, such as:

Encrypted email address, name, or even json for a full user profile.
Image. This could represent a user avatar.
URI to something of interest. Maybe to related content by the Key owner.

By publishing this information to the contract, Lock owners do not need to maintain parallel infrastructure to collect data from key owners. Additionally in terms of distribution this aids in scaling and possibly improves availability. It also ensures the data is from the key owner themselves and cannot be tampered with.

Data is user specific - so it is cleared when a Key is transferred.

Proposal, use IPFS:

Pro:
- Decentralized
- Cheap: no charge but per cons I believe we would need a box for this
- Salable: a benefit if the data is large and may be retrieved frequently
Con:
- Inefficient for small files (e.g. a url or other string) - both in terms of the gas cost to store the IPFS hash but more so on the IPFS side itself.
- Does not offer persistence (only permanence). We could try to add persistence, but I believe the only way we can guarantee persistence is by running the box which initially accepts content (either us or the Lock owner) and confirm the upload before posting the transaction. Downloads could point generically to IPFS nodes which should enable discovery of content as long as our node is always connected.
- Retrieving content requires two connections, first to Ethereum then to IPFS

Current implementation, arbitrary data:

Pro:
- Decentralized
- Flexible: we could even use this field to allow some Locks to use IPFS
- Persistence and permanence
Con:
- Expensive, particularly for large files
- May complicate pagination: we can either exclude the data from our paginated call and make a follow up call to download the data for a single Key at a time (which is very similar to the flow we would use with IPFS); or we could cap the max size of the data supported

Alternatives (in no particular order):

Store data in a different contract or a sidechain or a different blockchain or any other database (such as Locksmith): an option which could reduce costs, but it complicates the experience for users (especially if there’s transaction fees on both networks) and complicates the front-end implementation (but not more so than using IPFS).
ERC721 standard tokenUri (a URI to content with optional metadata), seems too limiting.
Store data in event data only: an option since contracts do not need to read the data directly. This would be cheaper than storing the data on chain but complicates the front-end implementation. However a risk, the Solidity docs say that nodes may eventually add rules for pruning old logs.
Store data in transaction input fields only: cheapest way to do on-chain storage, but a bit hacky and complicates the front-end implementation.

Proposed solution: continue with arbitrary data in the contract, but limit the length to x (TBD after some testing of gas impact). If the content needs to exceed this then the data published can be a pointer to the content on IPFS, allowing the front-end to make a follow up request to IPFS as appropriate. Continue allowing the user to publish the data field on purchase, but also add the ability for the Key owner to change it in the future.

Use a uint field to define data type which is set on the Lock (not each key) -- or maybe a short string. The front-end would use this to switch to the correct flow. This is very flexible, even allowing a Lock owner to build a custom solution (e.g. maybe store in GunDB). The front-end could do input validation depending on the expected datatype, however it will always be possible for the user to override that and the contract cannot validate the data published without limiting what types of content we support and potentially increasing gas costs significantly.

BTW will this content ever be surfaced by the Lock owner, and if so do we need admin functions for the Lock owner? For example if the data is images for user avatars and the Lock owner shows these on their website, how would the Lock owner deal with offensive images?

HardlyDifficult commented 5 years ago

Another user story, which changes the requirements a bit:

Use one Lock to support per-content purchases... where each individual piece of content is priced the same. We do this by including data with the key purchase such as an ID of the content of interest.

To support this use case we need to:

Save data on transfer (it's cleared today)
Ensure the user cannot change data post-purchase (on-par with what we have today)
Allow a single user to own multiple Keys for a Lock (not allowed today)

Note that the Lock owner could offer a second Lock which offers a key to all content with a single purchase.

nfurfaro commented 5 years ago

I think this is a good summary of the issue to date. On the whole, I support this approach, though I'd like to hear from @julien51 on it as well.

I agree that IPFS seems to be somewhat inefficient for small data sets. Where its focus is on file storage, we may want to consider a true decentralised database (like Bluezelle) instead, which is more optimised for storing arbitrary data rather than files.

As for your proposed solution:

"Use a uint field to define data type which is set on the Lock (not each key)"

I like it. It should be easy enough to map uints => data types and record it (similar to what you've done for revert error codes recorded in an undeployed contract)

"continue with arbitrary data in the contract, but limit the length to x"

This may be a good compromise in that it keeps the data on-chain for what may end up being a large portion of locks/keys. However, if we will support larger sets of data then the IPFS (or alternative) functionality needs to be added regardless of whether or not most keys need it... It may make sense to cap the data and simply not allow going over the cap unless there's a future demand for that feature. That said, the most "convenient" place to cap the size would be at bytes32, but again, not sure that's enough to work with here. And as you mentioned, if we continue using type bytes we'll need to implement pagination (with respect to the data field) differently, but not unlike what would be required if using IPFS.

"Continue allowing the user to publish the data field on purchase, but also add the ability for the Key owner to change it in the future."

I think this is important (assuming that the data is key-owner specific). When a key is transferred, the data needs to be re-set.

"Use one Lock to support per-content purchases"

I see the potential use-case here, but it seems to introduce a couple requirements which are at odds with the direction we've been heading, such as ensuring the user cannot change the data (I believe this is actually an ability we need to ADD to support transferrable keys properly) and allowing users to own multiple keys.

HardlyDifficult commented 5 years ago

RE the uint datatype: maybe we use a hash of a descriptive name. eg. keccak('ipfs') or versioned like keccak('ipfs-v0.0.1')

HardlyDifficult commented 5 years ago

I moved the per-content use case to a new issue: #1682

nfurfaro commented 5 years ago

RE "maybe we use a hash of a descriptive name": This would not really allow one to retrieve the datatype though...only to confirm that it is indeed keccak('ipfs') or whatever. Whereas something like uint public datatype = 1, and then a mapping (off-chain) such as 1 = 'IPFS' (or versioned, 1 = 'ipfs-v0.0.1' allows discovery of the datatype, not just confirmation.

HardlyDifficult commented 5 years ago

The downside with the numbering system is if someone wanted to add a new use case, such as GunDB, it's not clear we could allow that without encountering collisions. By using the hash, we'll never collide. Yes you can only confirm a match, but that means the front-end only known how data is stored if it's in a format it understands... otherwise it does not matter since the front-end can't leverage it.

nfurfaro commented 5 years ago

hmm. This is true. I suppose storing a string would give the benefits of both (with potentially higher gas cost than a bytes32 hash, but it's only 1 string per lock and can't really be abused as it's only the owner who would set it...), preventing collision and allowing for discovery. As you said, maybe the discovery here isn't at all critical though.

nfurfaro commented 5 years ago

So, the decision between using bytes32 or bytes for the data field still needs to be made with Julien's input. I think everyone agrees that some sort of datatype or dataLocation field is needed here( at the lock-level, not at the individual key level), and needs to be implemented regardless of how we appraoch the data field (unless an unbounded bytes is used and all data is stored on-chain). Also, it seems likely(though not certain) that IPFS will play some part in this (at least as a first iteration) but I think most of the work there will be frontend/backend integration, not so much in the smart contracts.

HardlyDifficult commented 5 years ago

Requirements under discussion here: https://docs.google.com/document/d/1sA2vvQOi_ovCjt9UM43utOQm61Fj9ceIWiEC67yaaNk/edit?ts=5c781096

julien51 commented 5 years ago

Since we do not have a onchain data field anymore, this can be closed!

unlock-protocol / unlock

Key Data #831