ipfs / pinning-services-api-spec

Standalone, vendor-agnostic Pinning Service API for IPFS ecosystem
https://ipfs.github.io/pinning-services-api-spec/
Creative Commons Zero v1.0 Universal
100 stars 27 forks source link

Disambiguating *.providers fields #41

Closed obo20 closed 3 years ago

obo20 commented 3 years ago

Currently the GET/{CID} endpoint returns an object that looks like this: image

The response returns a pin object that has values(providers and meta) that are duplicated in the root return object as well.

Is this intentional or an accident?

lidel commented 3 years ago

This is intentional – see explainers:

obo20 commented 3 years ago

@lidel Is there a reason we need to be persisting the "source" providers that the user passed in as part of that pin object's model?

I was under the assumption that users would simply pass these in, and the pinning service would use them to fetch the data. It makes sense to provide back the multiaddresses of the nodes that the pinning service has pinned the content to, as the user will want to reference them when retrieving the data again in the future.

However, I'm not seeing a value in making the source multiaddresses part of the object's returned model as once the content has been retrieved they serve no further use.

lidel commented 3 years ago

@obo20 well, in the current spec we persist entire Pin object and providers are part of it (can be empty tho). Not doing so complicates the spec (how to tell which fields sent in POST /pins are persisted and which ones are ephemeral?).

Use cases:

obo20 commented 3 years ago

@lidel would you be open to having unique names for the two different providers arrays?

If I as an implementer of the spec had confusion between the two, I imagine there might be a lot of confusion from those looking to work with the spec.

jacobheun commented 3 years ago

Perhaps PinStatus.providers should become PinStatus.delegates? We already use the delegate concept in IPFS, so this could be a clearer naming scheme.

lidel commented 3 years ago

would you be open to having unique names for the two different providers arrays?

Yes, I think better naming could make things more intuitive, and clarify the difference between "optional source multiaddrs" and "mandatory ones delegated by pinning service". Ideally, I would remove "provider" term from both fields, because those would not be the only providers, just a subset.

Naming things is hard.. how about optional Pin.sources (origins?) and mandatory PinStatus.delegates? (pinners?)

jacobheun commented 3 years ago

how about optional Pin.sources (origins?) and mandatory PinStatus.delegates? (pinners?)

I think Pin.providers makes sense to leave since these are "known" providers, sources makes it more confusing to me. But yes to the rest.

obo20 commented 3 years ago

I like origins it signifies to me where the content originally came from. I would vote for Pin.originNodes or Pin.origins for the source nodes. I'm less sure what to name the nodes the pinning service is running / pinning to.

lidel commented 3 years ago

Ok, two weeks later we don't have any movement here, nor alternatives other than ones already proposed in this issue.

Let's make a quick temperature check before making the final PR:

Depending on the results will open PR with the change on Thursday (or not).

Update: results: https://github.com/ipfs/pinning-services-api-spec/pull/50#issue-467397320

obo20 commented 3 years ago

@lidel one thing I want to clarify here as well. With the provided meta attribute. Is this purely intended to be utilized for custom key value attributes that users provide? Or will there ever be "custom" functionality that pinning service providers can use that field for.

I ask because I noticed in the PinStatus meta description it describes that attribute as being used for custom service provider info. I want to make sure we're not expecting the same of the values users provide for the meta attribute when pinning content.

If we want to allow custom pinning service provider functionality, I would like to suggest we keep the meta parameter as purely key values and we also create an options parameter when uploading for the custom functionality. (an example of this custom functionality would be our users providing a custom pinning policy during uploads that determines what regions their content is stored in.

lidel commented 3 years ago

Is this purely intended to be utilized for custom key value attributes that users provide? Or will there ever be "custom" functionality that pinning service providers can use that field for.

@obo20 kinda both!

Pin.meta is provided by the user and is persisted as-is with the pin object.

Pinning service might provide custom functionality based on values in Pin.meta, and we want those values to be persisted because user may want to inspect it later, or even change it via POST /pins/{id}

For example: I create pin with Pin.meta[regions] = [US] and pinning service assigns my pin to specific region based on that metadata. Some time later I want to keep pinned data in both US and EU, so I send modify request to POST /pins/{id} with updated pin object that contains Pin.meta[regions] = [US, EU], and pinning service takes care of mirroring the data between regions.

obo20 commented 3 years ago

@lidel My biggest concern here is that we're trying to do too much in one place.

While I appreciate wanting to be flexible with the spec, having users provide both "metadata" and "actionable data" in the same place with no standards seems dangerous to me.

For example, let's pretend that Pinata didn't have multiple regions functionality from the start and User A stored all of their cids with Pin.meta[regions] = [US, EU] because that's where they're storing content on their own nodes.

If we (or any other pinning service) ever added in regions functionality and decided to use the same format as User A for our instructions, all of a sudden the metadata that User A is passing in is going to get analyzed and interpreted as an instruction to pin to multiple Pinata regions while User A may not be intending for that to happen. They may have just wanted those keyvalues to be used for classifying / querying their data internally.

lidel commented 3 years ago

@obo20 I am not sold on the idea we need an overhead of a separate field for vendor-specific actionable metadata. The scenario you described could be avoided by pinata using pinata_ prefix in actionable key name (pinata_region), effectively removing risk of accidental collisions and is easier to reason about than the need to understand difference between meta[foo] and options[foo]

obo20 commented 3 years ago

@lidel do you have any pre-existing literature you're referencing for this design choice?

We modeled ours after Stripe, which is purely key/values used for querying, but nothing actionable https://stripe.com/docs/api/metadata

From reading through the AWS s3 metadata functionality it appears there's a few "System-defined object metadata" key / values that can have outside effects on how things work. The catch here is that these are all strictly controlled as amazon is the only entity working in this ecosystem and thus they have full control. https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-metadata

lidel commented 3 years ago

@obo20 the only reason for this design choice was to keep API surface as small as possible while enabling users and vendors to come up with useful conventions on top of a simple-but-flexible meta.

My worry is that if we have separate Pin.metadata and Pin.options, and both are optional maps persisted along with pin object, what will keep people from misusing Pin.options?

People will have to read docs to understand the purpose of each, and there always be someone who just use whichever they want because there is no validation. We could add distinct Pin.options if it had some kind of validation, but that requires us to hardcode specific flags, such as region, in the spec itself. I think it's too early for solidifying that type of thing, especially before Filecoin launch.

I don't believe the risk of "newly added actionable key clashing with user-defined one" is worth complicating the spec at this stage, especially when vendor_ prefix to actionable keys solves it.


Going back to your original question from https://github.com/ipfs/pinning-services-api-spec/issues/41#issue-668082164, do we want to disambiguate PinStatus.meta and Pin.meta? Or is solving *.providers in #50 enough?

obo20 commented 3 years ago

@lidel My thoughts were that the options object wouldn't be persisted. Basically I would like to have a way for users to pass in instructions that don't need to be persisted as key/values.

While our database is set up to store metadata as key / value pairs for simple querying purposes, unfortunately additional features such as regions functionality are often too complex to fit into a simple key / value paradigm and attempting to persist them as such would be problematic. For reference, here's an example "pin_policy" that a user would pass in:

        customPinPolicy: {
            regions: [
                {
                    id: 'FRA1',
                    desiredReplicationCount: 1
                },
                {
                    id: 'NYC1',
                    desiredReplicationCount: 2
                }
            ]
        }

We couldn't fit something like that into one key/value as we're storing the values as string (Which I agree with. Relational databases are not very efficient when querying JSON sub-objects)

This leaves us with a few options:

Neither of these seem like particularly elegant solutions which is why I was proposing an options object (could be named something else) that doesn't need to be persisted.


To answer your other question about the meta duplication, I think we might want to name this something else. From what I'm understanding, the Pin.meta should always be something the user sets, while the PinStatus.meta can be arbitrary information returned from the Pinning service? It might be good to make that distinction somehow. The best I can come up with is something like PinStatus.info, so I'm open to suggestions here.

lidel commented 3 years ago

Neither of these seem like particularly elegant solutions which is why I was proposing an options object (could be named something else) that doesn't need to be persisted.

I've spent some time thinking about it, and ended up with opinion that the idea that some fields won't be persisted produces undesired side effects. It makes it harder for client to inspect a pin via generic API to tell if/which any vendor-specific settings were applied to a pin. In practice, it would mean that each vendor has to provide a separate, vendor-specific API for those custom features, which defeat the purpose of generic API a bit.

We couldn't fit something like that into one key/value as we're storing the values as string

When it comes to vendor-specific features, I'd like us to go with text-based notation for now. Not only because of implementation simplicity, also security, as string-based key-value map removes surface for potential object serialization abuse.

How about something compact:

Pin.meta[pinata_policy] = "FRA1:1,NYC1:2"

or more verbose:

Pin.meta[pinata_policy_FRA1] = "replication:1"
Pin.meta[pinata_policy_NYC1] = "replication:2"

?

[..] unfortunately additional features such as regions functionality are often too complex to fit into a simple key / value paradigm and attempting to persist them as such would be problematic. [..] A big problem with this approach is querying the data. It becomes really hard/impossible to allow for more nuanced queries when things are simply stored as a string in a key/value format.

I feel like I am missing full context here, but the way I see it, pinning service receives metadata in Pin.meta as flat map of strings. You are free to persist it in any way you want, store actionable keys in separate tables, if that makes sense performance-wise etc. Pin object is effectively immutable (modification operation is only an alias for delete+create), so there is no need for synchronization, all you need to do is to ensure records are removed on Pin deletion.


I like the idea of PinStatus.info, to avoid bikeshed I've opened https://github.com/ipfs/pinning-services-api-spec/pull/51 :-)

lidel commented 3 years ago

Some notes from the call me and @obo20 (thanks!) had yesterday: