ipfs / pinning-services-api-spec

Standalone, vendor-agnostic Pinning Service API for IPFS ecosystem
https://ipfs.github.io/pinning-services-api-spec/
Creative Commons Zero v1.0 Universal
100 stars 27 forks source link

refactor: use exploded array parameters for 'cid' and 'status' #69

Closed lidel closed 3 years ago

lidel commented 3 years ago

:point_right: This PR aims to facilitate discussion: we need to decide if we want to merge this or close and leave it as-is.

This PR introduces (most likely) a BREAKING CHANGE: cid and status filters need to be passed differently:

PREVIEW: https://ipfs.github.io/pinning-services-api-spec/#specUrl=https://raw.githubusercontent.com/ipfs/pinning-services-api-spec/fix/explode-array-parameters/ipfs-pinning-service.yaml


TLDR

Right now the Pinning Service API uses ?cid=Qm1,Qm2 notation for cid and status parameters.

This may not be the best choice because of:

  1. url-escaping , into %2C
  2. future CID encodings or new filters that could include , inside passed value

Potential solution is to switch to exploded notation: ?cid=Qm1&cid=Qm2 (this PR)

Background

IIRC nobody had any strong opinions at the time, so we picked a notation that is short and human-readable.

While working on pin remote support in go-ipfs (https://github.com/ipfs/go-ipfs/pull/7661) we realized additional nuance: clients generated from the v0.1.2 spec will automatically escape commas in the query parameter, which weakens the "human-readable" argument a bit:

GET /pins?status=queued%2Cpinned&cid=QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn%2CQmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR&limit=10

This is because:

RFC 3986 defines a set of reserved characters :/?#[]@!$&'()*+,;= that are used as URI component delimiters. When these characters need to be used literally in a query parameter value, they are usually percent-encoded.
If you want a query parameter that is not percent-encoded, add allowReserved: true to the parameter definition (src)

Ref. https://swagger.io/docs/specification/serialization/#query

Why this is a problem

While status is an enum of four predefined states (queues|pinning|pinned|failed), the cid is an array of opaque strings. Most of CIDs today are encoded in Base58btc or base-insensitive Base32/36, however it is possible that in the future IPFS will add support for base encoding that includes , in its alphabet, which introduces some ambiguity.

In practice, it is not a big deal:

However:

OpenAPI supports many notations, and exploded one removes ambiguity by replacing commas with multiple query params:

2020-11-25--15-39-01

We ned to make a decision

Should we:

or is it better to

Thoughts?

Note that I don't feel strongly about this, but failed to find any documented decision why we went with commas. Think about his PR as a good opportunity to formalize the decision before we freeze the API.

Please comment if you don't think making the change is worth it, or if you see technical argument against exploded notation (lack of support in frameworks, libraries, etc).

@aschmahmann @achingbrain @jacobheun @petar @gammazero @obo20 @andrew @GregTheGreek @priom @jsign @sanderpick @andrewxhill @ipfs/wg-pinning-services @Gozala

achingbrain commented 3 years ago

I have a preference for multiple params, it's harder to misinterpret and doesn't require any url escaping.

andrew commented 3 years ago

The url query parser built into ruby on rails doesn't handle multiple params of the same name, returning only the last one instead of an array of all of them, example:

Started GET "/test?cid=Qm1&cid=Qm2" for ::1 at 2020-11-25 15:53:49 +0000
Processing by CidsController#test as HTML
  Parameters: {"cid"=>"Qm2"}
{"cid"=>"Qm2", "controller"=>"cids", "action"=>"test"}
Completed 200 OK in 1ms

I suspect all ruby web frameworks that use Rack will behave very similar, this would be very hacky for me to implement this change.

The supported method for multiple values for a single query param in rack/rails is to add [] to the name of the param, example:

Started GET "/test?cid[]=Qm1&cid[]=Qm2" for ::1 at 2020-11-25 15:57:54 +0000
Processing by CidsController#test as HTML
  Parameters: {"cid"=>["Qm1", "Qm2"]}
{"cid"=>["Qm1", "Qm2"], "controller"=>"cids", "action"=>"test"}
Completed 200 OK in 1ms
Gozala commented 3 years ago

I have a preference for multiple params, it's harder to misinterpret and doesn't require any url escaping.

I second that

andrew commented 3 years ago

Someone has done some research for this problem in php, seems like the default is similar to rails: https://stackoverflow.com/a/9547490/120434

jacobheun commented 3 years ago

I really don't think this change is worth it, and there's not a meaningful justification IMO to warrant breaking the existing spec.

Each of these are viable options and we can argue about their various merits. As they are all valid param permutations, I'd recommend we leave as is. The main downside to the existing approach is url escaping and legibility, but we also get to guarantee order. Neither /test?cid[]=Qm1&cid[]=Qm2 nor /test?cid=Qm1&cid=Qm2 give us this.

The other benefit of just using comma delimited is that we don't have to deal with special circumstances of various API frameworks. Just read the key/value and handle the url decoded split.

future CID encodings or new filters that could include , inside passed value

With url encoding we should just avoid future CID encodings that use this at all costs, this sounds like a nightmare to deal with.

petar commented 3 years ago

+1 to Jacob's comment. I think we have a lot of important feature work ahead of us, so we should invest our time in issues that are a clear "must do".

On Wed, 25 Nov 2020 at 08:19, Jacob Heun notifications@github.com wrote:

I really don't think this change is worth it, and there's not a meaningful justification IMO to warrant breaking the existing spec.

Each of these are viable options and we can argue about their various merits. As they are all valid param permutations, I'd recommend we leave as is. The main downside to the existing approach is url escaping and legibility, but we also get to guarantee order. Neither /test?cid[]=Qm1&cid[]=Qm2 nor /test?cid=Qm1&cid=Qm2 give us this.

The other benefit of just using comma delimited is that we don't have to deal with special circumstances of various API frameworks. Just read the key/value and handle the url decoded split.

future CID encodings or new filters that could include , inside passed value

With url encoding we should just avoid future CID encodings that use this at all costs, this sounds like a nightmare to deal with.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ipfs/pinning-services-api-spec/pull/69#issuecomment-733807405, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACFTS42WOWRXOEJL5ARRDLSRUVB3ANCNFSM4UCRYAXA .

obo20 commented 3 years ago

I agree with @jacobheun on this one. Unless we have a very urgent reason, any breaking changes don't seem to be worth it. Especially considering some have already expressed potential difficulties with getting this change to work with their language of choice.

lidel commented 3 years ago

Thank you all for providing feedback.

For the record: the consensus is that the change is not worth making, given both time and interop constraints (comma-separated is slightly easier, as it can be wired up even in heavily-opinionated environments).

I'm closing this without merging.