Closed tisto closed 1 year ago
-> you can see that the results are actually pretty close to each other, at least when we hammer the instance a bit. -> with less load, you can see that "fullobjects=1" takes double the time than the request without, when doing lots of request that does not matter that much any longer
-> you can see that even for the exact same request (in this case, content object with fullobjects=1) the response times are pretty widespread, this looks like a load problem
cc @jensens @cekk @sneridagh
See linked PR at ZCatalog above here.
@jensens on our way back home from Sorrento, @jackahl and I came up with an idea of how to solve the querystring-search caching issue. We could, as discussed, use a GET request instead of POST and use subtraversal for a unique URL. Though, instead of either encoding the complex query in the URL as GET params (which has lots of possible side effects and the character limit), we could just send the query in the body and generate a unique hash in the URL, to make it unique.
I outlined the idea above:
https://github.com/plone/plone.restapi/issues/1252#issue-1037098395
And I also started to play around with plone.rest, to see if that works:
https://github.com/plone/plone.rest/pull/125
Though, the main question would be if we get away with bending the HTTP standard:
https://stackoverflow.com/questions/978061/http-get-with-request-body?answertab=votes#tab-top
and if Varnish/Cloudflare or any other system (Python requests lib, curl, etc. seem to work) just works or if we run into issues.
The other option would be to send the encoded querystring in the URL as subtraversal, with the problem that it is pretty opaque for devs and that it can run into the 2000 characters limit:
https://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers
@jackahl gave me the idea for the solution we came up with when he proposed to send the "clean" query within the HTTP body, just for reference and dev convenience.
I'd love to hear what you think! :)
cc @sneridagh @ericof
@lukasgraf @buchi did you ever had the need to cache POST requests in plone.restapi?
My overall reaction: Great idea, using a hash and sub-traversal .... but is this within the spec?
Specification wise (HTTP 1.1) in RFC 7231 https://www.rfc-editor.org/rfc/rfc7231#page-25
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.
When we implement it this way we have no guarantee if webserver, caches, web application firewalls, and such operating between the client and the api-server are working correctly.
My conclusion: No, better not. We open pandoras box here.
I would propose to check the length of the encoded payload, and if the length exceeds the limit: go with a POST. According to the last URL in the previous post here, this is for a JS-generated request something around 2k is save, which is a lot for a query. Browser detection and increasing the limit based on this could be an option.
Inspired by your idea to use the body, we could use the Headers to send the payload (combined with a hash). But this feels somehow wrong and hackish as well, even if we would kind of stay within the specs.
Overall this is not that satisfying
@jensens I share your doubts and I agree that this could cause problems. Though, Elastic Search is doing exactly this (GET with a query in the payload):
Therefore it might be worth a shot. If we would run into issues, this would be a deal-breaker of course.
Making the frontend decide to use either GET or POST could be a valid way to go. Though we still have to fix the nested querystring issue and this would mean that long queries can not be cached at all.
I thought about using headers as well. Though, as you said, this feels hacky and there is also a size limit on HTTP headers. The body is the only size-safe way.
Just a follow up: Solr also uses GET for search requests and it seems that GET is even the default here:
https://solr.apache.org/guide/8_10/json-request-api.html https://dzone.com/articles/solr-select-query-get-vs-post
@sneridagh @tiberiuichim @plone/volto-team my main challenge right now it how to properly convert a nested dict with an array to a proper querystring. In Volto we are using the "query-string" lib which intentionally does not support nesting:
https://github.com/sindresorhus/query-string#nesting
After digging into this rabbit whole I can fully understand this decision. Though, this would mean that on Volto (or any other restapi request) needs to "stringify" an object within the request.
In JS this would look like this:
const queryString = require('query-string');
queryString.stringify({
foo: 'bar',
nested: JSON.stringify({
unicorn: 'cake'
})
});
//=> 'foo=bar&nested=%7B%22unicorn%22%3A%22cake%22%7D'
In Python like this:
payload = {
"query": json.dumps(
[
{
"i": "Title",
"o": "plone.app.querystring.operation.string.is",
"v": "Welcome to Plone",
},
{
"i": "path",
"o": "plone.app.querystring.operation.string.path",
"v": "/news",
},
]
),
"sort_on": "sortable_title",
"sort_order": "reverse",
"limit": "10",
"fullobjects": "False",
"b_start": "0",
"b_size": "2",
}
response = requests.get(
self.document.absolute_url(),
headers={"Accept": "application/json"},
params=payload,
auth=(SITE_OWNER_NAME, SITE_OWNER_PASSWORD),
)
(link to my playground in plone.rest: https://github.com/plone/plone.rest/pull/125/files)
Though, there are other libraries (like "qs") which seem to support nested querystrings:
> var qs = require('qs');
undefined
> query = {
... "query": [
... {
..... "i": "Title",
..... "o": "plone.app.querystring.operation.string.is",
..... "v": "Welcome to Plone",
..... },
... {
..... "i": "path",
..... "o": "plone.app.querystring.operation.string.path",
..... "v": "/news",
..... },
... ],
... "sort_on": "sortable_title",
... "sort_order": "reverse",
... "limit": "10",
... "fullobjects": "False",
... "b_start": "0",
... "b_size": "2",
... }
{
query: [
{
i: 'Title',
o: 'plone.app.querystring.operation.string.is',
v: 'Welcome to Plone'
},
{
i: 'path',
o: 'plone.app.querystring.operation.string.path',
v: '/news'
}
],
sort_on: 'sortable_title',
sort_order: 'reverse',
limit: '10',
fullobjects: 'False',
b_start: '0',
b_size: '2'
}
> var str = qs.stringify(query)
undefined
> console.log(str)
query%5B0%5D%5Bi%5D=Title&query%5B0%5D%5Bo%5D=plone.app.querystring.operation.string.is&query%5B0%5D%5Bv%5D=Welcome%20to%20Plone&query%5B1%5D%5Bi%5D=path&query%5B1%5D%5Bo%5D=plone.app.querystring.operation.string.path&query%5B1%5D%5Bv%5D=%2Fnews&sort_on=sortable_title&sort_order=reverse&limit=10&fullobjects=False&b_start=0&b_size=2
var out = qs.parse(str)
undefined
> console.log(out)
{
query: [
{
i: 'Title',
o: 'plone.app.querystring.operation.string.is',
v: 'Welcome to Plone'
},
{
i: 'path',
o: 'plone.app.querystring.operation.string.path',
v: '/news'
}
],
sort_on: 'sortable_title',
sort_order: 'reverse',
limit: '10',
fullobjects: 'False',
b_start: '0',
b_size: '2'
}
See https://github.com/ljharb/qs#parsing-objects for details.
Let me know what you think.
At first sight, I guess it's better to use something that supports it, so no conversion whatsoever needed (in either side).
Fact: qs
: 6k stars,query-string
: 5k. Also, Sindre Sorhus is known to be quite opinionated.
If we choose to change to qs
, how about the Python parser? Does it detects the nesting well?
Back to work after a day off and thinking further.
@jensens +1 to all.
Side info, but maybe good to know: we use 'paqo' in the serialized p.a.querystring query for the URL, in search block: https://github.com/plone/volto/blob/master/src/components/manage/Blocks/Search/hocs/withSearch.jsx#L177
The HTTP spec doesn't say you can't send data with a GET request but it all depends on the frameworks used if it is supported or not. Another issue with this approach is that the requests can't be cached. If you use query string parameters you are able to cache the requests. There are however things to take in to account when using the query string for this;
JSON is not supported by default as a query string. There is also no "global" standard on how to do this. There are multiple methods of encoding a JSON object to a string, let's take the following query:
query: [
{
i: 'Title',
o: 'plone.app.querystring.operation.string.is',
v: 'Welcome to Plone'
},
{
i: 'path',
o: 'plone.app.querystring.operation.string.path',
v: '/news'
}
],
sort_on: 'sortable_title',
sort_order: 'reverse',
limit: '10',
fullobjects: 'False',
b_start: '0',
b_size: '2'
};
Which results in the following transforms:
Encode URI: %7B%22query%22%3A%5B%7B%22i%22%3A%22Title%22%2C%22o%22%3A%22plone.app.querystring.operation.string.is%22%2C%22v%22%3A%22Welcome%20to%20Plone%22%7D%2C%7B%22i%22%3A%22path%22%2C%22o%22%3A%22plone.app.querystring.operation.string.path%22%2C%22v%22%3A%22%2Fnews%22%7D%5D%2C%22sort_on%22%3A%22sortable_title%22%2C%22sort_order%22%3A%22reverse%22%2C%22limit%22%3A%2210%22%2C%22fullobjects%22%3A%22False%22%2C%22b_start%22%3A%220%22%2C%22b_size%22%3A%222%22%7D (453)
Rison: (b_size:'2',b_start:'0',fullobjects:False,limit:'10',query:!((i:Title,o:plone.app.querystring.operation.string.is,v:'Welcome to Plone'),(i:path,o:plone.app.querystring.operation.string.path,v:/news)),sort_on:sortable_title,sort_order:reverse) (242)
O-Rison: b_size:'2',b_start:'0',fullobjects:False,limit:'10',query:!((i:Title,o:plone.app.querystring.operation.string.is,v:'Welcome to Plone'),(i:path,o:plone.app.querystring.operation.string.path,v:/news)),sort_on:sortable_title,sort_order:reverse (240)
JSURL: ~(query~(~(i~'Title~o~'plone.app.querystring.operation.string.is~v~'Welcome*20to*20Plone)~(i~'path~o~'plone.app.querystring.operation.string.path~v~'*2fnews))~sort_on~'sortable_title~sort_order~'reverse~limit~'10~fullobjects~'False~b_start~'0~b_size~'2) (253)
QS: query[0][i]=Title&query[0][o]=plone.app.querystring.operation.string.is&query[0][v]=Welcome to Plone&query[1][i]=path&query[1][o]=plone.app.querystring.operation.string.path&query[1][v]=/news&sort_on=sortable_title&sort_order=reverse&limit=10&fullobjects=False&b_start=0&b_size=2 (279)
URLON: $query@$i=Title&o=plone.app.querystring.operation.string.is&v=Welcome%20to%20Plone;&$i=path&o=plone.app.querystring.operation.string.path&v=//news;;&sort_on=sortable_title&sort_order=reverse&limit=10&fullobjects=False&b_start=0&b_size=2 (236)
The encode uri method is in my opinion the safest and most straight forward implementation and shouldn't be hard to implement both on the frontend and the backend.
Altough the HTTP spec doesn't limit the length of a querystring all browsers do. Currenty the following limits are in place:
Looking at the above examples even the largest (url encoding) is "only" 453 characters so even if you want to support Internet Explorer almost all requests should work.
LGTM, let's discuss this.
Also, FYI, I just remembered that this PR is in place: https://github.com/plone/volto/pull/4159
No clue how they are encoding the state in the URL. I guess we should then sync before merging.
@robgietema thanks for the write-up. We have to take into account the F5 Big IP component:
https://my.f5.com/manage/s/article/K60450445 (default is 2048 bytes)
We could think about shorting strings like plone.app.querystring.operation.string.is
.
@tisto Should we consider this done?
Possible Solutions
Use GET instead of POST
Problems to solve