how should we paginate search results?

dhalperi commented 9 years ago

FYI: @domoritz @stechu

Right now we paginate the queries using N and P -- N is the ID of the last query, and P is the number of results per page. We can figure out the query ID range of page i by [N-P*(i-1), N-P*i).

However, this will not work for filtered results, e.g., "only queries with twitter in the program". What should we do instead?

not support pagination -- only return the first P results with no customization
only support next -- where we look for the next P queries starting at max = min(current result set query id)-1 but no back or jumping.

use SQL to actually give us the [i*P, (i-1)*P) results. E.g., something like

SELECT * FROM
  (SELECT * FROM R
   WHERE <filter>
   ORDER BY query_id DESC
   LIMIT <page*P>) first_page_pages
ORDER BY query_id ASC
LIMIT <P>

Opinions?

domoritz commented 9 years ago

Yeah, this was a pain.

Can't we add a row number?

SELECT 
  ( SELECT COUNT(*) + 1 
    FROM "queries" 
    WHERE id < t.id AND [condition]
  ) as rowIndex, 
  query, [columns]
FROM "queries"
WHERE [condition]
ORDER BY id;

dhalperi commented 9 years ago

whew, I think that nested subquery has bad complexity... effectively O(n^2)?

Daniel Halperin Director of Research for Scalable Data Analytics eScience Institute University of Washington

On Tue, Dec 9, 2014 at 11:02 AM, Dominik Moritz notifications@github.com wrote:

Yeah, this was a pain https://github.com/uwescience/myria-web/blob/master/appengine/myria_web_main.py#L326.

Can't we add a row number?

SELECT ( SELECT COUNT(*) + 1 FROM "queries" WHERE id < t.id AND [condition] ) as rowIndex, query, [columns]FROM "queries"WHERE [condition]ORDER BY id;

— Reply to this email directly or view it on GitHub https://github.com/uwescience/myria-web/issues/232#issuecomment-66337460 .

domoritz commented 9 years ago

Or add pagination information to the json response... Something like

{
  "max": 42,
  "limit": 10,
  "numQueries": 154,
  "queries": [
    ...
  ]
}

dhalperi commented 9 years ago

@domoritz the key point is that given that information I still can't jump to an arbitrary page. From those numbers, I'm on at least the 11th page (there are 16 pages total with 154 results and 10/page), and there are at most 42 queries "after" the current max. How do I know what query ID starts page 10? I know it's some number >= 52, but I can't tell any more than that. FWICT.

domoritz commented 9 years ago

Max in this case is the result, not the query id. Which is not great because the page may change if new queries come in.

Btw, for the previous suggestion, see github API:

> GET /search/code?q=addClass+user:mozilla HTTP/1.1
> User-Agent: curl/7.37.1
> Host: api.github.com
> Accept: */*
>
< HTTP/1.1 200 OK
* Server GitHub.com is not blacklisted
< Server: GitHub.com
< Date: Tue, 09 Dec 2014 19:10:48 GMT
< Content-Type: application/json; charset=utf-8
< Status: 200 OK
< X-RateLimit-Limit: 10
< X-RateLimit-Remaining: 8
< X-RateLimit-Reset: 1418152288
< Cache-Control: no-cache
< X-GitHub-Media-Type: github.v3
< Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next", <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last"
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: deny
< Content-Security-Policy: default-src 'none'
< Content-Length: 157484
< Access-Control-Allow-Credentials: true
< Access-Control-Expose-Headers: ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval
< Access-Control-Allow-Origin: *
< X-GitHub-Request-Id: 80D00769:6714:1402960:54874937
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< X-Content-Type-Options: nosniff
< Vary: Accept-Encoding
< X-Served-By: 474556b853193c38f1b14328ce2d1b7d
<
{
  "total_count": 1970,
  "incomplete_results": false,
  "items": [
    {}
  ]
}

Okay, I'm fine with no jumping for search results. Just forward and backwards.

domoritz commented 9 years ago

What about backwards pagination? This way the results for a page stay the same. However, it's impossible to jump to a page with a specific query and see what happened before and after it.

I'm still for pagination without jumping for search results.

dhalperi commented 9 years ago

Looking at GitHub, it looks like they use page=P to indicate which page of the results you get, which would have the results-changing issue. Not a huge deal, but I get it.

Backwards pagination: harder to compute, but doable! I'll think about it.

dhalperi commented 9 years ago

I only know how to do forwards (earlier queries) pagination with search.

Consider a hypothetical result set (just the query ids): for this query: ?q=twitter&limit=10&max=1000

[997, 944, 10, 9, 8, 7, 6, 5, 4, 3]

The forwards pagination is: ?q=twitter&limit=10&max=2 because 2 is the smallest query id in the result set (3) minus 1.

The backwards pagination is not computable here. So we could have a next arrow, but the back button would be a browser-level back?

dhalperi commented 9 years ago

See #237 for the resolution.

uwescience / myria-web

how should we paginate search results? #232