whargrove / qlog

Implement pagination with continuation tokens #6

Closed whargrove closed 8 months ago

whargrove commented 8 months ago

This PR addresses issues raised in #3 where retrieving a large number of lines from a large file would result in high CPU utilization by the server.

Previously, the only way to get all of the lines from a file was to request an arbitrarily large count to mimic a "get all" behavior. Serializing that many lines to JSON in a single response caused significant overhead.

To address the performance issues, this PR:

  1. Caps count at a maximum of 10,000 entries.
  2. Provides a continuationToken to allow for consistent pagination through the entries in the log file.

With this PR, the server keeps track of the line endings seen by the file reader and returns a continuation token in the response metadata. Clients send that token back with the next request to continue reading.
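
The idea can be sketched in a few lines of Python. This is only an illustration, not the code in this PR: it assumes the token is the byte offset just past the last line ending read, and the function name `read_page` is hypothetical.

```python
import os
from typing import List, Optional, Tuple


def read_page(path: str, count: int, token: Optional[int] = None) -> Tuple[List[str], Optional[int]]:
    """Return up to `count` lines and the token for the next page.

    Assumption: the token is the byte offset just past the last line
    ending that was read. A token of None means "start of file" on
    input and "nothing left to read" on output.
    """
    lines: List[str] = []
    with open(path, "rb") as f:
        # Jump straight to where the previous page stopped.
        f.seek(token or 0)
        while len(lines) < count:
            raw = f.readline()
            if not raw:
                # Hit end of file before filling the page.
                return lines, None
            lines.append(raw.rstrip(b"\r\n").decode("utf-8", errors="replace"))
        next_offset = f.tell()
        # The page ended exactly at end of file: no token is returned.
        if next_offset >= os.fstat(f.fileno()).st_size:
            return lines, None
        return lines, next_offset
```

Because the reader seeks directly to the offset, bytes before the token are never read again.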

The server also adds support for a start parameter that acts as a low watermark: the reader skips lines until the watermark is reached and only then starts collecting. The continuationToken is preferred because it is more efficient with respect to disk I/O; the server does not have to read regions of the file that would never return data.
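
To show why the watermark costs more I/O, here is a minimal Python sketch. It assumes start is a zero-based line index, which this description does not confirm, and `collect_from` is a hypothetical name.

```python
from typing import Iterable, List


def collect_from(lines: Iterable[str], start: int, count: int) -> List[str]:
    """Skip lines below the `start` watermark, then collect up to `count`.

    Assumption: `start` is a zero-based line index.
    """
    collected: List[str] = []
    if count <= 0:
        return collected
    for index, line in enumerate(lines):
        if index < start:
            # The line was still read from the source, only to be discarded.
            continue
        collected.append(line)
        if len(collected) >= count:
            break
    return collected
```

Every skipped line is still pulled from the underlying reader, whereas a token that maps to a file position lets the reader seek past them.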

Examples:

% time curl -Ss "localhost:8080/queryLog?relativePath=access.log&count=10000" | jq .metadata
{
  "continuationToken": {
    "token": "2005886425"
  }
}
curl 0.00s user 0.00s system 19% cpu 0.029 total
jq   0.02s user 0.00s system 74% cpu 0.029 total
% time curl -Ss "localhost:8080/queryLog?relativePath=access.log&count=10000&continuationToken=2005886425" | jq .metadata
{
  "continuationToken": {
    "token": "2005886434"
  }
}
curl 0.00s user 0.01s system 19% cpu 0.030 total
jq   0.02s user 0.00s system 69% cpu 0.030 total

If count exceeds the maximum, the server returns 400 Bad Request:

% curl -i -Ss "localhost:8080/queryLog?relativePath=access.log&count=99999" | head -n1
HTTP/1.1 400 Bad Request
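
The check behind this response could look roughly like the following Python sketch. The names `MAX_COUNT` and `check_count` are hypothetical, and rejecting non-numeric or non-positive values is an assumption; this description only states that a count above the maximum is rejected.

```python
from typing import Optional, Tuple

# Maximum page size described in this PR.
MAX_COUNT = 10_000


def check_count(raw: str) -> Tuple[int, Optional[int]]:
    """Map the raw `count` query value to (HTTP status, parsed count)."""
    try:
        count = int(raw)
    except ValueError:
        # Assumption: a non-numeric count is also a bad request.
        return 400, None
    # Assumption: zero and negative counts are rejected as well.
    if count < 1 or count > MAX_COUNT:
        return 400, None
    return 200, count
```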

If an entire file is read in one request, the server does not return a continuationToken. Clients can use this as a signal that there are no more entries to read.

% wc -l /var/log/dmesg
1556 /var/log/dmesg
% curl -Ss "localhost:8080/queryLog?relativePath=dmesg&count=1556" | jq .metadata
null
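
A client can page through a file by following the token until it disappears. The Python sketch below relies only on the response shape shown in the examples (metadata.continuationToken.token, or null metadata); the helper names `fetch_page` and `iter_pages` are hypothetical, and the rest of the response body is passed through untouched.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional


def fetch_page(base_url: str, relative_path: str, count: int,
               token: Optional[str] = None) -> Dict[str, Any]:
    """Request one page from /queryLog and decode the JSON body."""
    params = {"relativePath": relative_path, "count": count}
    if token is not None:
        params["continuationToken"] = token
    url = f"{base_url}/queryLog?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_pages(fetch: Callable[[Optional[str]], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield pages until a response carries no continuation token."""
    token: Optional[str] = None
    while True:
        page = fetch(token)
        yield page
        # metadata is null once the end of the file has been reached.
        metadata = page.get("metadata") or {}
        token = (metadata.get("continuationToken") or {}).get("token")
        if token is None:
            return
```

For example, `iter_pages(lambda t: fetch_page("http://localhost:8080", "access.log", 10000, t))` walks the whole file in pages of 10,000 entries.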