pulumi / pulumi

Pulumi - Infrastructure as Code in any programming language 🚀
https://www.pulumi.com
Apache License 2.0

Support paging in logs #739

Open lukehoban opened 6 years ago

lukehoban commented 6 years ago

Currently, logs supports startTime and endTime for controlling the range of logs to return. But within any time range, it's possible for too many logs to be returned, such that we cannot return them all at once.

The operations provider interface for GetLogs should ideally support paging: requests take a token indicating where to resume, and responses that have more results pending return a token for requesting the next page.
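
As a rough sketch of what that request/response shape could look like: the type names, field names, and the in-memory provider below are all hypothetical and are not the actual operations provider interface. The token here is just a decimal offset, standing in for whatever opaque value a real source would hand back.

```go
package main

import (
	"fmt"
	"strconv"
)

// LogEntry is a hypothetical, simplified log record.
type LogEntry struct {
	Timestamp int64 // e.g. milliseconds since the epoch
	Message   string
}

// LogQuery is a hypothetical request: a time range plus paging controls.
type LogQuery struct {
	StartTime, EndTime int64
	Token              string // empty on the first request
	PageSize           int
}

// LogPage is a hypothetical response: one page plus a token for the next.
type LogPage struct {
	Entries   []LogEntry
	NextToken string // empty when no more results are pending
}

// memoryProvider is a toy provider backed by a slice, for illustration only.
type memoryProvider struct{ entries []LogEntry }

// GetLogs returns at most q.PageSize entries in range, resuming at q.Token.
func (p *memoryProvider) GetLogs(q LogQuery) (LogPage, error) {
	if q.PageSize <= 0 {
		return LogPage{}, fmt.Errorf("page size must be positive, got %d", q.PageSize)
	}
	var inRange []LogEntry
	for _, e := range p.entries {
		if e.Timestamp >= q.StartTime && e.Timestamp <= q.EndTime {
			inRange = append(inRange, e)
		}
	}
	offset := 0
	if q.Token != "" {
		n, err := strconv.Atoi(q.Token)
		if err != nil || n < 0 || n > len(inRange) {
			return LogPage{}, fmt.Errorf("invalid continuation token %q", q.Token)
		}
		offset = n
	}
	end := offset + q.PageSize
	if end > len(inRange) {
		end = len(inRange)
	}
	page := LogPage{Entries: inRange[offset:end]}
	if end < len(inRange) {
		page.NextToken = strconv.Itoa(end)
	}
	return page, nil
}

// fetchAll follows NextToken until the provider reports no more pages.
// It returns the collected entries and the number of pages requested.
func fetchAll(p *memoryProvider, q LogQuery) ([]LogEntry, int, error) {
	var all []LogEntry
	pages := 0
	for {
		page, err := p.GetLogs(q)
		if err != nil {
			return nil, pages, err
		}
		all = append(all, page.Entries...)
		pages++
		if page.NextToken == "" {
			return all, pages, nil
		}
		q.Token = page.NextToken
	}
}

func main() {
	p := &memoryProvider{}
	for i := int64(1); i <= 5; i++ {
		p.entries = append(p.entries, LogEntry{Timestamp: i, Message: fmt.Sprintf("event %d", i)})
	}
	all, pages, err := fetchAll(p, LogQuery{StartTime: 1, EndTime: 5, PageSize: 2})
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d entries in %d pages\n", len(all), pages)
}
```

With this shape, no single response has to hold more than PageSize entries, regardless of how wide the time range is.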

The problem, though, is that our logging infrastructure has the following characteristics:

  1. We aggregate logs from many sources
  2. We don't collect and store these aggregated logs ourselves; we delegate down to log sources on all requests for logs
  3. We want to present logs back to users in time order across all sources

This makes paging hard.

First, each of the sources may page independently, so the token we use for paging somehow needs to include the state of all paging tokens from all sources (along with information about how to correlate them back to the source).
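
One way such a combined token could be built, purely as an illustration: keep a per-source record of that source's own paging token, serialize the whole map, and hand the result back as a single opaque string. Everything below (the `sourceState` and `compositeState` types, the `Done` flag, the source identifiers, and the JSON-plus-base64 encoding) is an assumption made for this sketch and is not something the current code does.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// sourceState is a hypothetical record of how far paging has progressed
// for one log source.
type sourceState struct {
	Token string `json:"token,omitempty"` // the source's own paging token
	Done  bool   `json:"done,omitempty"`  // true once the source has no more pages
}

// compositeState maps a source identifier to that source's paging state,
// which is how tokens are correlated back to their sources.
type compositeState map[string]sourceState

// encodeToken serializes the per-source states into one opaque string.
func encodeToken(s compositeState) (string, error) {
	raw, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

// decodeToken reverses encodeToken. An empty token means "first request"
// and yields an empty state, so every source starts from the beginning.
func decodeToken(token string) (compositeState, error) {
	state := compositeState{}
	if token == "" {
		return state, nil
	}
	raw, err := base64.RawURLEncoding.DecodeString(token)
	if err != nil {
		return nil, fmt.Errorf("malformed paging token: %w", err)
	}
	if err := json.Unmarshal(raw, &state); err != nil {
		return nil, fmt.Errorf("malformed paging token: %w", err)
	}
	return state, nil
}

func main() {
	// "source-a" and "source-b" are made-up identifiers.
	token, err := encodeToken(compositeState{
		"source-a": {Token: "abc123"},
		"source-b": {Done: true},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("combined token:", token)

	state, err := decodeToken(token)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded state: %+v\n", state)
}
```

The `Done` flag is there because a source with no token is ambiguous otherwise: it could be a source that has not been queried yet or one that is already exhausted. Note that a token like this grows with the number of sources.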

Second, we can't sort across the returned logs until we have all the logs, that is, all the pages. So we can't return results to users one page at a time based directly on the pages from the sources.

Paging on the operations provider interface, even if it doesn't guarantee global ordering, may still be useful so that we can run the operations provider safely behind a RESTful interface (which we will do in the PPC/service). Front-ends that present logs to users in the CLI and console would then be responsible for collecting pages and sorting them together. (Front-ends that can reorder logs in their output, as the web console can, could potentially load pages lazily even though order would not be guaranteed, though this is unlikely to be a great experience.)
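
A minimal sketch of that front-end responsibility, under the assumption that each source can be drained through some paged call: pull every page from every source, then sort once at the end. The `pageFunc` type and the `slicePager` helper are hypothetical stand-ins for real paged requests, not existing APIs.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// LogEntry is a hypothetical, simplified log record.
type LogEntry struct {
	Timestamp int64
	Source    string
	Message   string
}

// pageFunc is a hypothetical stand-in for one paged call to a log source:
// it takes a continuation token and returns a page plus the next token
// (empty when the source is exhausted).
type pageFunc func(token string) (entries []LogEntry, next string)

// slicePager serves a fixed slice in pages of the given size, using a
// decimal offset as the token. It exists only to make the sketch runnable.
func slicePager(all []LogEntry, size int) pageFunc {
	if size < 1 {
		size = 1
	}
	return func(token string) ([]LogEntry, string) {
		start := 0
		if token != "" {
			n, err := strconv.Atoi(token)
			if err != nil || n < 0 || n > len(all) {
				return nil, "" // treat an unusable token as end of stream
			}
			start = n
		}
		end := start + size
		if end > len(all) {
			end = len(all)
		}
		next := ""
		if end < len(all) {
			next = strconv.Itoa(end)
		}
		return all[start:end], next
	}
}

// collectSorted drains every source page by page and only then sorts,
// since global time order is not known until all pages have arrived.
func collectSorted(sources []pageFunc) []LogEntry {
	var all []LogEntry
	for _, fetch := range sources {
		token := ""
		for {
			entries, next := fetch(token)
			all = append(all, entries...)
			if next == "" {
				break
			}
			token = next
		}
	}
	// A stable sort keeps entries with equal timestamps in arrival order.
	sort.SliceStable(all, func(i, j int) bool {
		return all[i].Timestamp < all[j].Timestamp
	})
	return all
}

func main() {
	a := slicePager([]LogEntry{
		{Timestamp: 1, Source: "a", Message: "start"},
		{Timestamp: 4, Source: "a", Message: "working"},
		{Timestamp: 5, Source: "a", Message: "done"},
	}, 2)
	b := slicePager([]LogEntry{
		{Timestamp: 2, Source: "b", Message: "request"},
		{Timestamp: 3, Source: "b", Message: "response"},
	}, 2)
	for _, e := range collectSorted([]pageFunc{a, b}) {
		fmt.Printf("%d [%s] %s\n", e.Timestamp, e.Source, e.Message)
	}
}
```

This keeps each individual response small, but the front-end still ends up holding the full result set in memory before it can show anything in order.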

We need to do the following:

  1. Figure out how good of a job we can do on this with the current design, and implement that in the near term.
  2. Look into what it would mean to try and centrally collect all logs - at least in the cloud case. This seems very challenging in a world where we support any kind of AWS resource, and it's not clear this is a direction we should be going - but it may be the only way to offer some forms of experience we want here.

pgavlin commented 6 years ago

FWIW, the lack of response paging recently bit us when the /logs endpoint on a PPC was accessed. This endpoint promptly attempted to return all logs for the indicated stack, during which it exceeded its memory quota, which resulted in heavy paging to its EC2 instance's swap device and caused other endpoints served by the same instance to become unavailable.

joeduffy commented 6 years ago

I'm pulling into M9.2, in case there is a quick fix for this. I realize more feature-rich log paging may take longer to implement, and I still dream of an infiniscroll view closer to what Travis offers.

ellismg commented 5 years ago

Given that we only run GetLogs locally now, I think this is less of an issue. I'm removing the milestone for now. If folks end up using pulumi logs heavily in environments where we think paging would be beneficial, even when running locally, we can figure out how to tackle this.