sematic-ai / sematic

An open-source ML pipeline development platform

More responsive log reading #1109

Closed augray closed 11 months ago

augray commented 11 months ago

We write log chunks to cloud storage roughly every 10 seconds, but we skip writing a log file if no log lines were produced in that interval. When we request logs from the dashboard, we pass a limit on the number of lines (2000). We traverse the files in storage until we reach 2000 lines or run out of files. If your logging is dense, you reach 2000 lines quickly. If your logging is sparse in the sense that messages only arrive every few minutes, but each burst contains at least several lines, you still reach 2000 lines fairly quickly, because the quiet intervals produce no files to traverse. The worst case for the number of remote log files traversed is logging that produces a small number of lines every several seconds (the absolute worst case being one line every 10 seconds).
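To make the worst case concrete, here is a back-of-the-envelope sketch. The function name and the example line counts are illustrative assumptions, not part of the Sematic codebase; only the 2000-line limit and the 10-second flush interval come from the description above.

```python
import math

LINE_LIMIT = 2000  # per-request line limit used by the dashboard


def files_to_traverse(lines_per_file: int, line_limit: int = LINE_LIMIT) -> int:
    """Number of non-empty log files read before the line limit is reached."""
    return math.ceil(line_limit / lines_per_file)


# Dense logging (hypothetical 500 lines per flushed file): only a few reads.
dense = files_to_traverse(lines_per_file=500)  # 4

# Absolute worst case: one line per flushed file.
worst = files_to_traverse(lines_per_file=1)  # 2000
```

With one line per 10-second flush, reaching the limit means reading 2000 separate remote files, covering 20,000 seconds (roughly five and a half hours) of logs.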

This PR adds a time limit: even if 2000 lines haven't been found yet and there are more files to check, we stop after 10 seconds of searching and return whatever we have. However, to make sure the cursor makes some progress with each call and the user gets something if possible, we don't return early if no lines have been found yet.
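The rule above can be sketched roughly as follows. This is a minimal illustration, not the actual change in this PR: `read_log_lines`, its parameters, and the representation of each stored file as a list of lines are all assumptions made for the example.

```python
import time
from typing import Callable, Iterable, List


def read_log_lines(
    chunks: Iterable[List[str]],
    max_lines: int = 2000,
    max_seconds: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Collect lines from log chunks until a line limit or time limit is hit.

    `chunks` stands in for the remote log files, each already loaded as a
    list of lines (hypothetical; real code would fetch them from storage).
    """
    deadline = clock() + max_seconds
    lines: List[str] = []
    for chunk in chunks:
        # Never return more than max_lines in total.
        lines.extend(chunk[: max_lines - len(lines)])
        if len(lines) >= max_lines:
            break
        # Only stop on the time limit once at least one line was found,
        # so every call makes some progress when any lines exist.
        if lines and clock() >= deadline:
            break
    return lines
```

Injecting `clock` is a design choice for the sketch only: it lets the time limit be exercised with a fake clock instead of waiting 10 real seconds.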

Testing

Made a run where the "log flush" interval was only 1 second, with a single log line produced every second. This ensures the kind of "worst case" sparsity this PR is targeting. Run is here.

Tried the following with that run:

Note that it is still possible to get the search request to time out if you use a filter for something that is incredibly rare (or nonexistent) and traversing all the existing log files takes more time than the request is allowed. However, this is probably a somewhat acceptable failure mode (at any rate, solving it would probably require that we create a search index for the logs or some other such highly complex solution). If you don't use any filtering, you should always get back at least one line or be told (truthfully) that there are no lines yet.