Closed lamr02n closed 1 month ago
Well, technically, no. There's an error in the reasoning here. When the application starts, the sliding window should start, and it should stop once we reach 1000 lines or once the timeout expires. Technically, we could take the timestamp of the first message as the beginning; however, it would not represent the window.
> There's an error in thinking here. When the application starts, the sliding window should start and stop when we reach 1000 lines or after the timeout. Technically, we could take the beginning timestamp of the first message; however, it would not represent the window.
So, we could revert to the first implementation? Before we used the timestamps from the data, we stored our own timestamps based on the times the messages entered or left the pipeline. I still see the problem that we're not using the timestamp of the window in which the log lines were recorded, but the timestamps from the execution of our algorithm. The timestamps in the log lines might be from another time (e.g. they could originate from a file).
> So, we could revert to the first implementation?
We could elaborate on this in the next meeting.
> I still see the problem that we're not using the timestamp of the window in which the log lines were recorded, but the timestamps from execution of our algorithm.
When we find something, we want to output all of the information to the user, especially the timestamps. In the future, we may also add databases for profiling/monitoring, where timestamps come into play.
> The timestamps in the log lines might be from another time (e.g. could originate from a file).
Yes, we could run into the problem of reading data from an old file. We should therefore only consider newly added lines, instead of reading the whole file.
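A sketch of what "only consider newly added lines" could look like: skip everything already present when we start watching, and yield only lines appended afterwards. The function name `follow_new_lines` is hypothetical; a real tailer would also poll/sleep when it hits end-of-file instead of returning.

```python
import io

def follow_new_lines(f, start_at_end=True):
    """Hypothetical sketch: yield only lines appended to the open text
    file object `f` after we start watching, skipping existing content."""
    if start_at_end:
        # Jump past everything already in the file, so old log lines
        # (with old timestamps) are never fed into the pipeline.
        f.seek(0, io.SEEK_END)
    while True:
        line = f.readline()
        if not line:
            # A real implementation would sleep and retry here;
            # for this sketch we simply stop at end-of-file.
            return
        yield line.rstrip("\n")
```

For example, with an in-memory file that already contains an "old" line, positioning the reader after the pre-existing content means only the later line is yielded.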
The proposed change creates a problem with the data in later stages. Currently, the `begin_timestamp` and `end_timestamp` are extracted before filtering, which ensures correct timestamps. If we moved the extraction to a later point (essentially after filtering), we might throw away relevant data points. We will leave the implementation as it is, since propagating the two timestamps through the pipeline adds virtually no overhead.
Currently, the extraction of `begin_timestamp` and `end_timestamp` is done by the `Batch Sender`. Since we updated the way we extract the timestamps, we could move this step to a later stage (for example the `Inspector`, in which the timestamps are needed). This could reduce message size because we would not need to send them as metadata.