vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.68k stars 1.56k forks source link

Vector's file source and Meta Trader 4 Server NULL-padded last log file #20602

Closed tamer-hassan closed 4 months ago

tamer-hassan commented 4 months ago

A note for the community

Problem

To sum it up briefly, Meta Trader 4 Server, a trading platform by MetaQuotes (runs on Windows), writes latest log files in quite a peculiar way: It pads the file with some Megabytes of NULL Bytes (\0) in one long line, and upon writing some log lines / messages, it replaces some NULL bytes, and appends the remaining NULL bytes to the end/last line of the log.

This effectively stumps the file source in vector, and I could not find a way around it.

Basically Vector reads up to (and sends to the sink up to) the last line before the NULLs line, then infinitely prints to stdout warnings Found line that exceeds max_line_bytes; discarding. , albeit log rate limited. However, as new lines are added by the MT4 Server before the terminating line of NULLs, Vector doesn't see them, despite the change of modification time of the file, and even the checkpoints.json file timestamp remains behind (at the time vector first seen the line of NULLs). enabling ignore_checkpoints config option doesn't make a difference, nor does setting fingerprint.strategy to "device_and_inode".

In an attempt to workaround this issue, I tried to modify the read_until_with_max_size method in lib/file-source/src/buffer.rs as follows:

    loop {
        let available: &[u8] = match reader.fill_buf() {
            Ok(n) => n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };

+        if available.len() > 0 && available[0] == b'\0' {
+            warn!(
+                message = "Reached line beginning with NULL.",
+                internal_log_rate_limit = true
+            );
+            buf.clear();
+            return Ok(None);
+        }

        let (done, used) = {
            match delim_finder.find(available) {
                Some(i) => {

Upon building (on windows), deleting the checkpoints.json, and rerunning vector, the same behavior as before is observed, except that the new warning I added Reached line beginning with NULL. is printed to stdout (also infinitely) instead of the warning Found line that exceeds max_line_bytes; discarding.. The checkpoints.json file is also only updated upon startup and doesn't get updated again after the first of the said warning is shown. The newly added log lines (by MT4 Server) before the terminating line of NULLs is simply not detected by Vector.

I would've expected the log file to be reopened by the FileWatcher upon return Ok(None) from read_until_with_max_size, after modification time changes, but that doesn't unforunately happen. Maybe my understanding of the file source is incorrect.

N.B. the current day log file size doesn't change due to this NULL byte padding in last line. That is, the file size remains constant 4096KB, until it is rotated, then these NULLs are removed by MT4 Server and file size reduced iff the log lines total less than 4096KB.

Restarting vector allows the file source to pick up correctly where it left off, but stops everytime after it reaches the line of NULLs, and vector would have to be again restarted for the new log lines to be seen and/or sent to the sink. P.S. I tried with Loki sink, file sink.. Same behavior with both. And as mentioned, the checkpoints.json file is not updated per the log file modification time 1 minute after startup of Vector.

Please let me know if any further info is needed. Any ideas/pointers on how to solve this would be much appreciated.

Configuration

No response

Version

v0.38.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

very similar/related: https://github.com/vectordotdev/vector/issues/19628

tamer-hassan commented 4 months ago

I think I understand better now having had more time to look at the code, especially up the chain from buffer, to file_watcher and file_server. Moreover, the following comment by @lukesteensen confirms my understanding: https://github.com/vectordotdev/vector/issues/3567#issuecomment-1687133039

Yeah, I think the best solution is likely to introduce another state to the file watcher where we still checkpoint the EOF but don't hold an active file handle. Right now it's basically all or nothing: we're either actively watching with an open handle or we ignore it entirely via exclude. Some kind of passive watching state for old/idle files would allow us to retain checkpoints in case the file receives future writes, but not take up a file handle and time polling for reads.

jszwedko commented 4 months ago

Hi @tamer-hassan !

This seems like an odd enough file format that I'm not sure if it makes sense to have the file source handle it natively since I don't see an easy way to do so that wouldn't add a bunch of complexity just to support this niche case. I'd probably recommend using the exec source to run a custom program to read from this file.

tamer-hassan commented 4 months ago

Hi @jszwedko Thanks for the quick reply. I understand that it is not that common of a case, and it's not a big deal for me if introducing an option to ignore null terminating line is not mainlined. I can always maintain a fork for this particular purpose, with minor modification as I demonstrated in my first post. The bigger problem here lies in the fact that the file handle is never released so the file is never reopened, and so newer appended data is never seen. Check-pointing, however, works perfectly fine.

Consequently, I am much more interested in seeing #3567 resolved, especially if in a way like @lukesteensen suggested (and as quoted above) ; passive watching state. Example: https://github.com/notify-rs/notify/blob/main/examples/async_monitor.rs I've commented there too and hoping for feedback, or better, seeing progress soon in that regard. Too many open file handles for old/idle/stale files is an issue for which a solution would be of much more interest to many users of vector. Feel free to close this issue here if you see fit. Cheers :)

jszwedko commented 4 months ago

Thanks @tamer-hassan ! That makes sense. Vector's current behavior is to just maintain a pointer to the end of the file, with the file open, to wait for new daa. The comment you mentioned, https://github.com/vectordotdev/vector/issues/3567#issuecomment-1687133039, would be an improvement to that behavior. I'll close this issue in-lieu of that one. I appreciate the understanding about supporting this particular case!

tamer-hassan commented 4 months ago

For posterity, I got Vector file source to work as I needed for the purpose of this issue, with minimal changes for now: https://github.com/tamer-hassan/vector/commit/518d4e17db2a698491cc3927df39de676d7ef523

Basically the change makes vector not retain file handle when reaching NUL line, effectively allowing file to be reopened by watcher and re-read from last position (thanks to vector checkpointer)

Since this is on windows (running as a service), and I am not logging console , I am not concerned about the repeated log every second about file being resumed (for the current log which ends with NULL lines).