vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

device_and_inode strategy doesn't handle inodes reuse #18342

Closed Hexta closed 1 year ago

Hexta commented 1 year ago

Problem

The device_and_inode fingerprint strategy doesn't handle inode reuse during log file rotation, which leads to lost logs.

Example:

include = [ "/var/log/foo.log" ]

Before rotation:
File                 Inode
/var/log/foo.log     1

After the 1st rotation:
File                 Inode
/var/log/foo.log     2
/var/log/foo.log.0   1

After the 2nd rotation:
File                 Inode
/var/log/foo.log     1
/var/log/foo.log.0   2

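After the 2nd rotation, the new /var/log/foo.log has inode 1 again, so Vector treats it as the file it already knows and resumes watching it from the last checkpointed position instead of reading it from the start.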
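
The effect can be reproduced with a toy model of a checkpoint store keyed by (device, inode). This is only a sketch and not Vector's actual code: the `checkpoints` dict, the `read_new_bytes` helper, the `DEVICE` id, and the file sizes are all hypothetical and chosen to mirror the rotation example.

```python
# Toy model of a checkpoint store keyed by (device, inode).
# Hypothetical names and sizes; this is not Vector's implementation.

checkpoints = {}  # (device, inode) -> byte offset already read


def read_new_bytes(device, inode, file_size):
    """Return how many bytes a reader consumes when resuming from its checkpoint."""
    key = (device, inode)
    start = checkpoints.get(key, 0)
    consumed = max(file_size - start, 0)
    checkpoints[key] = max(start, file_size)
    return consumed


DEVICE = 1  # hypothetical device id

# Before rotation: foo.log is inode 1 and holds 1000 bytes.
first = read_new_bytes(DEVICE, inode=1, file_size=1000)

# After the 1st rotation: foo.log is a new file with inode 2 (500 bytes).
second = read_new_bytes(DEVICE, inode=2, file_size=500)

# After the 2nd rotation: foo.log is a brand-new file, but the filesystem
# reused inode 1. It holds 1200 bytes, none of which have been read yet.
third = read_new_bytes(DEVICE, inode=1, file_size=1200)

print(first, second, third)
```

In this model the third read consumes only the bytes past the stale offset recorded for inode 1, so the first 1000 bytes of the new file are never read.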

Configuration

No response

Version

0.33.0-custom-61c0ae8a5

Debug Output

No response

Example Data

No response

Additional Context

No response

References

#2163

jszwedko commented 1 year ago

Hi @Hexta !

This is a fundamental deficiency of this checkpoint strategy vs. the fingerprint strategy. I'm not really sure how we would work around it in Vector. Do you have ideas of how this might be tracked in a way to detect inode reuse?

Hexta commented 1 year ago

Hi! Off the top of my head: fanotify could be used to subscribe to FS events and receive file_handle structs, which could then be passed to open_by_handle_at() to open the original file. open_by_handle_at() will fail if the file was removed and a new file was created with the same inode between receiving the file_handle and calling the function. Another benefit is that we wouldn't need to continuously scan for new or removed files; we would just poll the file descriptor returned by fanotify_init(). That makes it good for scalability when there are a lot of input files.
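
The idea can be illustrated with a toy in-memory model. It does not call fanotify or open_by_handle_at(), both of which need elevated privileges on Linux. It assumes a handle pairs the inode number with a per-inode generation counter, as some filesystems do; real file_handle contents are opaque and filesystem-specific. `ToyFilesystem` and its methods are hypothetical names.

```python
import errno


class ToyFilesystem:
    """Hypothetical in-memory model: each inode number carries a generation counter."""

    def __init__(self):
        self.generation = {}  # inode -> current generation
        self.live = set()     # inode numbers currently allocated

    def create(self, inode):
        # Allocating (or reusing) an inode number bumps its generation.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.live.add(inode)
        return (inode, self.generation[inode])  # stands in for a file_handle

    def unlink(self, inode):
        self.live.discard(inode)

    def open_by_handle(self, handle):
        # Mimics open_by_handle_at() reporting ESTALE for a deleted file.
        inode, gen = handle
        if inode not in self.live or self.generation[inode] != gen:
            raise OSError(errno.ESTALE, "stale file handle")
        return inode


fs = ToyFilesystem()
old_handle = fs.create(1)  # original foo.log
fs.unlink(1)               # rotated away and eventually deleted
new_handle = fs.create(1)  # new foo.log reuses inode 1

try:
    fs.open_by_handle(old_handle)
    reuse_detected = False
except OSError as exc:
    reuse_detected = exc.errno == errno.ESTALE

print(reuse_detected)
```

The old handle is rejected even though the inode number matches, which is the signal a (device, inode) pair alone cannot provide.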

But I'm not sure if it's worth implementing )

IMO it'd be great:

jszwedko commented 1 year ago

Thanks for the thoughts @Hexta ! We don't currently leverage fanotify anywhere, but I think what you suggested seems plausible. If we do decide to add fanotify, we'll probably rethink much of the current implementation, which relies only on scanning.

Agreed with adding notes to the docs about this. I'll do that.

jszwedko commented 1 year ago

Actually, I see this is already documented here: https://vector.dev/docs/reference/configuration/sources/file/#fingerprinting

Given that, I'll close this out, but thank you for the discussion!