Open codeboten opened 2 months ago
Pinging code owners:
pkg/stanza/fileconsumer: @djaglowski
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I think that the issue might lie on the sink's buffer size at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/67e92938376e5dc6c200899f7545c5cb7aec95e0/pkg/stanza/fileconsumer/file_test.go#L351. I was not able to reproduce it as is locally (on linux) but by decreasing the buffer size a little bit I saw that failing. However, I'm not sure how accurately this can be verified 🤔 Maybe just increasing the size of the buffer and see if flakiness disappear might do the work here:
diff --git a/pkg/stanza/fileconsumer/file_test.go b/pkg/stanza/fileconsumer/file_test.go
index 97e6fe8415..e2e8dd7406 100644
--- a/pkg/stanza/fileconsumer/file_test.go
+++ b/pkg/stanza/fileconsumer/file_test.go
@@ -348,7 +348,8 @@ func TestSymlinkedFiles(t *testing.T) {
cfg := NewConfig().includeDir(targetTempDir)
cfg.StartAt = "beginning"
cfg.PollInterval = pollInterval
- sink := emittest.NewSink(emittest.WithCallBuffer(numFiles * logLinesPerFile))
+ sink := emittest.NewSink(emittest.WithCallBuffer(numFiles * logLinesPerFile * 1.5))
operator := testManagerWithSink(t, cfg, sink)
require.NoError(t, operator.Start(testutil.NewUnscopedMockPersister()))
@djaglowski any other detail that might has slipped here?
I'm able to reproduce it locally on mac M1.
The sink buffer size should be fine as it is large enough to contain all 3000 lines. I tested increasing it in case and it still fails.
One thing I see is that the comment // The sleep time here must be larger than the poll_interval value
does not appear to be followed correctly. Still trying to understand how the test was meant to work but increasing the sleep time does seem to fix it.
One thing I see is that the comment // The sleep time here must be larger than the poll_interval value does not appear to be followed correctly. Still trying to understand how the test was meant to work but increasing the sleep time does seem to fix it.
The idea of the test (originally) is that the symlinks switch fast while the consumer still achieves consuming all of their contents.
The sleep time must be enough to give time to the consumer to get to the next poll, match the file, and consume it before the symlink has switched.
The sleep time is pollInterval + 1*time.Millisecond
so it's actually bigger than the poll interval by 1ms, no?
Yeah, you're right. I missed that it's a computed value.
Anyways, we seem to be dropping logs regularly when the symlinks are rotated. I am still looking into this.
I have not made progress on this issue yet but it's causing problems on main so I've opened #34569 to temporarily skip the test in the meantime.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Component(s)
pkg/stanza/fileconsumer
Describe the issue you're reporting
See https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/10290446477/job/28480438927 for more details: