Closed BharatKJain closed 1 month ago
Attention: Patch coverage is 97.54098% with 3 lines in your changes missing coverage. Please review.
Project coverage is 91.34%. Comparing base (f1f2b7f) to head (116a48f). Report is 8 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...mor-interceptor/src/postprocessor/gelf_chunking.rs | 97.54% | 3 Missing :warning: |
Okay, will do.
Trying to think out loud: say we create a hash of a message. Repetitive logs will then create the same message-id, which is a problem because the message-id has to be unique; collisions can cause problems when we're decoding the message on the server side. I am not completely sure, but ideally we would want uniqueness in the message-id to make sure that we are not breaking the server's GELF decoding.
(How would it break the server side due to a collision? The message-id is how the server determines whether a UDP packet belongs to an already-existing log or to a new one. When we send the same message-id for multiple logs, the server's behaviour is to merge the data together, which ends up breaking the log.)
TBH I am also trying to figure this out, please share any suggestions, am I thinking right? 😅
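The collision concern above can be seen in a minimal sketch. This assumes an id derived from message content alone; `content_id` is a hypothetical helper for illustration, not code from the patch:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: derive a message-id from the message content alone.
fn content_id(msg: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    msg.hash(&mut hasher);
    hasher.finish()
}

// Two identical log lines produce the same id, so a GELF server would
// try to merge their chunks into a single message.
```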
Ja, just the message content would not work. I'm still considering whether message content + ingest_ns (the nanosecond when the message was registered at tremor) would be enough: if a server produces the same log twice in the same nanosecond, that'd be very odd (but not impossible). OTOH, having two randomly generated numbers be the same is also odd (but not impossible). It would also give a more deterministic failure case: "when messages with the same content arrive at exactly the same time they will get duplicated message ids" instead of "if the RNG hates you, you'll get duplicated message ids".
Sorry for all the back and forth. I've been thinking about a good way to make this:
1) fast 2) deterministic 3) non-colliding
and wanted to throw out a suggestion: use the ingest_ns + an incremental id as the message ID. This is:
1) fast, since we just have to look at two integers, no RNG, no hashing 2) deterministic, since the ingest_ns is settable and the incremental counter is, well, incremental, so deterministic as well 3) it avoids duplicates by not creating any on the same system, thanks to the incremental ids, and makes them extremely unlikely across multiple systems, thanks to the nanosecond timestamps involved.
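A rough sketch of the suggested scheme, with illustrative names (not the actual patch): pair the event's `ingest_ns` with a process-wide incremental counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide incremental counter (illustrative).
static COUNTER: AtomicU64 = AtomicU64::new(0);

// Combine the event's ingest timestamp (nanoseconds) with an
// auto-incrementing id: unique on one system via the counter,
// extremely unlikely to collide across systems via the timestamp.
fn message_id(ingest_ns: u64) -> (u64, u64) {
    (ingest_ns, COUNTER.fetch_add(1, Ordering::Relaxed))
}
```

Even two messages with the same `ingest_ns` get distinct ids, because the counter differs.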
Can I keep the thread-id, just to reduce the collision probability further? 😅
Okay, I have made the changes as suggested in the latest comment.
```
(epoch_timestamp & BITS_13) | (auto_increment_id & !BITS_13) | (thread_id_u64 & BITS_13)
```
(epoch_timestamp is ingest_ns when `process()` is called; in `finish()` there's no ingest_ns, so I am using the current timestamp.)
Let me know your thoughts!
I would remove the thread ID. It is skewing the distribution and making the data less unique, and it also prevents event ids from being replayable. Basically, ORing it in will result in the lower 13 bits having three times as many 1's as 0's, making them more likely to collide, while also making the id non-deterministic.
The second problem I spot: by removing the lower 13 bits from the increment, that part has no effect for the first 8192 messages and then only changes every 8192 messages. My suggestion would be to shift it by 13 bits instead of truncating them.
Lastly, I'd probably pull in a few more bits from the timestamp; 13 bits are only 0.0000082 secs.
Even if you bump it to 16, that means you get a window of 0.000066 secs
+ a 48-bit counter (or 281,474,976,710,656 events to count).
You'd end up with something like this:
```
(epoch_timestamp & 0xFF_FF) | (auto_increment_id << 16)
```
do you think that would solve the original problem?
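A minimal sketch of that layout (constant and function names are assumptions, not tremor's actual code): the low 16 bits carry the timestamp, and the counter sits in the bits above them, so it changes the id on every message starting from the very first one.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static COUNTER: AtomicU64 = AtomicU64::new(0);

// Low 16 bits: nanosecond timestamp; remaining 48 bits: shifted counter.
fn message_id(epoch_timestamp: u64) -> u64 {
    let auto_increment_id = COUNTER.fetch_add(1, Ordering::Relaxed);
    (epoch_timestamp & 0xFF_FF) | (auto_increment_id << 16)
}
```

Shifting (rather than masking out) the counter is what avoids the "no effect for the first 8192 messages" problem from the earlier proposal.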
I will fix the clippy checks and the DCO; please allow me some time.
Honestly, I didn't anticipate this happening, but I also created an otel gelf-exporter. :)
Line 190:
```rust
let current_epoch_timestamp = u64::try_from(
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("SystemTime before UNIX EPOCH!")
        .as_nanos(),
)?;
```
Not sure how to handle this better, is this okay?
We'd want to avoid the `expect`. Let me explain why.
`expect` and `unwrap` cause a crash. In some applications that's okay: say you have a CLI, and if something goes wrong you might just want to error out and restart from scratch. Tremor is a bit more complex; there may be more than one pipeline running, and we don't want one pipeline to affect the progress of another. If we crashed in pipeline A, we'd take pipeline B down with it, and that's not desirable. So we handle errors and let them bubble up, so that only the pipeline causing the issue is affected.
To do that there are a few ways: `?` works in many places, but if the error type can't be mapped, something like `map_err(...)?` is a good alternative.
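For the snippet above, a sketch of the `map_err` route; the `Error` type here is a stand-in for tremor's own error type:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Stand-in error type; tremor would use its own error machinery.
#[derive(Debug)]
struct Error(String);

fn current_epoch_timestamp() -> Result<u64, Error> {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_err(|e| Error(format!("SystemTime before UNIX EPOCH: {e}")))?
        .as_nanos();
    u64::try_from(nanos).map_err(|e| Error(format!("timestamp exceeds u64: {e}")))
}
```

This way a bad system clock becomes an error that bubbles up through the affected pipeline instead of crashing the whole process.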
@Licenser @darach Does it look good now?
(I have fixed the DCO, format, clippy-check, code quality checks)
@BharatKJain LGTM now. I think you just need to click on "resolve conversation" for the open review comment and we're all good!
Pull request
Description
Changed message-id from auto-increment ID to randomized ID in postprocessor/gelf_chunking.rs
HELP NEEDED: I have avoided adding the hostname while producing the message-id; I am not sure how we can handle adding the hostname, please suggest.
Related
Checklist
Performance