Open sj-williams opened 5 months ago
upgrade applied, image runbook updated
Rolling back logging fluent-bit via Helm to 2.2.1 following frequent crashes (2-3 times daily) after the 3.0.2 upgrade.
[2024/05/16 08:26:03] [engine] caught signal (SIGSEGV)
#0 0x5580c05d5c49 in flb_log_event_encoder_dynamic_field_flush_scopes() at src/flb_log_event_encoder_dynamic_field.c:210
#1 0x5580c05d5c49 in flb_log_event_encoder_dynamic_field_reset() at src/flb_log_event_encoder_dynamic_field.c:240
#2 0x5580c05d3bac in flb_log_event_encoder_reset() at src/flb_log_event_encoder.c:33
#3 0x5580c0602f1f in ml_stream_buffer_flush() at plugins/in_tail/tail_file.c:418
#4 0x5580c0602f1f in ml_flush_callback() at plugins/in_tail/tail_file.c:919
#5 0x5580c05b8757 in flb_ml_flush_stream_group() at src/multiline/flb_ml.c:1515
#6 0x5580c05b8eb5 in flb_ml_flush_parser_instance() at src/multiline/flb_ml.c:117
#7 0x5580c05d6c1c in flb_ml_stream_id_destroy_all() at src/multiline/flb_ml_stream.c:316
#8 0x5580c06036ac in flb_tail_file_remove() at plugins/in_tail/tail_file.c:1249
#9 0x5580c05ff405 in tail_fs_event() at plugins/in_tail/tail_fs_inotify.c:242
#10 0x5580c0588894 in flb_input_collector_fd() at src/flb_input.c:1949
#11 0x5580c05a2507 in flb_engine_handle_event() at src/flb_engine.c:575
#12 0x5580c05a2507 in flb_engine_start() at src/flb_engine.c:941
#13 0x5580c057e153 in flb_lib_worker() at src/flb_lib.c:674
#14 0x7fdcebc0fea6 in ???() at ???:0
#15 0x7fdceb4c3a6e in ???() at ???:0
#16 0xffffffffffffffff in ???() at ???:0
There's an open upstream issue for this crash: https://github.com/fluent/fluent-bit/issues/8779
Keep an eye on that issue for updates on a resolution.
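To confirm the crash frequency before and after the rollback, the SIGSEGV lines can be counted per day from collected fluent-bit logs. The following is a minimal sketch, not an established tool: it assumes the crash line always has the same format as the one in the trace above, and the function name `count_crashes_per_day` and the second sample line are hypothetical, added only for illustration.

```python
import re
from collections import Counter

# Matches the crash line format seen in the trace above, e.g.
# [2024/05/16 08:26:03] [engine] caught signal (SIGSEGV)
# Assumption: every crash is logged in exactly this format.
CRASH_RE = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2}) \d{2}:\d{2}:\d{2}\] \[engine\] caught signal \(SIGSEGV\)"
)


def count_crashes_per_day(lines):
    """Return a Counter mapping 'YYYY/MM/DD' to the number of SIGSEGV lines."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage: the first line is taken from the trace in this issue;
# the second is a hypothetical extra crash used only to show the counting.
sample_lines = [
    "[2024/05/16 08:26:03] [engine] caught signal (SIGSEGV)",
    "[2024/05/16 15:41:10] [engine] caught signal (SIGSEGV)",
]
for day, crashes in sorted(count_crashes_per_day(sample_lines).items()):
    print(f"{day}: {crashes} crash(es)")
```

Lines that do not match the pattern, such as the stack frames, are ignored, so the raw log output of a crashed pod can be passed in unfiltered.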
Background
There is a new major release of fluent-bit available. We are currently on version 2.2.1.
https://fluentbit.io/announcements/v3.0.0/
https://fluentbit.io/announcements/v3.0.1/
Associated Helm release is v0.46.1: https://github.com/fluent/helm-charts/blob/fluent-bit-0.46.1/charts/fluent-bit/Chart.yaml#L8-L9
Approach
Fluent don't publish specifics regarding Kubernetes version compatibility, and the release and upgrade notes make no mention of breaking changes. However, this is a major release with new functionality added, so the changelogs need a thorough check and the upgrade needs testing before rollout.
Which part of the user docs does this impact
Communicate changes
Questions / Assumptions
Definition of done
Reference
How to write good user stories