kaltura / nginx-kafka-log-module

Send Kafka messages from Nginx
BSD 2-Clause "Simplified" License
65 stars 22 forks source link

rdkafka properties as directive parameters #9

Open larry-cdn77 opened 3 years ago

larry-cdn77 commented 3 years ago

Revamp configuration directives with properties parsed by rdkafka

Motivation is ability to set properties at configuration time. Currently a known set is hardcoded in configuration parsing together with string/int types (note that rdkafka property setting is always string). It will be useful to tweak various properties without a code change.

Tests:

Not all configuration errors are detected in master process, eg dependencies between properties like idempotence and acks. Those are flagged by rdkafka at time of producer create, at which point you get dying workers.

kaltura-hooks commented 3 years ago

Hi @larry-cdn77, Thank you for contributing this pull request! Please sign the Kaltura CLA so we can review and merge your contribution. Learn more at http://bit.ly/KalturaContrib

erankor commented 3 years ago

Thanks @larry-cdn77, I'm ok with adding a generic directive for arbitrary librdkafka params, but prefer not to remove the existing ones, since -

  1. I think the defaults in the code are more suitable for this use case than the more generic defaults in librdkafka (e.g. enabling compression by default + limiting the number of retries)
  2. Removing the defaults + existing directives would force us to update the conf of all components that currently use this module

Regarding the enable flag, the module writes to kafka only when -

  1. The location uses the kafka_log directive
  2. The topic expression evaluates to some non-empty string So I don't think we need another directive to disable it.
larry-cdn77 commented 3 years ago

Thank you for the response

Clearly, I did not consider compatibility. Here is another changeset then, with the current directives doing the same as the new universal one.

One exception is log level, where rd_kafka_set_log_level seems to be phased out in favour of the global 'log_level' property. So I switched over to using 'log_level'.

I retained the direct setting of rdkafka properties only because having this fail in worker is messy in my environment. As much as it makes sense, I try to push configuration problems into master and spot them nice and early.

I do think the enable flag is useful but might be missing something. The case I address with it is module compiled but not used. Worker process init (ngx_http_kafka_log_init_worker) will call ngx_kafka_log_configure_kafka and that just goes on trying to create a producer. No bootstrap servers have been set and you get errors that are confusing if a developer happens to have no idea about Kafka module (despite it being compiled in).

larry-cdn77 commented 3 years ago

I noticed that the string translation from directive to property in ngx_http_kafka_log_set_property could probably be implemented with the post attribute of ngx_command_t

erankor commented 3 years ago

Right, better use the post attribute for this, instead of mapping it in the code. Also, for setting the defaults in ngx_kafka_log_init_kafka, I think it would be more elegant to have a static array of ngx_keyval_t that is null terminated (ngx_null_string) and loop over it. Regarding the enable flag, I want to understand the issue better... you said that if the module is built into nginx and there are no kafka_log directives in the conf, it fails to start?

larry-cdn77 commented 3 years ago

Right, better use the post attribute for this, instead of mapping it in the code.

Now changed. Looks a great deal better.

Also, for setting the defaults in ngx_kafka_log_init_kafka, I think it would be more elegant to have a static array of ngx_keyval_t that is null terminated (ngx_null_string) and loop over it.

Done, thanks, also looks and reads much better. Used zero-terminated string so that I am not going into NGINX string and then back down again straight after.

Regarding the enable flag, I want to understand the issue better... you said that if the module is built into nginx and there are no kafka_log directives in the conf, it fails to start?

Sorry, I should have begun with an example scenario. Build with head of nginx-kafka-log-module, use nginx.conf without any Kafka directives and start NGINX. The module will attempt to create a producer without a bootstrap address with this error log message from rdkafka:

No `bootstrap.servers` configured: client will not be able to connect to Kafka cluster

It could be confusing to see this unexpectedly, and potentially time consuming to diagnose if you don't know that the module is compiled in and what it is for. It also relies on the producer creation failing gracefully in rdkafka, which may not always be the case. An enable flag is my attempt to be handle this defensively, but appreciate that it is possibly against how modules normally interface.