os-climate / os_c_data_commons

Repository for Data Commons platform architecture overview, as well as developer and user documentation
Apache License 2.0

Establish naming conventions for pipelines and define security arch for Kafka cluster #237

Open HeatherAck opened 1 year ago

HeatherAck commented 1 year ago

Establish naming conventions for pipelines, topics, etc. so we can continue to implement more pipelines quickly. We also need to define a security architecture for the Kafka clusters. At the moment they are open to all namespaces in the cluster.

bryonbaker commented 1 year ago

Based on this article, I am suggesting that we move to a single "monolithic" Kafka cluster for all of OS Climate. At the moment there are two. If in the future we have some in-country data requirements then we would add additional clusters. But I can't see that on the horizon.

https://developers.redhat.com/articles/2022/03/10/which-better-single-kafka-cluster-rule-them-all-or-many#summary

So FX and CO2 Signal need to be consolidated to start.

The next question is the topic naming convention. There are a number of popular approaches. This blog post has the best collection of conventions I have seen: https://cnr.sh/essays/how-paint-bike-shed-kafka-topic-naming-conventions

My recommendation is that the topics be named: <message type>.<dataset name>.<database>

message type := "push", because the data is being pushed from a scheduled job.
dataset name := "ecb" and "co2signal", because these are the databases we are using.
database := "fx" and "country", because it is the FX table and the Country-Specific table being queried.

For the current topics these would be:

push.ecb.fx
push.co2signal.country
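To make the proposed convention easier to apply consistently, here is a minimal sketch of a helper that builds a topic name from its three parts. The function name `topic_name` and the rule that each part is lowercase alphanumeric and starts with a letter are assumptions for illustration only; they have not been agreed in this thread.

```python
import re

# Assumed rule (not agreed in this thread): each part is lowercase
# letters and digits, and starts with a letter.
_PART = re.compile(r"[a-z][a-z0-9]*")


def topic_name(message_type: str, dataset: str, database: str) -> str:
    """Join the three parts of the proposed convention with dots.

    Hypothetical helper: raises ValueError if a part breaks the assumed rule.
    """
    parts = (message_type, dataset, database)
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid topic part: {part!r}")
    return ".".join(parts)


if __name__ == "__main__":
    print(topic_name("push", "ecb", "fx"))
    print(topic_name("push", "co2signal", "country"))
```

Rejecting upper case and extra dots inside a part keeps every topic at exactly three dot-separated segments, which would make it simpler to match topics by prefix later, for example when scoping access per message type.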

@caldeirav - any thoughts?