Getting Started with Kafka Streams

Kappa architecture: handles streaming only
- Its main advantage is simplicity
Lambda architecture: supports both streaming and batch
- Supporting both makes it more complex to operate and maintain (Spark, Flink)
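
Processor Topologies

Dataflow programming (DFP)
- A data-centric programming model in which a program is a series of connected inputs, outputs, and processing steps
- The steps form a directed acyclic graph (DAG)
There are three kinds of processors
- source: reads data from Kafka, files, etc.
- stream: transforms data with operators such as map, filter, flatMap, and join
- sink: writes data back to Kafka, files, etc.
Sub-Topologies
- A topology can be split into several smaller topologies
- Sub-topologies are processed in parallel
- Each stream is processed independently
Depth-First Processing
- Only one record flows through a topology at a time: a record passes through every processor on its path before the next record is picked up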
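
The source, stream, and sink roles and the depth-first order can be sketched in plain Java. This is only a model under simplifying assumptions, not the Kafka Streams API: the class name `DepthFirstSketch`, the filter rule (drop empty strings), and the upper-casing step are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a source -> stream -> sink topology (not the Kafka Streams API). */
class DepthFirstSketch {

    /** Pushes each record through every step before the next record is read. */
    static List<String> run(List<String> sourceRecords) {
        List<String> trace = new ArrayList<>();
        for (String record : sourceRecords) {        // source processor
            trace.add("source:" + record);
            if (record.isEmpty()) {                  // stream processor: filter
                continue;                            // record is dropped here
            }
            String upper = record.toUpperCase();     // stream processor: map
            trace.add("map:" + upper);
            trace.add("sink:" + upper);              // sink processor
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "", "b")));
    }
}
```

In the trace, every step for the first record appears before any step for the second, which is the depth-first behavior described above.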

Tasks and Stream Threads

Defining a topology does not run anything; it only describes how data will flow through the program.
Task
- The smallest unit of work that can be parallelized
- The maximum parallelism = the maximum number of stream tasks = the maximum number of partitions of the input topic(s)
- With 16 input partitions, 16 tasks are created, and each task gets its own copy of the topology
- The basic logical unit that instantiates and runs the topology
Thread
- Actually executes the tasks
- num.stream.threads controls how many threads the application uses
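
The relationship between partitions, tasks, and threads can be made concrete with a little arithmetic. The sketch below is a plain-Java model, not Kafka Streams code: `TaskAssignmentSketch` is a hypothetical name, and the round-robin spread is an assumption for illustration (the real assignment is decided by Kafka Streams).

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of task counting and thread assignment (illustrative only). */
class TaskAssignmentSketch {

    /** Number of tasks = largest partition count among the input topics. */
    static int taskCount(int... partitionsPerInputTopic) {
        int max = 0;
        for (int p : partitionsPerInputTopic) {
            max = Math.max(max, p);
        }
        return max;
    }

    /** Spreads task IDs over threads round-robin; extra threads stay idle. */
    static List<List<Integer>> assign(int tasks, int threads) {
        List<List<Integer>> perThread = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            perThread.add(new ArrayList<>());
        }
        for (int task = 0; task < tasks; task++) {
            perThread.get(task % threads).add(task);
        }
        return perThread;
    }
}
```

With 16 tasks, configuring more than 16 threads in total gains nothing in this model: the additional threads receive no tasks.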
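
High-Level DSL Versus Low-Level Processor API

The high-level DSL
- Simple and easy to use
- Complex logic is abstracted away

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();
KStream<byte[], byte[]> stream = builder.stream("tweets");
```

The low-level Processor API
- Allows finer-grained, more detailed control
- Harder to use than the DSL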
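
Streams (a record stream) and Tables (a changelog stream)

The choice between the two data structures determines how the backing topic is configured and structured.
streams
- Record every event
tables
- Keep only the latest value for each key
- The topic's cleanup.policy is normally set to compact
- stateful
- Can be used for aggregations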
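
The difference between the two views can be shown by reading the same log twice. This is a minimal plain-Java sketch rather than Kafka Streams code; `StreamVsTableSketch` and the `String[]{key, value}` record shape are assumptions, and the treatment of a null value as a delete mirrors the tombstone behavior noted later in these notes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reads one log of {key, value} records as a stream and as a table. */
class StreamVsTableSketch {

    /** Stream view: every record is an independent insert, so all are kept. */
    static List<String> asStream(String[][] log) {
        List<String> out = new ArrayList<>();
        for (String[] kv : log) {
            out.add(kv[0] + "=" + kv[1]);
        }
        return out;
    }

    /** Table view: later records overwrite earlier ones; a null value deletes the key. */
    static Map<String, String> asTable(String[][] log) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] kv : log) {
            if (kv[1] == null) {
                table.remove(kv[0]);      // tombstone
            } else {
                table.put(kv[0], kv[1]);  // update
            }
        }
        return table;
    }
}
```

For a log of four records on two keys, the stream view holds four entries, while the table view holds at most one entry per key.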

KStream, KTable, GlobalKTable

Available in the high-level DSL.
KStream
- An abstraction of a partitioned record stream (insert semantics)
KTable
- An abstraction of a partitioned table (update semantics)
GlobalKTable
- Similar to a KTable but not partitioned, so every instance holds the complete data set

Stateless Processing

stateless processing
- Each record is consumed, processed, and then forgotten
- Filtering records
- Adding and removing fields
- Rekeying records
- Branching streams
- Merging streams
- Transforming records into one or more outputs
- Enriching records, one at a time

Stateless Versus Stateful Processing

stateless
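- Each event is processed independently of other events
- Only the current record is needed, never earlier data (self-contained inserts)
stateful
- Remembers previously seen data
- Typically used for aggregations, windowing, and joins
- State brings extra concerns: maintenance, scalability, and fault tolerance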

Adding the Source Processors

Serdes
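- Bundles a serializer and a deserializer in a single class
- A custom Serde can be implemented when no suitable built-in type exists
```java
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serializer;

public class TweetSerdes implements Serde<Tweet> {

  @Override
  public Serializer<Tweet> serializer() {
    return new TweetSerializer();
  }

  @Override
  public Deserializer<Tweet> deserializer() {
    return new TweetDeserializer();
  }
}
```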
- Filtering Data
  - filter
  - filterNot
- Branching Data
  - Splits one stream into N streams
```java
Predicate<byte[], Tweet> englishTweets =
    (key, tweet) -> tweet.getLang().equals("en");

Predicate<byte[], Tweet> nonEnglishTweets =
    (key, tweet) -> !tweet.getLang().equals("en");

// branch() routes each record to the first predicate it matches
KStream<byte[], Tweet>[] branches = filtered.branch(englishTweets, nonEnglishTweets);
KStream<byte[], Tweet> englishStream = branches[0];
KStream<byte[], Tweet> nonEnglishStream = branches[1];
```
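
The routing rule behind branching (each record goes to the first predicate it matches, and a record that matches nothing is dropped) can be modeled on plain lists. The sketch below uses `java.util.function.Predicate`, not the Kafka Streams `Predicate`, and `BranchSketch` is a hypothetical helper written for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Plain-Java model of first-match branching over a list. */
class BranchSketch {

    /** Routes each item to the first predicate it matches; unmatched items are dropped. */
    @SafeVarargs
    static <T> List<List<T>> branch(List<T> input, Predicate<T>... predicates) {
        List<List<T>> branches = new ArrayList<>();
        for (int i = 0; i < predicates.length; i++) {
            branches.add(new ArrayList<>());
        }
        for (T item : input) {
            for (int i = 0; i < predicates.length; i++) {
                if (predicates[i].test(item)) {
                    branches.get(i).add(item);
                    break; // first match wins
                }
            }
        }
        return branches;
    }
}
```

Because the first match wins, the order of the predicates matters whenever they overlap.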

- Translating Data
  - map (1:1)
  - mapValues (more efficient than map when the key does not need to change)
  - flatMap (1:N, each input produces zero or more outputs)
  - flatMapValues
- Merging Streams
  - The opposite of branching: combines N streams into one
  - Comparable to a SQL UNION query
```java
KStream<byte[], Tweet> merged = englishStream.merge(translatedStream);
```

Adding a Sink Processor

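- to: writes the stream to a topic
- through: writes to a topic and keeps processing from it (deprecated in favor of repartition)
- repartition: writes to an internal repartition topic managed by Kafka Streams and keeps processing from it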

Stateful Processing

Stateful processing can query previously seen data and supports operations such as aggregation.

Stateful Operators

Joining data
- join (inner join)
- leftJoin
- outerJoin
Aggregating data
- aggregate
- count
- reduce
Windowing data
- windowedBy

State store

Uses RocksDB.
Embedded
- A state store is set up per task
- Low latency, because the state lives next to the processor
- Avoids concurrency problems
Multiple access modes
- Processors can read and write
- External clients get read-only access
Fault tolerant
- Changes are written to a changelog topic in Kafka
- On application start, state is rebuilt from the changelog topic
- Standby replicas (shadow copies) are supported
Persistent Stores
- State spills to disk once it outgrows a certain size in memory
- After a failure and restart, only the changes made while the service was down are replayed from the topic, not the whole topic
- StreamsConfig.STATE_DIR_CONFIG defaults to /tmp/kafka-streams, but using /tmp is not recommended
- Drawbacks
  - RocksDB may need tuning
  - A secondary disk has to be provisioned

Adding the Source Processors

KTable
- Uses state
- Suited to very large keyspaces (many unique keys)
- Data is stored in a distributed fashion
- Lower local storage overhead
GlobalKTable
- Suited to small keyspaces
- Static data that is rarely updated

Joins

join
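- Triggered by events on either side; records with the same key are merged
leftJoin
- stream-table joins
  - Triggered by events on the left stream; if there is no matching right record, the right value is null
- stream-stream and table-table joins
  - Triggered by events on either side
  - When a right-side event arrives and there is no matching left record, nothing is joined and no result is produced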
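
The stream-table case can be modeled with a map for the table side. This is a minimal plain-Java sketch, not the Kafka Streams join API: `StreamTableJoinSketch`, its method names, and the `key:left+right` output format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of stream-table join and leftJoin semantics. */
class StreamTableJoinSketch {
    // Right side: latest value per key
    private final Map<String, String> table = new HashMap<>();
    final List<String> output = new ArrayList<>();

    /** A table update only changes state; it does not emit a join result. */
    void onTableUpdate(String key, String value) {
        table.put(key, value);
    }

    /** leftJoin: every stream record emits; the right value is null when there is no match. */
    void onStreamRecordLeftJoin(String key, String value) {
        output.add(key + ":" + value + "+" + table.get(key));
    }

    /** join (inner): emits only when the table holds a value for the key. */
    void onStreamRecordInnerJoin(String key, String value) {
        String right = table.get(key);
        if (right != null) {
            output.add(key + ":" + value + "+" + right);
        }
    }
}
```

The sketch covers only the stream-table case; stream-stream and table-table joins, where both sides trigger output, would need buffered state for both inputs.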
co-partitioning
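- A join needs records with the same key on both sides, so data is rekeyed to place matching records in the same partition
- selectKey()
  - The rekeyed data is sent to an internal topic
  - The rekeyed data may be processed on a different partition
  - Repartitioning has a cost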
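
Why both sides need the same key and the same partition count can be seen from how a key maps to a partition. The sketch below is a plain-Java stand-in under a loud assumption: it uses `String.hashCode()` for brevity, whereas Kafka's default partitioner hashes the serialized key with murmur2. `CoPartitioningSketch` is a hypothetical name.

```java
/** Plain-Java model of key-to-partition mapping (hash function is a stand-in). */
class CoPartitioningSketch {

    /** Stand-in partitioner: hash of the key modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Two records can meet in one task only if both topics map the key to the same partition. */
    static boolean coPartitioned(String key, int leftPartitions, int rightPartitions) {
        return partitionFor(key, leftPartitions) == partitionFor(key, rightPartitions);
    }
}
```

With equal partition counts the same key always lands on the same partition number on both sides; with different counts that guarantee is lost for some keys.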

Grouping Records

KTable supports only groupBy.
groupBy
- Changes the key, so a repartition happens (an internal topic is created automatically)
groupByKey
- Keeps the key, so there is no network cost from repartitioning

Aggregations
- aggregate
- reduce
- count

Interactive Queries

Materialized Stores

Operators such as aggregate use an internal state store that is accessible only from within the processor topology.

A materialized store additionally exposes read-only queries to code outside the processor topology (local only, not remote).

```java
KTable<String, HighScores> highScores =
    grouped.aggregate(
        highScoresInitializer,
        highScoresAdder,
        Materialized.<String, HighScores, KeyValueStore<Bytes, byte[]>>
            as("leader-boards") // explicit store name, used later to query the store
            .withKeySerde(Serdes.String())
            .withValueSerde(JsonSerdes.HighScores()));
```

Accessing Read-Only State Stores
- The name of the state store
- The type of state store
  - QueryableStoreTypes.keyValueStore()
  - QueryableStoreTypes.timestampedKeyValueStore()
  - QueryableStoreTypes.windowStore()
  - QueryableStoreTypes.timestampedWindowStore()
  - QueryableStoreTypes.sessionStore()
```java
ReadOnlyKeyValueStore<String, HighScores> stateStore =
    streams.store(
        StoreQueryParameters.fromNameAndType(
            "leader-boards",
            QueryableStoreTypes.keyValueStore()));

// Point lookup
HighScores highScores = stateStore.get(key);

// All entries
KeyValueIterator<String, HighScores> all = stateStore.all();

// Range scan (keys are Strings here, so the bounds must be Strings too)
KeyValueIterator<String, HighScores> range = stateStore.range("1", "7");

while (range.hasNext()) {
  KeyValue<String, HighScores> next = range.next();
  String nextKey = next.key;
  HighScores nextScores = next.value;
  // do something with high scores object
}

// Iterators hold resources and must be closed
range.close();
all.close();
```



- Local Queries
  - For a `KTable` (as opposed to a `GlobalKTable`), only the locally held partitions of the data can be queried
  - Remote queries provide a way around this
- Remote Queries
  - Querying the full state requires the following
    - Discovery of which instance holds which data: `queryMetadataForKey()`
    - An RPC or REST service to serve requests from outside
    - An RPC or REST client on the calling side to talk to that service
  - Kafka Streams has no built-in library for this, so you have to provide it yourself
<img width="500" alt="스크린샷 2022-05-19 오전 11 34 25" src="https://user-images.githubusercontent.com/4098287/169192069-b97f6efa-1631-4bcf-8627-e514643d181f.png">

Windows and Time

Time Semantics

Event time
- When the data was originally created
Ingestion time
- When the record was appended to the topic
Processing time
- When the record is actually processed
- Changes when data is reprocessed, for example after a bug fix
log.message.timestamp.type (broker level): CreateTime (event time), LogAppendTime (ingestion time)
message.timestamp.type (topic level)
- The topic-level setting takes precedence

Timestamp Extractors

Extracts the timestamp that is used to assign records to windows. The extractor can be changed with the DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG setting.

FailOnInvalidTimestamp
- Throws an error when the timestamp is invalid
LogAndSkipOnInvalidTimestamp
- Logs a warning when the timestamp is invalid, then moves on to the next message
WallclockTimestampExtractor
- Uses the local system time as the timestamp

Windowing Streams

Tumbling windows
- The most basic window type: fixed-size windows at regular intervals
- Windows never overlap, so no record is counted twice
```
TimeWindows tumblingWindow = TimeWindows.of(Duration.ofSeconds(5)); 
```
Hopping windows
- Fixed size like tumbling windows, but the advance interval differs from the window size, so windows partly overlap
```
TimeWindows hoppingWindow =  TimeWindows
.of(Duration.ofSeconds(5))  
.advanceBy(Duration.ofSeconds(4));  
```
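
Which windows a record falls into, for both tumbling and hopping windows, comes down to integer arithmetic on the timestamp. The sketch below is a plain-Java model under the assumption that windows are aligned to the epoch and cover `[start, start + size)`; `TimeWindowSketch` and `windowStartsFor` are hypothetical names, not Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of time-window assignment (epoch-aligned windows assumed). */
class TimeWindowSketch {

    /** Start times of all windows [start, start + sizeMs) that contain timestampMs. */
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest aligned window start whose window still covers the timestamp
        long start = Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        while (start <= timestampMs) {
            starts.add(start);
            start += advanceMs;
        }
        return starts;
    }
}
```

A tumbling window is the special case where the advance equals the size, so each timestamp maps to exactly one window. With a 5-second size and a 4-second advance, a timestamp in the overlapping second belongs to two windows.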
Session windows
- The window stays open while events keep arriving within the inactivity gap; once the gap is exceeded, a new window starts
- The example below starts a new window after 5 seconds without messages
```
SessionWindows sessionWindow = SessionWindows.with(Duration.ofSeconds(5));
```
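
How an inactivity gap turns a series of timestamps into sessions can be sketched in plain Java. This is a simplified model, not Kafka Streams code: it assumes timestamps arrive in ascending order (merging of out-of-order records is not modeled), and it treats a difference exactly equal to the gap as still belonging to the same session, which is an assumption of this sketch. `SessionWindowSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of session windows over ascending timestamps. */
class SessionWindowSketch {

    /** Groups ascending timestamps into sessions; each entry is {start, end}. */
    static List<long[]> sessions(long[] sortedTimestampsMs, long gapMs) {
        List<long[]> result = new ArrayList<>();
        long[] current = null;
        for (long ts : sortedTimestampsMs) {
            if (current != null && ts - current[1] <= gapMs) {
                current[1] = ts;                  // still active: extend the session
            } else {
                current = new long[] {ts, ts};    // gap exceeded: start a new session
                result.add(current);
            }
        }
        return result;
    }
}
```

Unlike tumbling and hopping windows, the window boundaries here depend on the data itself, so session lengths vary per key.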
Sliding join windows
- Fixed-size windows, but created through a join
- Two records are joined if both fall within the fixed window size of each other; otherwise they are not joined
Delayed message
- Flink handles delayed messages with watermarks
- A watermark estimates when all data for a given window should have arrived
- In other words, it decides how late data may be and still be accepted
- Kafka Streams takes a similar approach by configuring a grace period
- A long grace period is expensive
- The example below allows messages delayed by up to 5 seconds for a 60-second window
```
TimeWindows tumblingWindow =
TimeWindows
.of(Duration.ofSeconds(60))
.grace(Duration.ofSeconds(5));
```
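
The effect of a grace period can be reduced to one comparison between stream time and the window end. The sketch below is a plain-Java model, not Kafka Streams code; the exact boundary (a record is rejected once stream time reaches window end plus grace) is an assumption of this sketch, and `GracePeriodSketch` is a hypothetical name.

```java
/** Plain-Java model of the grace-period check for late records. */
class GracePeriodSketch {

    /**
     * A record for the window ending at windowEndMs is still accepted while
     * stream time has not yet reached windowEndMs + graceMs.
     */
    static boolean accepts(long windowEndMs, long graceMs, long streamTimeMs) {
        return streamTimeMs < windowEndMs + graceMs;
    }
}
```

For the 60-second window with a 5-second grace period, a late record is still counted while stream time is below 65 seconds and is dropped afterwards. The window's state has to be kept for that whole time, which is one reason long grace periods are costly.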
Suppression
- suppress emits only the final result of a window
- The following decisions have to be made
  - Which suppression strategy to use
  - How much memory to use for buffering
  - What to do when the buffer is full
- Strategies
  - Suppressed.untilWindowCloses
  - Suppressed.untilTimeLimit
Buffer Config
- BufferConfig.maxBytes(): limits the buffer by size in bytes
- BufferConfig.maxRecords(): limits the buffer by number of keys
- BufferConfig.unbounded(): lets the buffer use heap memory without limit (can cause an OOM)
Buffer Full Strategies
- shutDownWhenFull: shuts the application down when the buffer is full
- emitEarlyWhenFull: emits the intermediate result when the buffer is full

Advanced State Management

Persistent Store Disk Layout

Kafka Streams supports both in-memory and persistent state stores.
The StreamsConfig.STATE_DIR_CONFIG setting specifies where the data is stored.

The default path is /tmp/kafka-streams

```
.
└── dev-consumer                -----> application ID
    ├── 0_0                     -----> task ID [<sub-topology-id>_<partition>]
    │   ├── .lock
    │   └── pulse-counts
    ├── 0_1
    │   ├── .lock
    │   └── pulse-counts
    ├── 0_2
    │   ├── .lock
    │   └── pulse-counts
    ├── 0_3
    │   ├── .checkpoint         -----> offset information for the changelog topic
    │   ├── .lock
    │   └── pulse-counts        -----> actual data, named after the materialized state store
    │       └── ...
    ├── 1_0
    │   ├── ...
```
Fault Tolerance

Changelog Topics
- Topics created automatically by Kafka Streams
- Updates are recorded per key, and the topic is used to restore state when the application starts
- Can be configured in the DSL through the Materialized class
```java
// Make the store ephemeral by disabling the changelog.
// In this case state cannot be recovered after a disk failure.
Materialized.as("pulse-counts").withLoggingDisabled();
```
- The changelog topic itself can also be configured
```java
Map<String, String> topicConfigs =
    Collections.singletonMap("min.insync.replicas", "2");

KTable<Windowed<String>, Long> pulseCounts =
    pulseEvents
        .groupByKey()
        .windowedBy(tumblingWindow)
        .count(
            Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("pulse-counts")
                .withValueSerde(Serdes.Long())
                .withLoggingEnabled(topicConfigs));
```
- To change the configuration of a changelog topic that already exists, modify the topic directly with the Kafka CLI

## Standby Replicas
- One way to reduce state store downtime is to keep spare instances running ahead of time
- Configured with `NUM_STANDBY_REPLICAS_CONFIG`
## Rebalancing: Enemy of the State (Store)
- Because state is backed by a changelog topic, rebuilding an application's state can take a long time when there is a lot of data
- In a consumer rebalance, every application instance in the consumer group is reassigned
- Broker and consumer roles involved in rebalancing
  - `group coordinator`: the broker that manages the consumer group
  - `group leader`: the consumer in the group that decides the partition assignment
## Preventing State Migration
- Ways to minimize the cost of restoring large amounts of state when a rebalance happens
- Sticky Assignment
  - `StickyTaskAssignor`
  - A policy that, on rebalance, assigns each task back to the instance that previously owned its state
  - Because the task returns to the same instance, the state does not have to be re-initialized
    <img width="272" alt="스크린샷 2022-05-20 오전 11 58 00" src="https://user-images.githubusercontent.com/4098287/169441472-bc045053-9702-44a0-a661-855488e68cd5.png">
- Static Membership
  - When an instance goes inactive and becomes active again for whatever reason, the coordinator treats it as a new member and reassigns tasks
  - Pinning the member ID minimizes such rebalances
  - Drawback: the ID has to be hard-coded, for example `group.instance.id = app-1`
  - Used together with a somewhat longer session timeout
  - Works only with Kafka 2.3 or later
- Eager rebalancing
  - Every application instance takes part in the rebalance
  - All processing stops
  - State has to be replayed and rebuilt, which is costly
- Incremental Cooperative Rebalancing
  - A new rebalancing protocol introduced in Kafka 2.4
  - Only the tasks of the consumers involved in the rebalance are reassigned to other instances
  - The remaining instances do not take part in the reassignment
    <img width="406" alt="스크린샷 2022-05-20 오후 3 19 08" src="https://user-images.githubusercontent.com/4098287/169465201-e01b1352-dd14-4836-aee0-89a506c2c162.png">
- Tombstones
  - Storing null as the state for a key makes Kafka Streams physically delete it
  - Saves disk space
- Aggressive topic compaction
  - A topic is divided into partitions, and each partition is managed on the broker as small files called segments
  - Tombstones and log compaction are more effective the smaller these segments are
    - `segment.bytes`
    - `segment.ms`
    - `min.cleanable.dirty.ratio`
    - `max.compaction.lag.ms`
- Fixed-size LRU cache
  - The state store holds at most the configured LRU size
  - Drawback: even if the LRU holds only 10 entries, the whole topic has to be replayed on re-initialization
```java
KeyValueBytesStoreSupplier storeSupplier = Stores.lruMap("counts", 10);
```
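
The eviction behavior of a fixed-size LRU store can be illustrated with the JDK alone. The sketch below builds on `java.util.LinkedHashMap` in access order; it is a stand-in for the idea, not how Kafka Streams implements its LRU store, and `LruStoreSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java model of a fixed-size LRU key-value store. */
class LruStoreSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruStoreSketch(int maxEntries) {
        super(16, 0.75f, true);        // accessOrder = true: reads refresh recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;    // evict the least recently used entry
    }
}
```

With a capacity of 2, inserting a third key evicts whichever of the first two was used least recently, so the local store stays bounded no matter how many keys pass through.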

Mastering Kafka Streams and ksqlDB #1
