
6. Processor API #11


kgneng2 commented 4 years ago
This chapter covers
- Evaluating higher-level abstractions versus more control
- Working with sources, processors, and sinks to create a topology
- Digging deeper into the Processor API with a stock analysis processor
- Creating a co-grouping processor
- Integrating the Processor API and the Kafka Streams API

6.1. The trade-offs of higher-level abstractions vs. more control
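
These notes leave this section empty, so here is a rough illustration of the trade-off (the uppercase example and all names are mine, not the book's): the DSL expresses a transformation in one line, while the Processor API makes every node name, parent-child link, and forward explicit.

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> stream = builder.stream("input-topic");
stream.mapValues(v -> v.toUpperCase()).to("output-topic");

// The same flow in the Processor API: more code, but you control node
// naming, wiring, and forwarding directly.
Topology topology = new Topology()
        .addSource("source", "input-topic")
        .addProcessor("uppercase", UppercaseProcessor::new, "source")
        .addSink("sink", "output-topic", "uppercase");

// The hand-written processor behind the "uppercase" node.
public class UppercaseProcessor extends AbstractProcessor<String, String> {
    @Override
    public void process(String key, String value) {
        context().forward(key, value.toUpperCase());
    }
}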

6.2. Working with sources, processors, and sinks to create a topology

6.2.1. Adding a source node

topology.addSource(LATEST,
                  purchaseSourceNodeName,
                  new UsePreviousTimeOnInvalidTimestamp(),
                  stringDeserializer,
                  beerPurchaseDeserializer,
                  Topics.POPS_HOPS_PURCHASES.topicName())

6.2.2. Adding a processor node

BeerPurchaseProcessor beerProcessor =
    new BeerPurchaseProcessor(domesticSalesSink, internationalSalesSink);

topology.addSource(LATEST,
                  purchaseSourceNodeName,
                  new UsePreviousTimeOnInvalidTimestamp(),
                  stringDeserializer,
                  beerPurchaseDeserializer,
                  Topics.POPS_HOPS_PURCHASES.topicName())
       .addProcessor(purchaseProcessor,
                     () -> beerProcessor,
                     purchaseSourceNodeName); // establishes the parent-child node relationship
public class BeerPurchaseProcessor extends AbstractProcessor<String, BeerPurchase> {
    private String domesticSalesNode;
    private String internationalSalesNode;

    public BeerPurchaseProcessor(String domesticSalesNode,
                                 String internationalSalesNode) {
        this.domesticSalesNode = domesticSalesNode;
        this.internationalSalesNode = internationalSalesNode;
    }

    @Override
    public void process(String key, BeerPurchase beerPurchase) {
        Currency transactionCurrency = beerPurchase.getCurrency();

        if (transactionCurrency != DOLLARS) {
            BeerPurchase.Builder builder = BeerPurchase.newBuilder(beerPurchase);
            double internationalSaleAmount = beerPurchase.getTotalSale();
            DecimalFormat decimalFormat = new DecimalFormat("###.##");

            builder.currency(DOLLARS);
            builder.totalSale(Double.parseDouble(decimalFormat.format(
                    transactionCurrency.convertToDollars(internationalSaleAmount))));
            BeerPurchase dollarBeerPurchase = builder.build();

            // Non-dollar purchases are converted, then routed to the international sink.
            context().forward(key, dollarBeerPurchase, internationalSalesNode);
        } else {
            // Dollar purchases go straight to the domestic sink.
            context().forward(key, beerPurchase, domesticSalesNode);
        }
    }
}

6.2.3. Adding a sink node

topology.addSource(LATEST,
                  purchaseSourceNodeName,
                  new UsePreviousTimeOnInvalidTimestamp(),
                  stringDeserializer,
                  beerPurchaseDeserializer,
                  Topics.POPS_HOPS_PURCHASES.topicName())
       .addProcessor(purchaseProcessor,
                     () -> beerProcessor,
                     purchaseSourceNodeName)

       .addSink(internationalSalesSink,
                "international-sales",
                stringSerializer,
                beerPurchaseSerializer,
                purchaseProcessor) // same parent: purchaseProcessor

       .addSink(domesticSalesSink,
                "domestic-sales",
                stringSerializer,
                beerPurchaseSerializer,
                purchaseProcessor); // same parent: purchaseProcessor

6.3. Digging deeper into the Processor API with a stock analysis processor

[Image: decision tree]

6.3.1. The stock-performance processor application

Topology topology = new Topology();
String stocksStateStore = "stock-performance-store";  // state store name
double differentialThreshold = 0.02;

KeyValueBytesStoreSupplier storeSupplier =
        Stores.inMemoryKeyValueStore(stocksStateStore);
StoreBuilder<KeyValueStore<String, StockPerformance>> storeBuilder =
        Stores.keyValueStoreBuilder(storeSupplier, Serdes.String(), stockPerformanceSerde);

topology.addSource("stocks-source",
                   stringDeserializer,
                   stockTransactionDeserializer,
                   "stock-transactions")
        .addProcessor("stocks-processor",
                      () -> new StockPerformanceProcessor(stocksStateStore, differentialThreshold),
                      "stocks-source")
        .addStateStore(storeBuilder, "stocks-processor")
        .addSink("stocks-sink",
                 "stock-performance",
                 stringSerializer,
                 stockPerformanceSerializer,
                 "stocks-processor");

init() method

@Override
public void init(ProcessorContext processorContext) {
    super.init(processorContext);
    keyValueStore = (KeyValueStore) context().getStateStore(stateStoreName);

    StockPerformancePunctuator punctuator =
            new StockPerformancePunctuator(differentialThreshold,
                                           context(),
                                           keyValueStore);

    // Every 10 seconds on WALL_CLOCK_TIME: fires on system time, so it does
    // not depend on record timestamps advancing.
    context().schedule(10000, PunctuationType.WALL_CLOCK_TIME, punctuator);
}

Let’s take a moment to discuss the difference between these two PunctuationType settings.

Punctuation semantics

(https://livebook.manning.com/#!/book/kafka-streams-in-action/chapter-6/84)

  1. The StreamTask extracts the smallest timestamp from the PartitionGroup. The PartitionGroup is a set of partitions for a given StreamThread, and it contains all timestamp information for all partitions in the group.
  2. During the processing of records, the StreamThread iterates over its StreamTask object, and each task will end up calling punctuate for each of its processors that are eligible for punctuation. Recall that you collect a minimum of 20 trades before you examine an individual stock’s performance.
  3. If the timestamp from the last execution of punctuate (plus the scheduled time) is less than or equal to the extracted timestamp from the PartitionGroup, then Kafka Streams calls that processor’s punctuate() method.

The key point here is that the application advances timestamps via the TimestampExtractor, so punctuate() calls are consistent only if data arrives at a constant rate. If your flow of data is sporadic, the punctuate() method won't get executed at the regularly scheduled intervals.

In short: stream-time punctuation is tied to data arriving continuously, and timestamps drive it, so if records arrive only sporadically you can't count on it firing on schedule.
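
A minimal sketch of the two scheduling modes, using the book-era schedule(long, PunctuationType, Punctuator) signature (the println bodies are placeholders); this would go inside a processor's init():

// STREAM_TIME: only advances as record timestamps advance, so sporadic
// input delays (or skips) punctuate() calls.
context().schedule(10000L, PunctuationType.STREAM_TIME,
        timestamp -> System.out.println("stream time reached " + timestamp));

// WALL_CLOCK_TIME: fires on system time whether or not records arrive.
context().schedule(10000L, PunctuationType.WALL_CLOCK_TIME,
        timestamp -> System.out.println("wall-clock tick at " + timestamp));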


6.3.2. The process() method

  1. Check the state store to see if you have a corresponding StockPerformance object for the record’s stock ticker symbol.
  2. If the store doesn't contain the StockPerformance object, one is created. Then, the StockPerformance instance adds the current share price and share volume and updates your calculations.
  3. Start performing calculations once you hit 20 transactions for any given stock.


// process() implementation
@Override
public void process(String symbol, StockTransaction transaction) {
    StockPerformance stockPerformance = keyValueStore.get(symbol);

    if (stockPerformance == null) {
        stockPerformance = new StockPerformance();
    }

    stockPerformance.updatePriceStats(transaction.getSharePrice());
    stockPerformance.updateVolumeStats(transaction.getShares());
    stockPerformance.setLastUpdateSent(Instant.now());

    keyValueStore.put(symbol, stockPerformance);
}

The results of process() are only saved to the state store; records are emitted downstream by the Punctuator.punctuate() method.

6.3.3. The punctuator execution

// This just iterates over the key-value store, checks each entry, and forwards it downstream.

@Override
public void punctuate(long timestamp) {
    KeyValueIterator<String, StockPerformance> performanceIterator = keyValueStore.all();

    while (performanceIterator.hasNext()) {
        KeyValue<String, StockPerformance> keyValue = performanceIterator.next();
        String key = keyValue.key;
        StockPerformance stockPerformance = keyValue.value;

        if (stockPerformance != null) {
            if (stockPerformance.priceDifferential() >= differentialThreshold ||
                stockPerformance.volumeDifferential() >= differentialThreshold) {
                context.forward(key, stockPerformance);
            }
        }
    }
    performanceIterator.close();  // always close store iterators to release resources
}

6.4. The co-group processor

6.4.1. Building the co-grouping processor

  1. Define two topics (stock-transactions, events).
  2. Add two processors to consume records from the topics.
  3. Add a third processor to act as an aggregator/co-grouping for the two preceding processors.
  4. Add a state store for the aggregating processor to keep the state for both events.
  5. Add a sink node to write the results to (and/or a printing processor to print results to console).

Defining the source nodes

topology.addSource("Txn-Source",
                  stringDeserializer,
                  stockTransactionDeserializer,
                  "stock-transactions")
       .addSource("Events-Source",
                  stringDeserializer,
                  clickEventDeserializer,
                  "events")

Adding the processor nodes

.addProcessor("Txn-Processor",
              StockTransactionProcessor::new,
              "Txn-Source")

.addProcessor("Events-Processor",
              ClickEventProcessor::new,
              "Events-Source")

.addProcessor("CoGrouping-Processor",
              CogroupingProcessor::new,
              "Txn-Processor",
              "Events-Processor")
public class CogroupingProcessor extends
        AbstractProcessor<String, Tuple<ClickEvent, StockTransaction>> {

    private KeyValueStore<String, Tuple<List<ClickEvent>, List<StockTransaction>>> tupleStore;
    public static final String TUPLE_STORE_NAME = "tupleCoGroupStore";

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        tupleStore = (KeyValueStore) context().getStateStore(TUPLE_STORE_NAME);

        CogroupingPunctuator punctuator = new CogroupingPunctuator(tupleStore, context());
        // Every 15 seconds on STREAM_TIME: the punctuator only fires as data
        // arrives, so with a sporadic flow the interval can stretch past 15
        // seconds (the semantics discussed above).
        context().schedule(15000L, STREAM_TIME, punctuator);
    }

    @Override
    public void process(String key, Tuple<ClickEvent, StockTransaction> value) {
        Tuple<List<ClickEvent>, List<StockTransaction>> cogroupedTuple = tupleStore.get(key);
        if (cogroupedTuple == null) {
            cogroupedTuple = Tuple.of(new ArrayList<>(), new ArrayList<>());
        }

        if (value._1 != null) {
            cogroupedTuple._1.add(value._1);
        }

        if (value._2 != null) {
            cogroupedTuple._2.add(value._2);
        }

        tupleStore.put(key, cogroupedTuple);
    }
}
// The CogroupingPunctuator.punctuate() method
// (class declaration and constructor omitted for clarity)
@Override
public void punctuate(long timestamp) {
    KeyValueIterator<String, Tuple<List<ClickEvent>, List<StockTransaction>>> iterator =
            tupleStore.all();

    while (iterator.hasNext()) {
        KeyValue<String, Tuple<List<ClickEvent>, List<StockTransaction>>> cogrouping =
                iterator.next();

        // If either list contains values, forward the results downstream.
        if (cogrouping.value != null &&
                (!cogrouping.value._1.isEmpty() || !cogrouping.value._2.isEmpty())) {
            List<ClickEvent> clickEvents = new ArrayList<>(cogrouping.value._1);
            List<StockTransaction> stockTransactions = new ArrayList<>(cogrouping.value._2);

            context.forward(cogrouping.key, Tuple.of(clickEvents, stockTransactions));

            // Clear the stored lists so the same events aren't forwarded twice.
            cogrouping.value._1.clear();
            cogrouping.value._2.clear();
            tupleStore.put(cogrouping.key, cogrouping.value);
        }
    }
    iterator.close();
}

Adding the state store and the sink node

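The snippet for these two steps isn't in the notes. A sketch of what they would look like, reusing names from the snippets above; the tuple serde (eventPerformanceTuple), tupleSerializer, and the output topic name are assumptions:

// Step 4: attach the shared tuple store to the co-grouping processor.
KeyValueBytesStoreSupplier storeSupplier =
        Stores.persistentKeyValueStore(CogroupingProcessor.TUPLE_STORE_NAME);
StoreBuilder<KeyValueStore<String, Tuple<List<ClickEvent>, List<StockTransaction>>>>
        storeBuilder = Stores.keyValueStoreBuilder(storeSupplier,
                                                   Serdes.String(),
                                                   eventPerformanceTuple); // assumed serde

topology.addStateStore(storeBuilder, "CoGrouping-Processor")
        // Step 5: sink node writing the co-grouped tuples out
        // ("cogrouped-results" is an assumed topic name).
        .addSink("Tuple-Sink",
                 "cogrouped-results",
                 stringSerializer,
                 tupleSerializer,
                 "CoGrouping-Processor");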

6.5. Integrating the Processor API and the Kafka Streams API
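
The notes stop here, but the gist of this section is dropping from the DSL into the Processor API with transform() (or process() for terminal nodes). A hypothetical sketch, not the book's code; the transformer, store, threshold, and topic names are mine:

KStream<String, StockTransaction> stockStream =
        builder.stream("stock-transactions",
                       Consumed.with(Serdes.String(), stockTransactionSerde));

stockStream.transform(() -> new StockVolumeTransformer(), "volume-store")
           .to("high-volume-stocks",
               Produced.with(Serdes.String(), stockTransactionSerde));

// Illustrative transformer: keeps a running share count per symbol in a
// state store and forwards only symbols past a volume threshold
// (returning null drops the record).
public class StockVolumeTransformer implements
        Transformer<String, StockTransaction, KeyValue<String, StockTransaction>> {

    private KeyValueStore<String, Long> volumeStore;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        volumeStore = (KeyValueStore<String, Long>) context.getStateStore("volume-store");
    }

    @Override
    public KeyValue<String, StockTransaction> transform(String symbol, StockTransaction txn) {
        Long runningTotal = volumeStore.get(symbol);
        long total = (runningTotal == null ? 0L : runningTotal) + txn.getShares();
        volumeStore.put(symbol, total);
        return total > 1_000_000 ? KeyValue.pair(symbol, txn) : null;
    }

    @Override
    public void close() { }
}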


Summary

kgneng2 commented 4 years ago

Chapter 6: Kafka Streams

Let's dig into the Processor API.

6.1

ZMartKafkaStreamsApp: Kafka Streams API
ZMartProcessorApp: Processor API

6.2: explained by walking through PopsHopsApplication.java

6.3: digging deeper into the requirements

6.3.2

  1. Check the state store for an existing StockPerformance object.
  2. If there isn't one, create it, then add the current share price and volume and update the calculations.
  3. Run the calculations once 20 trades have accumulated for a given stock.


6.4

Goal: combine the two kinds of events every N seconds, but without waiting on a stream whose records haven't arrived.