Summary

In a recent experiment, we noticed two issues:
1. KafkaProducer compression is quite time-consuming for large records.
2. A single KafkaProducer does not scale when shared across multiple threads.
A streaming job typically runs in a container that processes an upstream topic partition, and it must process those events sequentially even when they belong to different Venice partitions. The single-threaded KafkaProducer therefore limits the total throughput, and CPU resources are mostly under-utilized.
To work around this issue, this PR adds the capability to use multiple producers even in a single-threaded streaming application. We do not want to disable compression, since it reduces pub-sub usage and cross-colo replication bandwidth.
Two new options in VeniceWriter:
- venice.writer.producer.thread.count (default: 1)
- venice.writer.producer.queue.size (default: 5MB)
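A minimal sketch of what tuning these configs might look like. The two property names come from this PR; the `producerPoolProps` helper and the way the properties are ultimately passed to VeniceWriter are assumptions for illustration, not the actual VeniceWriter construction API.

```java
import java.util.Properties;

public class WriterConfigExample {
    // Hypothetical helper: builds the producer-pool related configs.
    public static Properties producerPoolProps() {
        Properties props = new Properties();
        // Use 4 producer threads instead of the default of 1.
        props.put("venice.writer.producer.thread.count", "4");
        // Keep the writer-to-producer queue at the 5MB default, stated in bytes.
        props.put("venice.writer.producer.queue.size", String.valueOf(5 * 1024 * 1024));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerPoolProps());
    }
}
```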
A streaming job can configure a higher producer thread count to reduce the latency of the Producer#send invocation, which is a blocking call, so that the total throughput improves. The venice.writer.producer.queue.size config controls the size of the queue between the writer thread and the producer threads; when the queue is full, the writer thread blocks.
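The queue-based hand-off described above can be sketched with a standard bounded queue. This is not the actual VeniceWriter code, just an illustration of the back-pressure mechanism: `put()` blocks the writer thread when the queue is full, while a producer thread drains records and sends them.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedHandoff {
    // Hands n records from the calling (writer) thread to a single producer
    // thread through a bounded queue; returns how many records were drained.
    public static int transfer(int n, int capacity) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger drained = new AtomicInteger();

        Thread producerThread = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    queue.take(); // blocks while the queue is empty
                    drained.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producerThread.start();

        try {
            for (int i = 0; i < n; i++) {
                // Blocks while the queue is full: this is the back-pressure
                // the writer thread experiences when producers fall behind.
                queue.put(new byte[64]);
            }
            producerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return drained.get();
    }

    public static void main(String[] args) {
        System.out.println("drained " + transfer(5, 2) + " records");
    }
}
```

In the real feature the drain side would invoke the (blocking) Producer#send, so multiple producer threads let several sends proceed concurrently while the writer thread keeps enqueueing.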
How was this PR tested?
CI
Does this PR introduce any user-facing changes?
[x] No. You can skip the rest of this section.
[ ] Yes. Make sure to explain your proposed changes and call out the behavior change.