tchiotludo / akhq

Kafka GUI for Apache Kafka to manage topics, topics data, consumers group, schema registry, connect and more...
https://akhq.io/
Apache License 2.0
3.37k stars 653 forks source link

Topic data access slow on sort NEWEST #192

Closed jorgheymans closed 1 year ago

jorgheymans commented 4 years ago

Hi,

When in the topic list overview, clicking on a topic that has a lot of data takes a long time before the record list shows up. This is a topic with only about 500k records. Checking perf in another browsing tool (kafkatool.com), there the records display in 4-5 seconds for the same topic. Is there anything that could potentially slow down fetching the records ?

tchiotludo commented 4 years ago

Hello,

Can you provide me some log after playing this curl

curl -i -X POST -H "Content-Type: application/json" \
       -d '{ "configuredLevel": "TRACE" }' \
       http://localhost:8080/loggers/org.kafkahq

It will help me to understand the response time.

One question, what is your sort preference on this page, and is this topic is compacted ?

jorgheymans commented 4 years ago

here is the log, sort options are default so i guess 'Newest'

2020-01-17 12:51:04,195 DEBUG -thread-25 o.k.r.RecordRepository     16282 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=50, after={}, sort=NEWEST, partition=null, timestamp=null)]
2020-01-17 12:51:04,230 DEBUG -thread-25 org.kafkahq.utils.Logger   14 ms -> Describe Topic Config [internal_test]
2020-01-17 12:51:04,358 INFO  Group-1-10 org.kafkahq.log.access     [Date: 2020-01-17T12:50:47.643407+01:00] [Duration: 16712 ms] [Url: GET /kafka-nonprod/topic/internal_test HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 5569] [Port: 8050]
jorgheymans commented 4 years ago

The topic is not compacted no.

tchiotludo commented 4 years ago

Ok thanks for information !

It seems that you use the NEWEST sort by default. Default sort is OLDEST, it's because sort by NEWEST is slower than OLDEST. Kafka Api only allow to sort by OLDEST. The NEWEST is a hack I've to do on KafkaHQ in order to support this feature, to do this I use a parallelStream() that will fetch in parallel each partitions (with max of 1 thread per CPU, 8 cpu = 8 parallel thread).

So to resume, the NEWEST will depend

On the OLDEST, I use the standard Kafka Api, 1 Consumer fetch all the partition at the same time with 1 thread, and it's more efficient but I can't be done onNEWEST` sort because the record are on all partitions so we need to fetch all partitions.

If the topic is compacted, It can be worst (still with newest). Because it can lead to multiple query on Kafka to be able to fetch some data.

Here is the situation, but as I known I haven't any solution to speed up the NEWEST part (except have a config to allow more parallel fetch).

Just for information can you send me the log with sort OLDEST ?

jorgheymans commented 4 years ago

This is the debug with OLDEST, indeed it's really fast. Not sure that this is the default though, don't remember changing this ever in the UI. Or maybe i did very long time ago and it's stored in a cookie.

2020-01-20 10:02:54,011 DEBUG 1-thread-4 org.kafkahq.utils.Logger   11 ms -> Describe Topics [internal_test]
2020-01-20 10:02:54,183 DEBUG 1-thread-4 org.kafkahq.utils.Logger   167 ms -> Describe Topics Offsets [internal_test]
2020-01-20 10:02:54,191 DEBUG 1-thread-4 org.kafkahq.utils.Logger   6 ms -> Cluster
2020-01-20 10:02:54,202 DEBUG 1-thread-4 org.kafkahq.utils.Logger   17 ms -> List Log dir
2020-01-20 10:02:54,212 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 11] [start: 41689]
2020-01-20 10:02:54,221 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 12] [start: 41398]
2020-01-20 10:02:54,223 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 13] [start: 42547]
2020-01-20 10:02:54,225 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 14] [start: 43844]
2020-01-20 10:02:54,226 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 7] [start: 43009]
2020-01-20 10:02:54,227 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 8] [start: 43444]
2020-01-20 10:02:54,230 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 9] [start: 42060]
2020-01-20 10:02:54,232 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 10] [start: 43374]
2020-01-20 10:02:54,233 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 3] [start: 43086]
2020-01-20 10:02:54,234 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 4] [start: 42912]
2020-01-20 10:02:54,236 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 5] [start: 43441]
2020-01-20 10:02:54,237 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 6] [start: 44143]
2020-01-20 10:02:54,239 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 0] [start: 43800]
2020-01-20 10:02:54,239 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 1] [start: 43118]
2020-01-20 10:02:54,240 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 2] [start: 43382]
2020-01-20 10:02:54,241 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 19] [start: 42981]
2020-01-20 10:02:54,242 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 15] [start: 40706]
2020-01-20 10:02:54,244 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 16] [start: 42001]
2020-01-20 10:02:54,245 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 17] [start: 42685]
2020-01-20 10:02:54,246 TRACE 1-thread-4 o.k.r.RecordRepository     Consume [topic: internal_test] [partition: 18] [start: 41601]
2020-01-20 10:02:54,786 DEBUG 1-thread-4 o.k.r.RecordRepository     582 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=50, after={}, sort=OLDEST, partition=null, timestamp=null)]
2020-01-20 10:02:54,797 DEBUG 1-thread-4 org.kafkahq.utils.Logger   9 ms -> Describe Topic Config [internal_test]
2020-01-20 10:02:54,820 INFO  pGroup-1-8 org.kafkahq.log.access     [Date: 2020-01-20T10:02:53.9913347+01:00] [Duration: 828 ms] [Url: GET /kafka-nonprod/topic/internal_test?sort=OLDEST HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 9327] [Port: 8050]
tchiotludo commented 4 years ago

Ok just like I think.

The configuration can be changed on the configuration files only. This can't be changed permanently from the ui (only for current page), and this is not store in session or cookie

jorgheymans commented 4 years ago

Thanks, indeed it was set to NEWEST. Browsing OLDEST rarely makes sense in our case.

tchiotludo commented 4 years ago

So Like i said and can't change this behavior, since it was a limitation from Kafka Api.

The only things is to raise parallelism. How many CPU does your KafkaHQ installation have ?

jorgheymans commented 4 years ago

This is the CPU spec: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, 3601 Mhz, 4 Core(s), 8 Logical Processor(s)

For comparison, i have created a gif of loading the same topic data in the other tool. This just to exclude connectivity or infrastructure issues with the cluster. It does seem possible to get back some data rather quickly, so wondering where the main difference could be with kafkahq. I hope you don't mind me posting this, if not let me know and i'll remove it.

test-load

tchiotludo commented 4 years ago

No problems for screenshot. To be honest, I've no clue how kafka tools deal about this. KafkaHQ will use the param kafkahq.topicdata.size (default: 50) to determine the poll per partition. So must be 3 in your case with a parralel of 8.

I'have added some log on the dev docker image, is it possible to have the log output (will log line like Consume [topic: internal_test] [partition: 9]) ?

jorgheymans commented 4 years ago

@tchiotludo sorry but docker I cannot easily test here. Do you have a snapshot of the jar available somewhere ?

tchiotludo commented 4 years ago

Yes look at this : https://github.com/tchiotludo/kafkahq#early-dev-image

You can grab the jar on your local desktop

jorgheymans commented 4 years ago

Using latest dev build i get NPE when wanting to browse topic data. Note I built it myself locally.

Caused by: java.lang.NullPointerException: null
        at org.kafkahq.modules.KafkaModule.getRegistryClient(KafkaModule.java:181)
        at org.kafkahq.repositories.SchemaRegistryRepository.getKafkaAvroDeserializer(SchemaRegistryRepository.java:188)
        at org.kafkahq.repositories.RecordRepository.newRecord(RecordRepository.java:335)
        at org.kafkahq.repositories.RecordRepository.consumeOldest(RecordRepository.java:83)
        at org.kafkahq.repositories.RecordRepository.lambda$consume$0(RecordRepository.java:53)
        at org.kafkahq.utils.Debug.call(Debug.java:68)
        ... 189 common frames omitted
tchiotludo commented 4 years ago

Arg, I just push a new commit on dev branch, can you retry please ?

jorgheymans commented 4 years ago

Here is the new output. Note that at the end, the page says 'no data available' which i find a bit strange. I have seen this happening before with NEWEST, never with OLDEST. There is definitely data in the topic as you can see from below output. Sometimes after a few refreshes the data reappears.

2020-01-21 15:15:50,485 DEBUG -thread-14 org.kafkahq.utils.Logger   8 ms -> Describe Topics [internal_test]
2020-01-21 15:15:50,667 DEBUG -thread-14 org.kafkahq.utils.Logger   180 ms -> Describe Topics Offsets [internal_test]
2020-01-21 15:15:50,675 DEBUG -thread-14 org.kafkahq.utils.Logger   6 ms -> Cluster
2020-01-21 15:15:50,689 DEBUG -thread-14 org.kafkahq.utils.Logger   20 ms -> List Log dir
2020-01-21 15:15:50,718 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 8] [start: 71690]
2020-01-21 15:15:50,723 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 2] [start: 70525]
2020-01-21 15:15:50,753 TRACE l-worker-1 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 17] [start: 69990]
2020-01-21 15:15:50,760 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 1] [start: 69334]
2020-01-21 15:15:50,756 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 6] [start: 71169]
2020-01-21 15:15:50,760 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 5] [start: 70765]
2020-01-21 15:15:50,760 TRACE -thread-14 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 12] [start: 67653]
2020-01-21 15:15:50,760 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 16] [start: 69550]
2020-01-21 15:15:55,750 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 9] [start: 69593]
2020-01-21 15:15:55,773 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 4] [start: 69306]
2020-01-21 15:15:55,780 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 0] [start: 71112]
2020-01-21 15:15:55,793 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 10] [start: 70450]
2020-01-21 15:15:55,826 TRACE -thread-14 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 14] [start: 72041]
2020-01-21 15:15:55,827 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 11] [start: 69451]
2020-01-21 15:15:55,834 TRACE l-worker-1 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 19] [start: 70523]
2020-01-21 15:15:55,844 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 15] [start: 67242]
2020-01-21 15:16:00,774 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 7] [start: 71021]
2020-01-21 15:16:00,808 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 3] [start: 69968]
2020-01-21 15:16:00,808 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 13] [start: 69930]
2020-01-21 15:16:00,808 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 18] [start: 68719]
2020-01-21 15:16:05,813 DEBUG -thread-14 o.k.r.RecordRepository     15119 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=3, after={}, sort=NEWEST, partition=null, timestamp=null)]
2020-01-21 15:16:05,827 DEBUG -thread-14 org.kafkahq.utils.Logger   10 ms -> Describe Topic Config [internal_test]
2020-01-21 15:16:05,837 INFO  pGroup-1-6 org.kafkahq.log.access     [Date: 2020-01-21T15:15:50.465087+01:00] [Duration: 15372 ms] [Url: GET /kafka-nonprod/topic/internal_test?sort=NEWEST HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 2462] [Port: 8050]
tchiotludo commented 4 years ago

The no data available can happen, because Kafka API can't let me know if there is data and I reach the poll timeout (that you can change in config). In this case, the Kafka API don't let me know if it's timeout or no record.

The latest log output is showing no data available, right ? Because every query are 5 seconds long. You seems to reach the timeout. We can see that you are using 8 parallel thread. Can you also provide some metrics ? (Log size and offset for this topic)

jorgheymans commented 4 years ago

I increased poll-timeout to 10000 but still getting "no data available". There are 550k records in the topic, size is about 110MB. EDIT: increasing poll-timeout to 20000 makes results appear again.

2020-01-22 08:09:59,394 DEBUG 1-thread-4 org.kafkahq.utils.Logger   8 ms -> Describe Topics [internal_test]
2020-01-22 08:09:59,613 DEBUG 1-thread-4 org.kafkahq.utils.Logger   217 ms -> Describe Topics Offsets [internal_test]
2020-01-22 08:09:59,623 DEBUG 1-thread-4 org.kafkahq.utils.Logger   8 ms -> Cluster
2020-01-22 08:09:59,634 DEBUG 1-thread-4 org.kafkahq.utils.Logger   19 ms -> List Log dir
2020-01-22 08:09:59,671 TRACE 1-thread-4 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 12] [start: 67905]
2020-01-22 08:09:59,711 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 17] [start: 70274]
2020-01-22 08:09:59,722 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 1] [start: 69539]
2020-01-22 08:09:59,727 TRACE l-worker-7 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 11] [start: 69633]
2020-01-22 08:09:59,730 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 2] [start: 70765]
2020-01-22 08:09:59,731 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 6] [start: 71418]
2020-01-22 08:09:59,732 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 19] [start: 70666]
2020-01-22 08:09:59,734 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 16] [start: 69737]
2020-01-22 08:10:09,711 TRACE 1-thread-4 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 14] [start: 72243]
2020-01-22 08:10:09,739 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 13] [start: 70158]
2020-01-22 08:10:09,757 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 18] [start: 68923]
2020-01-22 08:10:09,767 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 4] [start: 69510]
2020-01-22 08:10:09,776 TRACE l-worker-7 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 10] [start: 70715]
2020-01-22 08:10:09,777 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 15] [start: 67380]
2020-01-22 08:10:09,784 TRACE -worker-15 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 0] [start: 71372]
2020-01-22 08:10:09,791 TRACE l-worker-3 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 5] [start: 70983]
2020-01-22 08:10:19,798 TRACE l-worker-5 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 8] [start: 71868]
2020-01-22 08:10:19,806 TRACE -worker-13 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 7] [start: 71230]
2020-01-22 08:10:19,810 TRACE -worker-11 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 9] [start: 69797]
2020-01-22 08:10:19,811 TRACE l-worker-9 o.k.r.RecordRepository     Consume Newest [topic: internal_test] [partition: 3] [start: 70138]
2020-01-22 08:10:29,827 DEBUG 1-thread-4 o.k.r.RecordRepository     30191 ms -> Consume with options [RecordRepository.Options(topic=internal_test, size=3, after={}, sort=NEWEST, partition=null, timestamp=null)]
2020-01-22 08:10:29,859 DEBUG 1-thread-4 org.kafkahq.utils.Logger   9 ms -> Describe Topic Config [internal_test]
2020-01-22 08:10:29,873 INFO  pGroup-1-6 org.kafkahq.log.access     [Date: 2020-01-22T08:09:59.381221+01:00] [Duration: 30491 ms] [Url: GET /kafka-nonprod/topic/internal_test?sort=NEWEST HTTP/1.1] [Status: 200] [Ip: 0:0:0:0:0:0:0:1] [Length: 2461] [Port: 8050]
jorgheymans commented 4 years ago

The amount of volume in the topic is not related to this bug it seems. For a topic of 5k records i had to wait 35 seconds to get back only one record, that's a bit surprising.

PavelYadrov commented 1 year ago

Hello, I've faced with the same issue, but sort by newest fails even for topics with 200 records. Capture

In logs i can't see any errors, it loads records from all partitions (5 in my case) and then show execution time: [0;39m 30120 ms -> Consume with options [RecordRepository.Options(topic=test_topic, size=50, after={}, sort=NEWEST, partition=null, timestamp=null)]

Could you please help me with understanding why AKHQ fails to load even in topics with small amount of records. Note: with default OLDEST sorting it works just fine

tchiotludo commented 1 year ago

Seems that the reason can be found here