shunfei / DCMonitor

Data Center monitor, included zookeeper, kafka, druid
MIT License

kafka message rate data doesn't show up in UI #42

Closed zousheng closed 8 years ago

zousheng commented 8 years ago

Hi,

May I ask for a favor?

Today I deployed this project and found the error below. Could you advise whether I configured something wrong or whether this is a bug? Thanks.

2016-04-29 12:12:18,149 ERROR [Thread-3] kafka.consumer.TopicCount$ - error parsing consumer json string events
kafka.common.KafkaException: error constructing TopicCount : events
    at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:76)
    at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:671)
    at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:670)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at kafka.utils.ZkUtils$.getConsumersPerTopic(ZkUtils.scala:670)
    at kafka.utils.ZkUtils.getConsumersPerTopic(ZkUtils.scala)
    at com.sf.monitor.kafka.KafkaInfos.getActiveTopicMap(KafkaInfos.java:299)
    at com.sf.monitor.kafka.KafkaStats.fetchKafkaPartitionInfos(KafkaStats.java:32)
    at com.sf.monitor.kafka.KafkaStats.fetchCurrentInfos(KafkaStats.java:24)
    at com.sf.monitor.kafka.KafkaInfoFetcher$1.run(KafkaInfoFetcher.java:41)

2016-04-29 12:12:18,149 ERROR [Thread-3] com.sf.monitor.kafka.KafkaInfos - could not get consumers for group messagegroup
kafka.common.KafkaException: error constructing TopicCount : events
    at kafka.consumer.TopicCount$.constructTopicCount(TopicCount.scala:76)
    at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:671)
    at kafka.utils.ZkUtils$$anonfun$getConsumersPerTopic$1.apply(ZkUtils.scala:670)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at kafka.utils.ZkUtils$.getConsumersPerTopic(ZkUtils.scala:670)
    at kafka.utils.ZkUtils.getConsumersPerTopic(ZkUtils.scala)
    at com.sf.monitor.kafka.KafkaInfos.getActiveTopicMap(KafkaInfos.java:299)
    at com.sf.monitor.kafka.KafkaStats.fetchKafkaPartitionInfos(KafkaStats.java:32)
    at com.sf.monitor.kafka.KafkaStats.fetchCurrentInfos(KafkaStats.java:24)
    at com.sf.monitor.kafka.KafkaInfoFetcher$1.run(KafkaInfoFetcher.java:41)

2016-04-29 12:12:20,222 WARN [Thread-3] com.sf.monitor.kafka.KafkaStats - kafka - topic:[events],consumer:[kefu-statistic] - consum lag: current[1904],threshold[200], topic lag illegal!

sundy-li commented 8 years ago

Hi @zousheng, could you please tell me your Kafka and ZooKeeper versions and paste your config.json here?

zousheng commented 8 years ago

Sure, I am using kafka 0.8.2.2, and zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT

zousheng commented 8 years ago

Thanks for the quick reply, very supportive.

sundy-li commented 8 years ago

From the log above, it seems the consumer group messagegroup has no topic under its ZooKeeper path. Try this:

`
cd $ZK_ROOT
bin/zkCli.sh
ls /consumers/messagegroup/owners
`

If this messagegroup is never used, you can run rmr /consumers/messagegroup and restart DCMonitor.

I will fix the code later to handle this special case.

zousheng commented 8 years ago

It exists:

[zk: ] ls /consumers/messagegroup/owners -l
[events]

sundy-li commented 8 years ago

The events string is not valid. Is DCMonitor otherwise working fine? If so, you can ignore this error: the data under /consumers/messagegroup is not in the normal format, so the ZooKeeper client could not parse it.
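To illustrate the kind of check that fails here, below is a minimal sketch, not Kafka's actual parser. It assumes the consumer registration data is a JSON object with a `subscription` map of topic to stream count; the function name and both sample payloads are hypothetical and are not taken from the cluster in this issue.

```python
import json


def parse_topic_count(registration: str) -> dict:
    """Return the topic -> stream-count map from a consumer registration string.

    Assumption: the registration is a JSON object with a "subscription" field,
    e.g. {"version": 1, "subscription": {"events": 1}, "pattern": "static"}.
    Raises ValueError when the string does not have that shape.
    """
    try:
        data = json.loads(registration)
    except json.JSONDecodeError as exc:
        # A bare word such as "events" is not JSON, so parsing fails here.
        raise ValueError(
            f"error parsing consumer json string {registration}"
        ) from exc
    if not isinstance(data, dict) or not isinstance(data.get("subscription"), dict):
        raise ValueError(f"error constructing TopicCount : {registration}")
    return {topic: int(count) for topic, count in data["subscription"].items()}


# Hypothetical sample payloads for illustration only.
good = '{"version": 1, "subscription": {"events": 1}, "pattern": "static"}'
bad = "events"  # a bare topic name, like the string reported in the log

print(parse_topic_count(good))
try:
    parse_topic_count(bad)
except ValueError as err:
    print(err)
```

If the registration data for messagegroup holds a bare string such as events rather than a JSON object, a parser along these lines would reject it, which would match the "error parsing consumer json string events" message in the log.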

zousheng commented 8 years ago

The events topic was created manually, and another service will write data into it, so it will be used. I tried two versions of DCMonitor, and the data shows up in neither. I will check the events string. By default it's a JSON string, so do you mean the JSON is not valid?

zousheng commented 8 years ago

ls /consumers/messagegroup/owners/events , it has 500 partitions, you mean this partition list is not valid? [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 300, 422, 301, 302, 423, 303, 424, 304, 425, 426, 305, 306, 427, 307, 428, 308, 429, 309, 430, 310, 431, 311, 432, 312, 433, 434, 313, 435, 314, 315, 436, 316, 437, 438, 317, 318, 439, 319, 440, 441, 320, 442, 200, 321, 443, 322, 201, 444, 323, 202, 445, 324, 203, 204, 325, 446, 447, 205, 326, 327, 448, 206, 328, 449, 207, 329, 208, 209, 450, 330, 451, 452, 331, 210, 453, 211, 332, 454, 333, 212, 455, 334, 213, 456, 335, 214, 336, 215, 457, 337, 458, 216, 338, 459, 217, 339, 218, 219, 460, 340, 461, 462, 341, 220, 463, 342, 221, 100, 343, 464, 222, 101, 344, 223, 465, 102, 345, 466, 103, 224, 346, 104, 467, 225, 347, 468, 226, 105, 348, 227, 469, 106, 349, 107, 228, 229, 108, 109, 470, 350, 471, 472, 351, 230, 352, 473, 110, 231, 474, 353, 111, 232, 354, 233, 475, 112, 234, 355, 476, 113, 356, 477, 235, 114, 478, 357, 115, 236, 479, 358, 237, 116, 117, 238, 359, 239, 118, 119, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 480, 360, 481, 361, 240, 482, 483, 362, 120, 241, 363, 484, 121, 242, 364, 485, 122, 243, 123, 244, 365, 486, 366, 124, 245, 487, 367, 125, 488, 246, 368, 489, 126, 247, 369, 248, 127, 249, 128, 129, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 490, 370, 491, 250, 492, 371, 493, 251, 372, 130, 373, 131, 494, 252, 253, 495, 374, 132, 496, 254, 375, 133, 134, 255, 376, 497, 256, 377, 498, 135, 378, 499, 257, 136, 379, 258, 137, 259, 138, 139, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 380, 260, 381, 261, 382, 140, 383, 262, 141, 263, 384, 142, 264, 385, 143, 265, 386, 144, 266, 145, 387, 388, 267, 146, 389, 268, 147, 269, 148, 149, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 390, 270, 391, 392, 271, 150, 393, 151, 272, 394, 273, 152, 395, 274, 153, 396, 275, 154, 276, 397, 155, 398, 156, 277, 399, 157, 278, 279, 158, 159, 
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 280, 160, 281, 282, 161, 283, 162, 284, 163, 285, 164, 165, 286, 166, 287, 288, 167, 289, 168, 169, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 290, 291, 170, 171, 292, 293, 172, 294, 173, 295, 174, 296, 175, 297, 176, 298, 177, 299, 178, 179, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

zousheng commented 8 years ago

Could you tell me which string is not valid and how I can list it? Then we can check why it's not valid. Thanks.

zousheng commented 8 years ago

The DCMonitor site can be viewed. Only the Kafka analysis part is affected: the Kafka produce rate and consume rate are both 0.

sundy-li commented 8 years ago

`

ZooKeeper holds the Kafka offsets under this path. Try get /consumers/${consumer}/offsets/${topic}/0, for example:

get /consumers/messagegroup/offsets/events/0

`

zousheng commented 8 years ago

get /consumers/xxxxx/offsets/events/0
10
cZxid = 0x10000084e
ctime = Fri Apr 29 00:02:55 CST 2016
mZxid = 0x1000031f2
mtime = Fri Apr 29 14:29:02 CST 2016
pZxid = 0x10000084e
cversion = 0
dataVersion = 10
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 2
numChildren = 0

zousheng commented 8 years ago

Is the error related to the client driver? Maybe it doesn't support Kafka 0.8.2.2?

sundy-li commented 8 years ago

That output looks normal. We are using Kafka server 0.8.2.1 and ZooKeeper 3.4.6-1569965, and the monitor works fine. Could you please change the Kafka version in pom.xml to 0.8.2.2, then rebuild and restart?

I will test whether the partition count is too large for ZkUtils to fetch.

zousheng commented 8 years ago

I just found another error using Chrome's debug mode: the request returns 500 Internal Server Error.

Please see the errors below. Is this related?

org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:973)
org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:863)
javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:837)
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:108)

root cause

retrofit.RetrofitError: Connection refused
retrofit.RetrofitError.networkError(RetrofitError.java:27)

zousheng commented 8 years ago

com.sf.monitor.utils.PrometheusUtils.getEvents(PrometheusUtils.java:81)
com.sf.monitor.kafka.KafkaStats.getTrendConsumeInfos(KafkaStats.java:121)
com.sf.monitor.controllers.KafkaController.topicDetail(KafkaController.java:143)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:497)
org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:215)
org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132)
org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:104)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:749)
org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:689)
org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:83)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:938)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:870)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:961)
org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:863)
javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:837)
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:108)

Note: The full stack trace of the root cause is available in the Apache Tomcat/7.0.52 logs.


Apache Tomcat/7.0.52

zousheng commented 8 years ago

Does this jar also need Apache Tomcat?

zousheng commented 8 years ago

I will try to change the pom and restart it, thx

sundy-li commented 8 years ago

Tomcat is embedded in the Spring MVC application, so you can just run the jar. Check that the Prometheus URL is healthy.

zousheng commented 8 years ago

Same error after rebuilding the code. When I choose the analysis start time and end time and click the "Go" button, the "/kafka/detail" API returns 500 with the error shown above.

zousheng commented 8 years ago

Sorry, I didn't set up Prometheus. That may be the cause. I will set it up first, my bad.

zousheng commented 8 years ago

After setting up Prometheus, the analysis graph shows up with the lag, logsize and offsets, but the message rate still shows "0 msg/s over 2506 seconds". The msg/s value is always 0.

sundy-li commented 8 years ago

That's because we calculate the rate from the differences in logsize and offsets. You need to produce messages to the topic and have the consumer consume them. The message rate will then be shown every 5 seconds.
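A rough sketch of that calculation, under stated assumptions: the `Sample` type, the function names and the numbers below are all hypothetical, and DCMonitor's own code may differ. It treats the produce rate as the change in logsize per second and the consume rate as the change in committed offset per second, with lag assumed to be logsize minus offset.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Sample:
    timestamp: float  # seconds since some reference point
    logsize: int      # latest offset written to the topic
    offset: int       # offset committed by the consumer group


def rates(prev: Sample, curr: Sample) -> Tuple[float, float]:
    """Return (produce_rate, consume_rate) in msg/s between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    produce_rate = (curr.logsize - prev.logsize) / elapsed
    consume_rate = (curr.offset - prev.offset) / elapsed
    return produce_rate, consume_rate


def lag(sample: Sample) -> int:
    """Messages written but not yet consumed (assumed: logsize - offset)."""
    return sample.logsize - sample.offset


# Idle topic: nothing produced or consumed between samples, so both rates
# are 0 even though the lag is non-zero.
idle_prev = Sample(timestamp=0.0, logsize=1914, offset=10)
idle_curr = Sample(timestamp=5.0, logsize=1914, offset=10)

# Active topic: 100 messages produced and 50 consumed within 5 seconds.
busy_curr = Sample(timestamp=5.0, logsize=2014, offset=60)

print(rates(idle_prev, idle_curr), lag(idle_curr))
print(rates(idle_prev, busy_curr), lag(busy_curr))
```

With no traffic between two samples, both differences are zero, so the graph can show lag, logsize and offsets while the rate stays at 0 msg/s.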

zousheng commented 8 years ago

got it, thx very much, it's working now. Have a good weekend!

sundy-li commented 8 years ago

never mind ~