redpanda-data / console

Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.
https://redpanda.com

No partition watermark for the group's topic available error when loading consumer group #161

Closed JaeHyeokLee closed 3 years ago

JaeHyeokLee commented 3 years ago

I have a problem when loading the consumer group page and the topic -> consumer menu in the frontend. I used the kowl Helm chart with the quay.io/cloudhut/kowl:master image.

I got this error in the frontend:

image

And I got this pod log.

{"level":"error","ts":"2021-01-06T07:11:40.353Z","msg":"no partition watermark for the group's topic available","group":"XXX","topic":"XXX"}
{"level":"error","ts":"2021-01-06T07:11:40.353Z","msg":"Sending REST error","topic_name":"XXX","route":"/api/topics/XXX/consumers","method":"GET","status_code":500,"remote_address":"X.X.X.X","public_error":"Could not list topic consumers for requested topic","error":"failed to get consumer group lags: no partition watermark for the group's topic available"}

How can I fix this error?

weeco commented 3 years ago

Can you try the same with the latest Docker image built from the master branch? We use a different Kafka library there, so the issue might be solved (or we might get more insight) by switching to this library/Docker image.

JaeHyeokLee commented 3 years ago

@weeco I set the Helm chart's image pull policy to Always and re-installed the chart, but got the same error. Should I use an older Docker image for this issue? (FYI, I'm using Kafka 2.5.1 with AWS MSK.)

image
weeco commented 3 years ago

@JaeHyeokLee Can you try this docker image master-09f461d8?

Reinstalling the chart shouldn't change anything unless you use a different image tag there.

JaeHyeokLee commented 3 years ago

I changed the chart's values.yaml but got the same error and pod log:

image:
  repository: quay.io/cloudhut/kowl
  pullPolicy: Always
  tag: master-09f461d8
weeco commented 3 years ago

Okay, then I assume this is not a fault with the Kafka library at least.

  1. Do you see any other warnings in the logs beforehand? (The problem is that it doesn't have any high or low offset for a partition cached, hence it can't calculate the consumer group lag)
  2. Do you use ACLs in your cluster? If so, does Kowl have the required permissions to get offsets?
  3. Does that happen on all consumer groups or just a few / specific ones?
  4. Can you see the partition watermarks when you click on one of the topics in the topics list? (There's a partitions tab that should also list the partition offsets)
JaeHyeokLee commented 3 years ago
  1. I got no warning or info logs, only these 2 error logs. They appear on the consumer group tab and the topic -> consumer menu. No logs were produced on the brokers, ACL, or Schema Registry tabs because these tabs work well. (The consumer group is our Kafka cluster's sink connector.)

    {"level":"error","ts":"2021-01-06T07:11:40.353Z","msg":"no partition watermark for the group's topic available","group":"XXX","topic":"XXX"}
    {"level":"error","ts":"2021-01-06T07:11:40.353Z","msg":"Sending REST error","topic_name":"XXX","route":"/api/topics/XXX/consumers","method":"GET","status_code":500,"remote_address":"X.X.X.X","public_error":"Could not list topic consumers for requested topic","error":"failed to get consumer group lags: no partition watermark for the group's topic available"}
  2. I don't use any ACLs, and I can access the ACL tab.

    image
  3. This 500 error happens for every consumer group.

  4. Yes, but when I switch to the consumer tab, I get the same error.

    image
weeco commented 3 years ago

@JaeHyeokLee Thanks for the additional information. So here's how we calculate the consumer group lags for each group. If you don't know Go, the comments for each step should still give you an idea of how it works: https://github.com/cloudhut/kowl/blob/d2945ae5193671ff1be41d833597ab53bf77341f/backend/pkg/owl/consumer_group_lag.go#L61-L154

It seems that your group has one or more active offsets on a topic/partition whose offsets can't be matched. Unfortunately you can't figure out which topics/partitions these could be, and I don't have an explanation for how this would be possible (if you delete a topic, all group offsets for that topic will be cleared). I wonder if this is something Amazon MSK specific? I will add additional logging so that you know which topic and partition is missing. Once you have this, you can try to open the partitions list for that specific topic.

I'll ping you once the logging is enhanced.
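
As an illustration of the kind of check that extra logging could be based on, here is a small Go sketch that lists every topic/partition a group has an offset for but for which no watermark is known. It is not taken from the Kowl code base; `TopicPartition` and `unmatchedOffsets` are hypothetical names, and the maps stand in for whatever the backend has fetched from the brokers.

```go
package main

import (
	"fmt"
	"sort"
)

// TopicPartition identifies one partition of one topic (hypothetical type).
type TopicPartition struct {
	Topic     string
	Partition int32
}

// unmatchedOffsets returns the topic/partitions that appear in the group's
// committed offsets but have no entry in the watermark map, sorted by topic
// and then partition so the output is stable.
func unmatchedOffsets(groupOffsets, watermarks map[TopicPartition]int64) []TopicPartition {
	var missing []TopicPartition
	for tp := range groupOffsets {
		if _, ok := watermarks[tp]; !ok {
			missing = append(missing, tp)
		}
	}
	sort.Slice(missing, func(i, j int) bool {
		if missing[i].Topic != missing[j].Topic {
			return missing[i].Topic < missing[j].Topic
		}
		return missing[i].Partition < missing[j].Partition
	})
	return missing
}

func main() {
	groupOffsets := map[TopicPartition]int64{
		{Topic: "orders", Partition: 0}: 90,
		{Topic: "orders", Partition: 1}: 40,
	}
	// Only partition 0 has a known high watermark in this example.
	watermarks := map[TopicPartition]int64{
		{Topic: "orders", Partition: 0}: 100,
	}
	for _, tp := range unmatchedOffsets(groupOffsets, watermarks) {
		fmt.Printf("missing watermark: topic=%s partition=%d\n", tp.Topic, tp.Partition)
	}
}
```

Logging each missing pair instead of returning on the first one would show whether the problem affects a single partition or the whole topic.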

weeco commented 3 years ago

@JaeHyeokLee I took another look. In fact the topic is already logged. Did you check the partition page for exactly the topic that has been printed in the logs? It uses the exact same function to fetch the partition offsets from the brokers. I couldn't find an issue in the logic itself.

Is there anything special? Did you for instance recently change the number of partitions for this topic? Might be tough to figure it out without direct Kafka access or some hint. If you can provide access to a sandbox environment where I could replicate this issue, I'm happy to look into it.

buneyev commented 3 years ago

I have the same problem. Setting the kowl Docker image tag to v1.2.2 helped.

weeco commented 3 years ago

@alex-boon I think the involved code hasn't changed for some time and I've already had someone report that issue running v1.2.2 via Discord. What version did you run before?

Unfortunately I still can't reproduce this issue, hence I'm a bit clueless why this is happening.

JaeHyeokLee commented 3 years ago

@alex-boon Thanks! It works! @weeco Thank you for your help. I don't think this is an MSK issue because there isn't any error with other Kafka UI tools; I could see all the information about Kafka in them. And I wonder why there is no error in the partition menu but there is an error in the consumer menu 🤔

weeco commented 3 years ago

@JaeHyeokLee What version did you upgrade or downgrade from? Then I'll try to compare the code

JaeHyeokLee commented 3 years ago

I got this error in quay.io/cloudhut/kowl:master-09f461d8, and it works well in quay.io/cloudhut/kowl:v1.2.2

weeco commented 3 years ago

Thank you all for the information. I pushed the fix to master and a new image is available that contains the fix: quay.io/cloudhut/kowl:master-e884fc06

JaeHyeokLee commented 3 years ago

@weeco I updated docker image to master but got this rendering error.

Type: TypeError

Message: Expected a finite number, got undefined: undefined

Stack:
https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:515810
https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:310361
Yi@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1237572
Aa@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1245571
lu@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1276512
cu@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1276437
Zc@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1273446
Zc@[native code]
https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1224830
https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1300411
$o@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1224776
qo@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1224711
eu@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:1273737
Xe@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:24350
We@https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:20796
https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:23277
https://my_kowl_url/static/js/2.ab6d81be.chunk.js:2:26114

Components: in Cell in tr in BodyRow in tbody in Unknown in table in div in div in Unknown in Unknown in div in Ne in div in div in f in div in Er in v in i in h in u in s in a in div in t in div in t in div in t in l in div in a in div in ForwardRef in a in t in t in Pi in ki in Qi in a in main in j in Content in section in n in Layout in s in section in n in Layout in t in t in a in a in t in t

Environment:
NODE_ENV : production
GIT_SHA : 284eb140e520ee647f8801992c54e7ad05b3c0c3
GIT_REF : v1.2.2
TIMESTAMP: 1606146599
appName : Kowl

Location:
0 : [object Object]
1 : [object Object]
2 : [object Object]
3 : [object Object]
remove : function (e){var t=this.indexOf(e);return-1!==t&&(this.splice(t,1),!0)}
removeAll: function (e){for(var t=0,a=0;a<this.length;a++)e(this[a])&&(this.splice(a,1),t++,a--);return t}
first : function (e){var t,a=Object(w.a)(this);try{for(a.s();!(t=a.n()).done;){var n=t.value;if(e(n))return n}}catch(ll){a.e(ll)}finally{a.f()}}
last : function (e){for(var t=this.length-1;t>=0;t--)if(!e||e(this[t]))return this[t]}
sum : function (e){return this.reduce((function(t,a){return t+e(a)}),0)}
max : function (e){return this.reduce((function(t,a){return Math.max(t,e(a))}),0)}
any : function (e){var t,a=Object(w.a)(this);try{for(a.s();!(t=a.n()).done;){if(e(t.value))return!0}}catch(ll){a.e(ll)}finally{a.f()}return!1}
all : function (e){var t,a=Object(w.a)(this);try{for(a.s();!(t=a.n()).done;){if(!e(t.value))return!1}}catch(ll){a.e(ll)}finally{a.f()}return!0}
groupBy : function (e){var t=new Map;return this.forEach((function(a){var n=e(a),r=t.get(n);r?r.push(a):t.set(n,[a])})),t}
groupInto: function (e){var t=this.groupBy(e),a=[];return t.forEach((function(e,t){a.push({key:t,items:e})})),a}
distinct: function (e){var t=e||function(e){return e},a=new Set,n=[];return this.forEach((function(e){var r=t(e);a.has(r)||(a.add(r),n.push(e))})),n}
pushDistinct: function (){for(var e=arguments.length,t=new Array(e),a=0;a<e;a++)t[a]=arguments[a];for(var n=0,r=t;n<r.length;n++){var i=r[n];this.includes(i)||this.push(i)}}
genericJoin: function (e){for(var t=[],a=1;a<this.length;a++){var n=this[a-1],r=e(n,this[a],a);t.push(n),t.push(r)}return t.push(this[this.length-1]),t}
toMap : function (e,t){var a,n=new Map,r=Object(w.a)(this);try{for(r.s();!(a=r.n()).done;){var i=a.value,o=e(i),l=t(i);n.set(o,l)}}catch(ll){r.e(ll)}finally{r.f()}return n}
joinStr : function (e){var t,a="",n=Object(w.a)(this);try{for(n.s();!(t=n.n()).done;){var r=t.value;null!==r&&void 0!==r&&""!==r&&(0==a.length?a=r:a+=e+r)}}catch(ll){n.e(ll)}finally{n.f()}return a}

weeco commented 3 years ago

Hmm, I'm sorry that you are running into another issue @JaeHyeokLee, that's not the user experience we'd like to provide. I'd like to fix this for you as well. This time I can't seem to reproduce the issue. What endpoint/page did you open?

Anything specific in the JSON response (you can see the response in the browser's developer tools)? If you want to you can share information privately via our Discord server as well.
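
For anyone who saves the JSON response from the developer tools, one possible way to scan it for the kind of missing number that "Expected a finite number, got undefined" hints at is sketched below in Go. This is only an assumption about where to look, not a diagnosis: the response is assumed to be a JSON array of objects, and the field names `lag` and `offset` are placeholders, not Kowl's actual API fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// missingNumbers decodes a JSON array of objects and reports, per array index,
// which of the required keys are absent, null, or not a JSON number.
func missingNumbers(raw []byte, required []string) (map[int][]string, error) {
	var rows []map[string]interface{}
	if err := json.Unmarshal(raw, &rows); err != nil {
		return nil, err
	}
	problems := make(map[int][]string)
	for i, row := range rows {
		for _, key := range required {
			// encoding/json decodes every JSON number into float64, so a
			// failed assertion means the value is missing or not numeric.
			if _, ok := row[key].(float64); !ok {
				problems[i] = append(problems[i], key)
			}
		}
	}
	return problems, nil
}

func main() {
	// Placeholder payload; the field names are hypothetical.
	raw := []byte(`[{"lag": 5, "offset": 10}, {"offset": 3}]`)
	problems, err := missingNumbers(raw, []string{"lag", "offset"})
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for index, keys := range problems {
		fmt.Printf("row %d is missing numeric fields: %v\n", index, keys)
	}
}
```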