skydive-project / skydive

An open source real-time network topology and protocols analyzer
https://skydive.network
Apache License 2.0

k8s probe producing errors and UI not working #2367

Closed sagor999 closed 1 year ago

sagor999 commented 3 years ago

Hi,

I am trying to deploy skydive to our medium-sized kubernetes cluster, but so far no luck. I am seeing these errors when the k8s probe is enabled:

Failed to insert entry 6: &{mapper_parsing_exception failed to parse field [Metadata.K8s.Ports.targetPort] of type [long] in document with id '215ed456-31de-474a-99bd-0d429f847291'. Preview of field's value: 'http'     false map[reason:For input string: "http" type:illegal_argument_exception] [] [] []   <nil>}

It seems skydive might not be taking into account that a target port can be a string name instead of a numeric (long) port number.

Also this error:

 Failed to insert entry 2: &{illegal_argument_exception Document contains at least one immense term in field="Metadata.K8s.Extra" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[83, 68, 82, 122, 83, 85, 70, 66, 81, 85, 70, 66, 81, 8, 70, 68, 76, 121, 116, 53, 79, 87, 69, 51, 84, 50, 107, 121, 99, 70[]...', original message: bytes can be at most 32766 in length; got 62316     false map[reason:max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 62316 type:max_bytes_length_exceeded_exception[] [] [] []   <nil>}                                                                  

Not sure what to do with this one.
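One way to get a hint about what is in that field is to turn the byte values from the error prefix back into text. The sketch below uses only the first 12 values quoted in the message; the interpretation (base64 text that itself wraps a base64-encoded gzip stream) is an inference from that short prefix, not something confirmed elsewhere in this thread.

```python
import base64

# First 12 byte values from the "immense term" prefix in the error above.
prefix = bytes([83, 68, 82, 122, 83, 85, 70, 66, 81, 85, 70, 66])

text = prefix.decode("ascii")       # the values are plain ASCII characters
inner = base64.b64decode(text)      # 12 base64 characters -> 9 bytes
raw = base64.b64decode(inner[:4])   # the result looks like base64 again

print(text)       # SDRzSUFBQUFB
print(inner)      # b'H4sIAAAAA'
print(raw.hex())  # 1f8b08
```

The bytes `1f 8b` are the gzip magic number and `08` is the deflate method, so the oversized value is probably an encoded, compressed blob rather than human-readable text.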

Also when I open web UI, I can see my cluster there. But when I attempt to expand it, web UI just hangs. Is there any setting that can be tweaked for that?

using skydive 0.27.0

Thank you!

lebauce commented 3 years ago

@sagor999 It seems the first bug you are hitting is related to service ports: a first service was discovered with the port "http", then another service was discovered with the port as a numerical value. This causes Elasticsearch, which you seem to be using as storage, to fail to index the new service because the types for "port" conflict.
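To illustrate the conflict: Kubernetes allows `targetPort` to be either a number or a named port, while Elasticsearch fixes a field's type the first time it sees it. The error quoted earlier reports the field as `long`, so the string "http" is the value being rejected. A minimal sketch of one possible workaround is to render every `targetPort` as a string before indexing. The helper name `normalize_ports` and the sample data are hypothetical, and this is not necessarily how the Skydive patch handles it.

```python
def normalize_ports(ports):
    """Return a copy of `ports` with every targetPort rendered as a string."""
    normalized = []
    for port in ports:
        entry = dict(port)  # shallow copy so the input is left untouched
        if "targetPort" in entry:
            entry["targetPort"] = str(entry["targetPort"])
        normalized.append(entry)
    return normalized


# Hypothetical sample: one named port and one numeric port.
ports = [
    {"name": "web", "port": 80, "targetPort": "http"},
    {"name": "api", "port": 8080, "targetPort": 8080},
]
result = normalize_ports(ports)
print([p["targetPort"] for p in result])
```

With a single consistent type, the storage backend no longer sees two incompatible values for the same field.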

The second one is because one field is way too big. It seems there is an Elasticsearch `ignore_above` mapping parameter we can use to tell ES to ignore it.
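As a rough sketch of what such a mapping could look like, the snippet below builds a mapping body with `ignore_above` set on the field named in the error. It assumes `Metadata.K8s.Extra` is a single string field mapped as `keyword`; the real Skydive mapping may be structured differently. Because `ignore_above` counts characters while the limit in the error is in bytes, the value is derived from the worst case of 4 bytes per UTF-8 character.

```python
import json

LUCENE_MAX_TERM_BYTES = 32766    # limit quoted in the error message
MAX_UTF8_BYTES_PER_CHAR = 4      # worst case for a UTF-8 encoded character

# ignore_above is expressed in characters, so use the worst-case ratio.
ignore_above = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR

# Assumed mapping shape, for illustration only.
mapping = {
    "properties": {
        "Metadata": {"properties": {"K8s": {"properties": {
            "Extra": {"type": "keyword", "ignore_above": ignore_above},
        }}}},
    }
}

print(ignore_above)  # 8191
print(json.dumps(mapping, indent=2))
```

Values longer than `ignore_above` are then left out of the index for that field instead of causing the whole document to be rejected.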

These two bugs should not cause the web UI to hang. Could you please try the new web UI, which is much more scalable, to see if it helps? To do so, just open http://localhost:8082/ui_v2

I'll try to write a patch for the two bugs, hopefully today. Could you please give it a try?

lebauce commented 3 years ago

I pushed this PR: https://github.com/skydive-project/skydive/pull/2368

If you could give it a try, that would be really appreciated. Thanks!

sagor999 commented 3 years ago

@lebauce how do I test those changes? I think that PR has not been merged yet? I was wondering if I can just use the latest image to test those changes.

Also, I tried to access ui_v2 but I am getting a 404 error. Do I need to add something to the config to enable it?

sagor999 commented 3 years ago

Edit: I used the latest image and that one has ui_v2. Confirmed: it doesn't hang!

lebauce commented 3 years ago

@sagor999 Thanks for giving the new UI a try and sorry for the delay. I built a binary of my PR: http://ci-logs.skydive.community/builds/skydive if you could try it too. Thanks!

sagor999 commented 3 years ago

@lebauce sorry for the slow reply. I deploy skydive as a container, so I cannot use just the executable. I can wait until your fix gets into the official container image to give it a try though.

lebauce commented 3 years ago

@sagor999 The PR was merged a few weeks ago but I completely forgot to ping on this issue, sorry. The latest official container image should contain the fix.

sagor999 commented 1 year ago

Closing as resolved.