openconfig / gnmi

gRPC Network Management Interface
Apache License 2.0
464 stars 196 forks source link

Need help with subscribe example: Error "received bogus greeting from client" #15

Closed nileshkhambekar closed 6 years ago

nileshkhambekar commented 6 years ago

Hi,

I am having trouble running the GNMI client/sever example. I tried to trace the issue: The details are below. It seems I am missing something.

I am looking for sample commands for client and server to go with the uploaded example-config.pb.txt

Please see below what I have been trying, the issue I am seeing, and related debugging. Please help to share if you have any general suggestions in this regard.

Many Thanks for your time and help! Best Regards, Nilesh

Here are two versions of example-configs I have been trying to get working $ cat example-config.pb.txt target: "foo" client_type: GRPC_GNMI

$ cat example-config-cred.pb.txt target: "foo" client_type: GRPC_GNMI credentials: {username: "grpc", password: "grpc"}

I run server as follows: ./cmd --config=example-config.pb.txt --text --port=57400 ./cmd --config=example-config.pb.txt --text --port=57400 --server_key ~/z/keys/server.key --server_crt ~/z/keys/server.crt --ca_crt ~/z/keys/ca.crt [I am new to golang; I did 'go install' in directory server.go; the binary in the $GOPATH/bin is cmd.]

I tried running client in several ways ./gnmi_cli --address localhost:57400 --query state/port[port-id=]/statistics --with_user_pass -qt s -dt p -insecure ./gnmi_cli --address localhost:57400 --query state/port[port-id=]/statistics --with_user_pass -qt s -dt p --ca_crt ~/z/keys/ca.crt --client_crt ~/z/keys/client.crt --client_key ~/z/keys/client.key

For client, I tried changing query to everything: ./gnmi_cli --address localhost:57400 --query * --with_user_pass -qt s -dt p -insecure

Here is what issue the code is running into with the above config-files, server commands, client commands: ./cmd --config=example-config.pb.txt --text --port=57400 ./gnmi_cli --address localhost:57400 --query state/port[port-id=*]/statistics --with_user_pass -qt s -dt p -insecure

  1. The client is able to connect to the server. Under wireshark I do see packets being exchanged between the client and server.

  2. I have not been able to get the verbose logs working. (a side question is how to enable verbose logs?) so I added several debug prints in the GNMI and underlying GRPC source code. I am attaching a file, if you would like to see the trace with my debug-prints. Summarizing, under GRPC, it goes inside serveHTTP2Transport() from grpc/server.go and then calls serveStreams(), and then HandleStreams() from http2_server.go In here, it closes the connection from the condition preface != clientPreface errorf("transport: http2Server.HandleStreams received bogus greeting from client: %q", preface)

  3. Under Wireshark, I see that the server is resetting the two (Guess, one connection is for GNMI based channel and other one is Openconfig based channel as mentioned in client documentation) connections.

Next, I used the second config file with credentials as well, and there is no change in behavior.

Next, I used certificates, the server is closing the connection ./cmd --config=example-config.pb.txt --text --port=57400 --server_key ~/z/keys/server.key --server_crt ~/z/keys/server.crt --ca_crt ~/z/keys/ca.crt ./gnmi_cli --address localhost:57400 --query state/port[port-id=*]/statistics --with_user_pass -qt s -dt p --ca_crt ~/z/keys/ca.crt --client_crt ~/z/keys/client.crt --client_key ~/z/keys/client.key

In this case, the handleRawConn() returns with error (from the very first if condition which says Handshake failed) from grpc/server.go

Again, I think, I may be missing something very basic and if I could get an example 1) sample config file 2) what parameters to pass to server 3) what parameters to pass to client to get the GitHub code working on my local PC working, it would be great!! Really appreciate your help and time!

Thanks, Best Regards, Nilesh

grpc_client_server_connection_errors.txt

awly commented 6 years ago

Hi Nilesh,

I'll try to reproduce it locally when I get some free time this week. Just a few quick comments:

nileshkhambekar commented 6 years ago

Hi Andrew,

Many thanks for your response!

we use the fake server and gnmi_cli for distinct purposes and have never ran cli against this server, so apologies for usage confison; will try to add proper instructions as soon as possible Oh.. which client is recommended? Does this client talk GNMI?

Thanks!! Best Regards, Nilesh

On Tue, Nov 7, 2017 at 3:21 PM, Andrew Lytvynov notifications@github.com wrote:

Hi Nilesh,

I'll try to reproduce it locally when I get some free time this week. Just a few quick comments:

  • when you use gnmi_cli, TLS is required. So --server_key/crt flags to testing/fake/gnmi/cmd are required
  • you should pass either --with_user_pass or --client_key/crt to gnmi_cli, not both. They are distinct forms of authentication
  • the errors you are seeing likely have something to do with TLS/authentication on gRPC layer
  • we use the fake server and gnmi_cli for distinct purposes and have never ran cli against this server, so apologies for usage confison; will try to add proper instructions as soon as possible

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/openconfig/gnmi/issues/15#issuecomment-342657228, or mute the thread https://github.com/notifications/unsubscribe-auth/Af4qxOHIiJYPw9lucJmy-i6Cu3Dv594Oks5s0OXrgaJpZM4QVOav .

awly commented 6 years ago

Sorry, I wasn't clear. gnmi_cli is the recommended client and it does speak gNMI. I was just clarifying that we never used it against fake server, which is why this is so confusing and not documented.

I finally got a chance to try this out and indeed, it's really hard to make this work:

Planned list of actions:

Also, we could probably come up with a much simpler fake gNMI server implementation that just sends a fixed list of updates and takes a JSON config. The testing/fake/gnmi/* code is meant more as a simulation of a device with infinite stream of dummy updates, not an example of server implementation.

nileshkhambekar commented 6 years ago

Hi Andrew,

Thank you very much for your time! Looks like at this point there is some work to be done to make the server-client run together as you have identified. This information is very useful for my debugging. I will keep you posted.

In the meanwhile, if you have any update, please help to share.

Many Thanks! Best Regards, Nilesh

On Wed, Nov 8, 2017 at 4:16 PM, Andrew Lytvynov notifications@github.com wrote:

Sorry, I wasn't clear. gnmi_cli is the recommended client and it does speak gNMI. I was just clarifying that we never used it against fake server, which is why this is so confusing and not documented.

I finally got a chance to try this out and indeed, it's really hard to make this work:

  • fake gNMI server only supports TLS client authentication, not username/password
  • fake gNMI server can't keep TLS but ignore client authentication
  • fake gNMI server needs a list of hardcoded updates in config, otherwise it closes RPC immediately
  • gnmi_cli doesn't fail fast on TLS handshake or authentication errors, it waits for a timeout

Planned list of actions:

  • allow fake gNMI server to run without verifying client certs
  • allow fake gNMI server to start without --ca_crt
  • add a tool to generate fake gNMI server config
  • (longer term) handle user/pass authentication in fake gNMI server
  • investigate why gnmi_cli blocks on permanent error

Also, we could probably come up with a much simpler fake gNMI server implementation that just sends a fixed list of updates and takes a JSON config. The testing/fake/gnmi/* code is meant more as a simulation of a device with infinite stream of dummy updates, not an example of server implementation.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/openconfig/gnmi/issues/15#issuecomment-343005593, or mute the thread https://github.com/notifications/unsubscribe-auth/Af4qxGI8Uvxh6eCnuFRvZQcG5l1ca0oeks5s0kR2gaJpZM4QVOav .

awly commented 6 years ago

I update the fake server, added a tool to help generate config files and update the README with specific examples: https://github.com/openconfig/gnmi/blob/master/testing/fake/gnmi/cmd/fake_server/README.md

Nilesh, could you try it out and see if it works on your side?

nileshkhambekar commented 6 years ago

Hi Andrew,

Thanks a ton!! I am able to get the server working! yay! Really appreciate your help and time!

Best Regards, Nilesh

Occasionally, the client does not get any data from the server. I see that in this case the underlying GRPC server is bailing out in the handleStreams() function. I am looking into the difference between parameters passed in the working vs not-working case.

On Thu, Nov 9, 2017 at 4:12 PM, Andrew Lytvynov notifications@github.com wrote:

I update the fake server, added a tool to help generate config files and update the README with specific examples: https://github.com/openconfig/ gnmi/blob/master/testing/fake/gnmi/cmd/fake_server/README.md

Nilesh, could you try it out and see if it works on your side?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/openconfig/gnmi/issues/15#issuecomment-343333785, or mute the thread https://github.com/notifications/unsubscribe-auth/Af4qxOEQfX8aRLdlLuortlabmcsU4hyMks5s05TogaJpZM4QVOav .

jipanyang commented 6 years ago

I tried the example command. The same "gnmi_cli -a localhost:8080 -q '*' -logtostderr -insecure -qt s" command failed from time to time with "rpc error: code = Unimplemented". fake_server was running without any change.

jipan@sonic-build:~/work/src/github.com/openconfig/gnmi/cmd/gnmi_cli$ gnmi_cli -a localhost:8080 -q '' -logtostderr -insecure -qt s E1113 09:44:53.949449 24122 gnmi_cli.go:186] cli.QueryDisplay: sendQueryAndDisplay(ctx, {Addrs:[localhost:8080] Target: Replica:0 Discard:false Queries:[[]] Type:stream Timeout:30s NotificationHandler: ProtoHandler: Credentials: TLS:0xb8e420 Extra:map[]}, &{PollingInterval:30s StreamingDuration:0s Count:0 countExhausted:false Delimiter:/ Display:0x7d1160 DisplayPrefix: DisplayIndent: DisplayType:group DisplayPeer:false Timestamp: DisplaySize:false Latency:false ClientTypes:[]}): rpc error: code = Unimplemented desc = unknown service openconfig.OpenConfig

jipan@sonic-build:~/work/src/github.com/openconfig/gnmi/cmd/gnmi_cli$ gnmi_cli -a localhost:8080 -q '' -logtostderr -insecure -qt s E1113 09:44:56.128662 24129 gnmi_cli.go:186] cli.QueryDisplay: sendQueryAndDisplay(ctx, {Addrs:[localhost:8080] Target: Replica:0 Discard:false Queries:[[]] Type:stream Timeout:30s NotificationHandler: ProtoHandler: Credentials: TLS:0xb8e420 Extra:map[]}, &{PollingInterval:30s StreamingDuration:0s Count:0 countExhausted:false Delimiter:/ Display:0x7d1160 DisplayPrefix: DisplayIndent: DisplayType:group DisplayPeer:false Timestamp: DisplaySize:false Latency:false ClientTypes:[]}): rpc error: code = Unimplemented desc = unknown service openconfig.OpenConfig

jipan@sonic-build:~/work/src/github.com/openconfig/gnmi/cmd/gnmi_cli$ gnmi_cli -a localhost:8080 -q '*' -logtostderr -insecure -qt s { "a": { "b": 4 }, "b": { "c": "foo" } } { "a": { "b": 4 } } { "b": { "c": "foo" } } { "a": { "b": 4 } } { "b": { "c": "foo" } } { "b": { "c": "foo" } } { "b": { "c": "foo" } }

robshakir commented 6 years ago

@awly

This seems to happen because in client/client.go if an implementation returns a connect error then we just pass the error back up to the caller (https://github.com/openconfig/gnmi/blob/master/client/client.go#L196).

Should we add a flag to the client such that it says that the target is gnmi.proto only? We'd be able to deprecate it as openconfig.proto gets phased out.

awly commented 6 years ago

We can already pass --client_type=gnmi to gnmi_cli and client.Client.Subscribe calls take optional list of client types to try as well.

In theory NewImpl should check specific gRPC interface availability through reflection, but it's disabled for now https://github.com/openconfig/gnmi/blob/master/client/openconfig/client.go#L75

robshakir commented 6 years ago

Ah -- I missed that there was client_type. It doesn't make a huge amount of sense to fail completely in the case that >1 client type is specified, since if we're doing a subscribe against a target, then I'd expect that it the common scenario is that one service is supported. If both happen to be, do we really want the results twice?

@JipanYanga -- you should be able to use the flag above (--client_type=gnmi) to avoid this issue, such that the legacy OpenConfig service never is attempted against the fake.

r.

awly commented 6 years ago

The client library attempts to connect all requested clientTypes concurrently and only keeps the first one to succeed https://github.com/openconfig/gnmi/blob/master/client/register.go#L117 It will not query over multiple interfaces.

The target in question here does not support OpenConfig, but the client connection succeeds, because gRPC won't check that a given service is available until we actually send and RPC.

So the gnmi_cli calls fail sometimes, when OpenConfig client managers to dial first. Other times gNMI client dials first and the query succeeds.

robshakir commented 6 years ago

Right - I get that this is the case, but it just seems a strategy to introduce flaky connections, per the above issue. How does a caller ensure that they get a connection when they have an unknown target and want to exploit having the connection of both RPCs? It seems like even with some retry logic, I can end up with no connection for a device that supports gNMI?

If the only reliable way is to create two clients and try one over the other (e.g., try gNMI, if this errors then fallback to OpenConfig) can't we just encode this logic in the client library?

awly commented 6 years ago

Ideally, we should enforce reflection.

But, assuming that isn't trivial we could move the connection parallelization logic into BaseClient.Subscribe, and include the actual RPC call into it. It's a bit ugly to duplicate this responsibility IMO, but seems like that's the best we can do without reflection.

I'll make a change a bit later today.

robshakir commented 6 years ago

I know we had an issue with reflection and a particular target previously, I'm wondering whether that is a general issue - i.e., can we have a failed reflection lookup and still fallback to this logic (which could then maybe be left unchanged with a health warning)? I agree that the reflection would give us the nicest way to be able to deal with this -- and would certainly work here against the fake.

@gcsl and I had discussed adding reflection as a requirement. I think this is doable for most implementors with whom we've spoken -- it was an issue of client library lag (and some porting issues).

jipanyang commented 6 years ago

@robshakir @awly thanks, with --client_types=gnmi, I don't see the error any more.

I'm new to gnmi and a little confused by structure organization between reference and gnmi repositories, what are the major difference between gnmi.proto and openconfig.proto? I'm interested in trying out gnmi on SONiC nos for config and telemetry to see if it is a good fit there.

robshakir commented 6 years ago

Per the README in the reference repo, the openconfig.proto API is deprecated. This pre-dates the gNMI specification.

The reference repo stores the specification documents, this repo stores the gnmi.proto file and reference implementations surrounding it.