micro / go-micro

A Go microservices framework
https://go-micro.dev
Apache License 2.0

gossip registry #404

Closed vtolstov closed 5 years ago

vtolstov commented 5 years ago

I'm experimenting with the gossip registry and may have found an issue. I start service1 first with gossip.Address("172.16.1.254:4223"), then start service2 on another server with registry.Addrs("172.16.1.254:4223") and gossip.Address("172.16.1.1:0").

I see that the member count is 2 on both sides, but the second service is not registered in the registry. If I stop service1 and start it again with registry.Addrs("172.16.1.254:4223") and gossip.Address("172.16.1.1:xxx"), where xxx is the port provided by service2, everything works fine. So the issue appears only on service1 when it starts first, without any other members.

vtolstov commented 5 years ago

Also, I don't see any broadcast update messages in the first case.

vtolstov commented 5 years ago

Also, when a service connects to an already running registry service, it receives updates from it, but I don't see any updates sent to the first service.

vtolstov commented 5 years ago

Root cause of the issue: when service2 connects, a sync event is broadcast and service2 receives all data from service1, but service1 does not receive any service info from service2.

vtolstov commented 5 years ago

And the LocalState func for the connected service has no service data in the channel, because the registry has not been created yet when the gossip join happens.

vtolstov commented 5 years ago

next investigation:

g.queue.QueueBroadcast(&broadcast{
  update: up,
  notify: nil,
})

This does not send the service data when Register is called in gossip. I checked this by passing a channel as notify and watching for when a read from it returns.

On service1, which starts first, the read returns after the broadcast; on service2 it does not.
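The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the registry's actual code: the `broadcast` struct only mirrors the shape of the snippet above, and `waitSent` is a hypothetical helper. It assumes the gossip queue calls `Finished` once a message has been transmitted or invalidated, which is how the notify channel gets signalled.

```go
package main

import (
	"fmt"
	"time"
)

// broadcast mirrors the shape of the struct queued above. The field types
// are assumptions for illustration, not the registry's real definition.
type broadcast struct {
	update []byte
	notify chan<- struct{}
}

// Finished is assumed to be called by the gossip queue when the message has
// been sent or dropped; it signals whoever is waiting on notify.
func (b *broadcast) Finished() {
	if b.notify != nil {
		close(b.notify)
	}
}

// waitSent (hypothetical helper) reports whether notify was closed before
// the timeout elapsed.
func waitSent(notify <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-notify:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	// Case 1: the queue drains the message, so the read returns.
	sent := make(chan struct{})
	b := &broadcast{update: []byte("service data"), notify: sent}
	go b.Finished() // stand-in for the queue transmitting the message
	fmt.Println("delivered:", waitSent(sent, time.Second))

	// Case 2: nothing ever calls Finished, so the read times out.
	never := make(chan struct{})
	fmt.Println("delivered:", waitSent(never, 50*time.Millisecond))
}
```

A timeout is used here so that a broadcast which is never drained shows up as a failed check instead of a hung goroutine.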

vtolstov commented 5 years ago

@asim , gentle ping

asim commented 5 years ago

I do not have time to investigate this right now. Feel free to PR a fix.

vtolstov commented 5 years ago

Nice. I think an enterprise SLA helps in such cases. Can you describe your test system in the enterprise repo, so that others can understand the risks and what you test automatically for each commit?

vtolstov commented 5 years ago

For now I have only one workaround: remove the join check in LocalState and in MergeRemoteState, so that after the first push/pull the service data is updated on both sides. But this is very ugly. As far as I can see, broadcast does not work for me in any case.

vtolstov commented 5 years ago

I wrote a test case for the gossip registry, and it works fine. I also tried running two micro services with the same registry params, and service info is not propagated between them. Is it possible that some issue is present in the micro/server code?

vtolstov commented 5 years ago

https://github.com/unistack-org/go-micro/blob/gossip/registry/gossip/gossip_test.go

vtolstov commented 5 years ago

I added a failing test case to https://github.com/unistack-org/go-micro/blob/gossip/registry/gossip/gossip_test.go. @asim, can you take a look and say what's wrong in TestServerRegistry?

vtolstov commented 5 years ago

I found it! Is it possible to add a server option that signals on a channel once the server has fully started? The main problem is that sometimes the server starts too quickly and does not register everything in the registry.

So I checked service_test.go in the go-micro repo, and you use a WaitGroup in AfterStart to wait until the server is fully started.
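The WaitGroup pattern mentioned above might look roughly like this. It is only a sketch: `server`, its `afterStart` hooks and `startAndWait` are hypothetical stand-ins written for this example, not the go-micro types, and registration is reduced to setting a flag.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for a service with after-start hooks.
type server struct {
	afterStart []func() error
	registered bool
}

// Start registers the service and only then runs the after-start hooks.
func (s *server) Start() error {
	s.registered = true // stand-in for registering with the registry
	for _, fn := range s.afterStart {
		if err := fn(); err != nil {
			return err
		}
	}
	return nil
}

// startAndWait starts s in the background and returns once the after-start
// hook has fired, or with the error if Start fails before that.
func startAndWait(s *server) error {
	var wg sync.WaitGroup
	wg.Add(1)
	s.afterStart = append(s.afterStart, func() error {
		wg.Done()
		return nil
	})

	errCh := make(chan error, 1)
	go func() { errCh <- s.Start() }()

	started := make(chan struct{})
	go func() {
		wg.Wait()
		close(started)
	}()

	select {
	case <-started:
		return nil
	case err := <-errCh:
		return err
	}
}

func main() {
	s := &server{}
	if err := startAndWait(s); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("registered:", s.registered)
}
```

The select on `errCh` is there so that a failing earlier hook does not leave the caller blocked on the WaitGroup forever.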

vtolstov commented 5 years ago

And this does not work in a real-world example. node1:

./tests --registry_address 172.16.1.254:4223 --broker_endpoint 172.16.1.254:4222   --dns_address 0.0.0.0:5353
2019/01/31 16:55:26 Registry Listening on 172.16.1.254:4223
2019/01/31 16:55:26 run org.unistack.sshkey
2019/01/31 16:55:26 Transport [http] Listening on [::]:38785
2019/01/31 16:55:26 Broker [stan] Listening on nats://172.16.1.254:4222
2019/01/31 16:55:26 Registering node: org.unistack.sshkey-23fb4462-ca7e-4910-a5f1-a48cd475c7bc
2019/01/31 16:55:31 total svcs 1
2019/01/31 16:55:31 svc: org.unistack.sshkey
2019/01/31 16:55:36 total svcs 1
2019/01/31 16:55:36 svc: org.unistack.sshkey
2019/01/31 16:55:38 [DEBUG] memberlist: Stream connection from=172.16.1.254:42218
2019/01/31 16:55:41 total svcs 1
2019/01/31 16:55:41 svc: org.unistack.sshkey
2019/01/31 16:55:46 total svcs 1
2019/01/31 16:55:46 svc: org.unistack.sshkey
2019/01/31 16:55:48 [DEBUG] memberlist: Stream connection from=172.16.1.1:60186
2019/01/31 16:55:51 total svcs 1
2019/01/31 16:55:51 svc: org.unistack.sshkey
2019/01/31 16:55:56 total svcs 1
2019/01/31 16:55:56 svc: org.unistack.sshkey
2019/01/31 16:56:01 total svcs 1
2019/01/31 16:56:01 svc: org.unistack.sshkey
2019/01/31 16:56:06 total svcs 1
2019/01/31 16:56:06 svc: org.unistack.sshkey
2019/01/31 16:56:09 [DEBUG] memberlist: Initiating push/pull sync with: 172.16.1.254:40007
2019/01/31 16:56:11 total svcs 1
2019/01/31 16:56:11 svc: org.unistack.sshkey

node2:

2019/01/31 16:56:18 svc: org.unistack.sshkey
2019/01/31 16:56:22 [DEBUG] memberlist: Initiating push/pull sync with: 172.16.1.1:39237
2019/01/31 16:56:23 total svcs 2
2019/01/31 16:56:23 svc: org.unistack.libvirt
2019/01/31 16:56:23 svc: org.unistack.sshkey
2019/01/31 16:56:28 total svcs 2
2019/01/31 16:56:28 svc: org.unistack.libvirt
2019/01/31 16:56:28 svc: org.unistack.sshkey
2019/01/31 16:56:33 total svcs 2
2019/01/31 16:56:33 svc: org.unistack.libvirt
2019/01/31 16:56:33 svc: org.unistack.sshkey
2019/01/31 16:56:38 total svcs 2
2019/01/31 16:56:38 svc: org.unistack.libvirt
2019/01/31 16:56:38 svc: org.unistack.sshkey
2019/01/31 16:56:39 [DEBUG] memberlist: Stream connection from=172.16.1.254:45240
2019/01/31 16:56:39 [DEBUG] memberlist: Stream connection from=172.16.1.1:35506
2019/01/31 16:56:43 total svcs 2
2019/01/31 16:56:43 svc: org.unistack.libvirt
2019/01/31 16:56:43 svc: org.unistack.sshkey
2019/01/31 16:56:48 total svcs 2
2019/01/31 16:56:48 svc: org.unistack.libvirt
2019/01/31 16:56:48 svc: org.unistack.sshkey

vtolstov commented 5 years ago

@asim I think the gossip registry must die. Do you know that broadcast is done over UDP, so the packet size is limited to something like 1440 bytes? Most of my services, when marshaled to JSON as you do inside the gossip registry, take 6000 to 15000 or even 23000 bytes.

In the mdns registry you don't expose all the data the way gossip does.
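To make the size problem concrete, here is a rough sketch that marshals a service description to JSON and compares it with a packet budget. The `value`, `endpoint` and `service` types are simplified stand-ins, not the go-micro definitions, and `packetBudget` just reuses the approximate figure quoted above; the real limit depends on the MTU and on how the gossip library is configured.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packetBudget is the rough per-packet figure quoted above (an assumption
// for illustration, not a value read from any library).
const packetBudget = 1440

// value, endpoint and service are simplified stand-ins for registry types.
type value struct {
	Name   string  `json:"name"`
	Type   string  `json:"type"`
	Values []value `json:"values"`
}

type endpoint struct {
	Name     string `json:"name"`
	Request  *value `json:"request"`
	Response *value `json:"response"`
}

type service struct {
	Name      string     `json:"name"`
	Version   string     `json:"version"`
	Endpoints []endpoint `json:"endpoints"`
}

// encodedSize returns the JSON size of s and whether it fits the budget.
func encodedSize(s service) (int, bool, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return 0, false, err
	}
	return len(b), len(b) <= packetBudget, nil
}

// withoutEndpoints returns a copy of s with the endpoint list dropped.
func withoutEndpoints(s service) service {
	s.Endpoints = nil
	return s
}

func main() {
	svc := service{Name: "org.unistack.sshkey", Version: "0.0.0.1"}
	for i := 0; i < 50; i++ {
		svc.Endpoints = append(svc.Endpoints, endpoint{
			Name:    fmt.Sprintf("SshkeyService.Method%d", i),
			Request: &value{Name: "req", Type: "Req"},
		})
	}
	for _, s := range []service{svc, withoutEndpoints(svc)} {
		n, ok, err := encodedSize(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("endpoints=%d bytes=%d fits=%v\n", len(s.Endpoints), n, ok)
	}
}
```

Dropping the endpoint list is only one way to shrink the payload; it loses the endpoint metadata for anything that reads it from the registry.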

vtolstov commented 5 years ago

I'm trying to minimize the data sent, but most of the time the endpoint is too big. Example:

{"name":"org.unistack.sshkey","version":"0.0.0.1","metadata":null,"endpoints":[{"name":"SshkeyService.Create","request":{"name":"Ssh
keyCreateReq","type":"SshkeyCreateReq","values":[{"name":"name","type":"string","values":null},{"name":"data","type":"string","values":n
ull},{"name":"project","type":"string","values":null},{"name":"account","type":"string","values":null},{"name":"-","type":"","values":nu
ll},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"
response":{"name":"Sshkey","type":"Sshkey","values":[{"name":"uuid","type":"string","values":null},{"name":"account","type":"string","va
lues":null},{"name":"project","type":"string","values":null},{"name":"data","type":"string","values":null},{"name":"name","type":"string
","values":null},{"name":"fprint_md5","type":"string","values":null},{"name":"fprint_sha256","type":"string","values":null},{"name":"cre
ated_at","type":"int64","values":null},{"name":"updated_at","type":"int64","values":null},{"name":"enabled","type":"uint32","values":nul
l},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"
-","type":"int32","values":null}]},"metadata":{"stream":"false"}},{"name":"SshkeyService.Delete","request":{"name":"SshkeyDeleteReq","ty
pe":"SshkeyDeleteReq","values":[{"name":"uuid","type":"string","values":null},{"name":"project","type":"string","values":null},{"name":"
account","type":"string","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","typ
e":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"response":{"name":"Empty","type":"Empty","values":[{"name":"-",
"type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int3
2","values":null}]},"metadata":{"stream":"false"}},{"name":"SshkeyService.List","request":{"name":"SshkeyListReq","type":"SshkeyListReq"
,"values":[{"name":"project","type":"string","values":null},{"name":"account","type":"string","values":null},{"name":"fields","type":"[]
SshkeyListReq_Fields","values":[{"name":"SshkeyListReq_Fields","type":"SshkeyListReq_Fields","values":null}]},{"name":"meta","type":"Ssh
keyListReq_Meta","values":[{"name":"limit","type":"uint32","values":null},{"name":"offset","type":"uint32","values":null},{"name":"sort"
,"type":"string","values":null},{"name":"order","type":"string","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"
[]uint8","values":null},{"name":"-","type":"int32","values":null}]},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","v
alues":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"response":{"name":"SshkeyListRsp","
type":"SshkeyListRsp","values":[{"name":"sshkeys","type":"[]Sshkey","values":[{"name":"Sshkey","type":"Sshkey","values":null}]},{"name":
"meta","type":"SshkeyListRsp_Meta","values":[{"name":"total","type":"int64","values":null},{"name":"-","type":"","values":null},{"name":
"-","type":"[]uint8","values":null},{"name":"-","type":"int32","values":null}]},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"metadata":{"stream":"false"}},{"name":"SshkeyService.Lookup","request":{"name":"SshkeyLookupReq","type":"SshkeyLookupReq","values":[{"name":"uuid","type":"string","values":null},{"name":"project","type":"string","values":null},{"name":"account","type":"string","values":null},{"name":"fields","type":"[]SshkeyLookupReq_Fields","values":[{"name":"SshkeyLookupReq_Fields","type":"SshkeyLookupReq_Fields","values":null}]},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"response":{"name":"Sshkey","type":"Sshkey","values":[{"name":"uuid","type":"string","values":null},{"name":"account","type":"string","values":null},{"name":"project","type":"string","values":null},{"name":"data","type":"string","values":null},{"name":"name","type":"string","values":null},{"name":"fprint_md5","type":"string","values":null},{"name":"fprint_sha256","type":"string","values":null},{"name":"created_at","type":"int64","values":null},{"name":"updated_at","type":"int64","values":null},{"name":"enabled","type":"uint32","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","valus":null},{"name":"created_at","type":"int64","values":null},{"name":"updated_at","type":"int64","values":null},{"name":"enabled","type":"uint32","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"metadata":{"stream":"false"}},{"name":"SshkeyService.Search","request":{"name":"SshkeySearchReq","type":"SshkeySearchReq","values":[{"name":"project","type":"string","values":
null},{"name":"account","type":"string","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"response":{"name":"SshkeyListRsp","type":"SshkeyListRsp","values":[{"name":"sshkeys","type":"[]Sshkey","values":[{"name":"Sshkey","type":"Sshkey","values":null}]},{"name":"meta","type":"SshkeyListRsp_Meta","values":[{"name":"total","type":"int64","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":null},{"name":"-","type":"int32","values":null}]},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"metadata":{"stream":"false"}},{"name":"SshkeyService.Update","request":{"name":"SshkeyUpdateReq","type":"SshkeyUpdateReq","values":[{"name":"uuid","type":"string","values":null},{"name":"name","type":"string","values":null},{"name":"project","type":"string","values":null},{"name":"account","type":"string","values":null},{"name":"fields","type":"FieldMask","values":[{"name":"paths","type":"[]string","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":null},{"name":"-","type":"int32","values":null}]},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"response":{"name":"Sshkey","type":"Sshkey","values":[{"name":"uuid","type":"string","values":null},{"name":"account","type":"string","values":null},{"name":"project","type":"string","values":null},{"name":"data","type":"string","values":null},{"name":"name","type":"string","values":null},{"name":"fprint_md5","type":"string","values":null},{"name":"fprint_sha256","type":"string","values":null},{"name":"created_at","type":"int64","values":null},{"name":"updated_at","type":"int64","values":null},{"name":"en
abled","type":"uint32","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"metadata":{"stream":"false"}},{"name":"Func","request":{"name":"Account","type":"Account","values":[{"name":"uuid","type":"string","values":null},{"name":"owner","type":"string","values":null},{"name":"zone","type":"string","values":null},{"name":"status","type":"string","values":null},{"name":"login","type":"string","values":null},{"name":"passw","type":"string","values":null},{"name":"perms","type":"string","values":null},{"name":"type","type":"string","values":null},{"name":"created_at","type":"int64","values":null},{"name":"updated_at","type":"int64","values":null},{"name":"settings","type":"string","values":null},{"name":"-","type":"","values":null},{"name":"-","type":"[]uint8","values":[{"name":"uint8","type":"uint8","values":null}]},{"name":"-","type":"int32","values":null}]},"response":null,"metadata":{"subscriber":"true","topic":"org.unistack.account"}}],"nodes":null}

vtolstov commented 5 years ago

Also, your mdns registry is broken too, because you pass the endpoint in a TXT record, where a single string has a limit of 255 bytes per RFC 4408.

vtolstov commented 5 years ago

Yes, TXT records can be concatenated, but this is also limited by the UDP packet size.
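As a small illustration of the concatenation point: a payload can be split into strings of at most 255 bytes, but each string also costs one length byte, so the total record data only grows. The helpers below (`splitTXT`, `wireSize`) are hypothetical names written for this sketch and are not taken from any mdns library.

```go
package main

import "fmt"

// maxTXTString is the largest payload a single TXT string can carry,
// because its length is stored in one byte.
const maxTXTString = 255

// splitTXT cuts data into consecutive chunks of at most 255 bytes.
// The chunks alias the input slice; nothing is copied.
func splitTXT(data []byte) [][]byte {
	var out [][]byte
	for len(data) > maxTXTString {
		out = append(out, data[:maxTXTString])
		data = data[maxTXTString:]
	}
	if len(data) > 0 {
		out = append(out, data)
	}
	return out
}

// wireSize returns the record data size of the chunks: one length byte
// per string plus the string bytes themselves.
func wireSize(chunks [][]byte) int {
	n := 0
	for _, c := range chunks {
		n += 1 + len(c)
	}
	return n
}

func main() {
	payload := make([]byte, 6000) // size similar to the smallest figure quoted in this thread
	chunks := splitTXT(payload)
	fmt.Printf("strings=%d rdata bytes=%d\n", len(chunks), wireSize(chunks))
}
```

So splitting gets around the per-string limit, while the combined size still has to fit whatever the transport allows.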

asim commented 5 years ago

If you have a good solution please propose or PR it. Otherwise you can disable adding endpoints when you register handlers https://godoc.org/github.com/micro/go-micro/server#InternalHandler

vtolstov commented 5 years ago

Closing, as #411 was merged.