osrg / gobgp

BGP implemented in the Go Programming Language
https://osrg.github.io/gobgp/
Apache License 2.0

Cleaning up the APIs with the major version update #1763

Open fujita opened 6 years ago

fujita commented 6 years ago

If you just use the gobgp and gobgpd binaries, you can stop reading here. The next release (early July) will be the last release of the 1.X series. However, 2.X (likely early September) should behave as before (e.g. the configuration file will keep working).

This is a summary of the possible changes to the APIs:

Removing the binary type in gRPC

As pointed out in https://github.com/osrg/gobgp/issues/1457, the AddPath gRPC API, for example, uses the bgp-on-wire binary format for path attributes. This makes the API practically unusable from non-Go languages (you have to parse the binary yourself). The new gRPC APIs will use protobuf structures for all path attributes and BGP capabilities.
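
For illustration, here is a rough sketch of what adding a path could look like once attributes are plain protobuf messages packed into Any fields instead of bgp-on-wire blobs. The message and field names used here (IPAddressPrefix, OriginAttribute, NextHopAttribute, the shape of AddPathRequest, the GobgpApiClient interface) are assumptions for the sake of the example, not the finalized API.

import (
    "context"

    "github.com/golang/protobuf/ptypes"
    "github.com/golang/protobuf/ptypes/any"

    api "github.com/osrg/gobgp/api"
)

// addExamplePath is a hypothetical sketch: the attribute/NLRI message names and
// the AddPathRequest fields are illustrative assumptions, not the finalized API.
// MarshalAny errors are elided for brevity.
func addExamplePath(ctx context.Context, c api.GobgpApiClient) error {
    nlri, _ := ptypes.MarshalAny(&api.IPAddressPrefix{Prefix: "10.0.0.0", PrefixLen: 24})
    origin, _ := ptypes.MarshalAny(&api.OriginAttribute{Origin: 0})
    nexthop, _ := ptypes.MarshalAny(&api.NextHopAttribute{NextHop: "192.0.2.1"})

    // The path is built from protobuf messages packed into Any fields; no
    // bgp-on-wire encoding is needed on the caller's side.
    _, err := c.AddPath(ctx, &api.AddPathRequest{
        Path: &api.Path{
            Nlri:   nlri,
            Pattrs: []*any.Any{origin, nexthop},
        },
    })
    return err
}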

Clarifying the internal and public libraries

For example, I think the table/ code is meant only for internal usage, but there clearly are users who use it in their own applications. To make clear which libraries can be safely used by external applications, I'll use Go's internal package mechanism together with a directory layout change.

I'll move code meant for internal usage to internal/pkg/ (e.g. table, config, zebra) and code meant for external usage to pkg/ (e.g. server/ and probably packet/).

api/ will contain the protobuf structures plus minimal helper functions that introduce no external package dependencies. I'll try to apply a rule that api/ must not import packages within the gobgp repository, as proposed in https://github.com/osrg/gobgp/issues/1721.

Also, I'll try to follow the widely used directory layout (e.g. move gobgp and gobgpd to /cmd):
https://github.com/golang-standards/project-layout
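
Put together, the resulting tree could look roughly like the sketch below; the exact package list is illustrative, based only on the points above.

cmd/
    gobgp/        # CLI
    gobgpd/       # daemon
pkg/
    server/       # public library API
    packet/       # probably public as well
internal/
    pkg/
        table/
        config/
        zebra/
api/              # protobuf structures and minimal helpers, no in-repo imports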

Making the gRPC APIs only public APIs for simplicity

The server/BgpServer struct has methods corresponding to the gRPC APIs. I'll delete them; you would access BgpServer via the gRPC APIs instead. The struct would probably keep only two public methods, something like Start() and Stop(). If someone is interested in having something like gobgp-sdk-go under github.com/osrg, please let me know.

Miscellaneous

There will also be some minor changes to the gRPC APIs, like https://github.com/osrg/gobgp/pull/1436. GetROA() should use the stream interface (because the response could be huge). I'm also inclined to switch to gogo/protobuf.

I'm sure I'm missing something.

jeffbean commented 6 years ago

Awesome! Thanks for writing this up.

My two cents here: making gRPC the only API is not ideal.

What I would like to see is a library with a public interface similar to what we have today, but trimmed down to what is needed. Let me try to describe the differences between the RPC and library APIs and why we should have both.

My concern with limiting the API surface to the gRPC server is not only for the developer, but also for security and safety. If the only way to manage the server is through gRPC, then anyone can modify anything, and implementing ACLs on top of BGP sounds like a pain. In my specific case, running the gRPC server is not acceptable; I intentionally do not use the gRPC server from this repository.

Everything else sounds great!

cstockton commented 6 years ago

I think you are on the right path here; these are great high-level goals. The only thing I don't entirely understand is the server refactoring you mention:

Making the gRPC APIs only public APIs for simplicity
The server/BgpServer struct has methods corresponding to the gRPC APIs. I'll delete them; you would access BgpServer via the gRPC APIs instead. The struct would probably keep only two public methods, something like Start() and Stop(). If someone is interested in having something like gobgp-sdk-go under github.com/osrg, please let me know.

What is the motive for those changes? I think there is a lot of value, not just to users but most of all to the GoBGP project, in being able to spin up reliable but short-lived BGP servers for use in your unit tests. I think once some of the changes in #1331 are made to accept a context, it would be much easier to test against: you could quickly spin a server up for a unit test and know it was shut down before the next test began.
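
As a concrete illustration of that testing story, a per-test server might look roughly like the sketch below; NewBgpServer, Start, and Stop are the names mentioned in this thread, but their exact signatures here are assumptions.

// Hypothetical test sketch: NewBgpServer/Start/Stop are the names discussed in
// this thread; the exact signatures are assumptions, not an existing API.
func TestWithEmbeddedBgpServer(t *testing.T) {
    s := server.NewBgpServer()
    if err := s.Start(); err != nil {
        t.Fatal(err)
    }
    defer s.Stop() // the next test starts from a clean slate

    // ... exercise the code under test against s ...
}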

My thought is that it would be great if the server package were broken into smaller sub-packages. This would force the various structures to cross package boundaries through a public API. I believe that helps prevent things like race conditions and makes it easier to prove code is correct, since it's not possible to access private fields. It also gives you a nice perspective on how your users talk to your API, since you have to eat your own dog food, so to speak. Spitballing this to find a balance between effort and value, I would break server into:

1) pkg/fsm - Contains the FSM implementation
2) pkg/peer - Contains the Peer implementation
3) pkg/server - Contains the Server implementation
4) pkg/server/grpcserver - This is what crosses from the api package into gobgp packages like table. The idea being that the gRPC server uses the pkg/server API to serve requests, which I feel would impose design constraints that will improve reliability.

If you started by just separating the packages at first, and only refactored what was needed to keep things running identically to how they were before, it would give a nice holistic view of what a public API would look like. Some issues would probably surface that would be best solved with context.Context, and having all the packages split up would make it easier to land small incremental changes (which, to be clear, I would be happy to help with; if you want to delegate anything to me, feel free to ask).

It might not make sense to do this, but it's food for thought. I think some of the challenges around contributions, features, and refactoring may stem from the monolithic design of the gRPC / BGP server. Breaking them up may make it easier for people to contribute small, measurable changes that are easy to prove correct.

fujita commented 6 years ago

@jeffbean @cstockton thanks a lot for the comments.

What is the motive for those changes?

I really want to minimize the public APIs (and structures) so that the code can evolve more easily later. For example, currently BgpServer's public methods use various structures for arguments and return values.

How about making BgpServer's public methods correspond to the gRPC APIs?

func (s *BgpServer) AddPath(ctx context.Context, arg *api.AddPathRequest) (*api.AddPathResponse, error)

You could create a BgpServer object and call these methods directly (without going through gRPC). We could also make these methods handle the context properly if necessary.

cmd/gobgpd/main.go calls api.RegisterGobgpApiServer with a BgpServer object.
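
To make the idea concrete, here is a minimal sketch of both uses, assuming a NewBgpServer constructor, a Serve method, and the AddPath signature proposed above; none of this is the finalized API.

// Minimal sketch; the constructor/lifecycle names here are assumptions based on
// the description above, not the finalized API.
func example() error {
    s := server.NewBgpServer()
    go s.Serve()

    // Embedded use: call the proposed method directly, no gRPC involved.
    if _, err := s.AddPath(context.Background(), &api.AddPathRequest{}); err != nil {
        return err
    }

    // gobgpd use: expose the same object over gRPC, as cmd/gobgpd/main.go would
    // (assuming BgpServer satisfies the generated GobgpApiServer interface).
    g := grpc.NewServer()
    api.RegisterGobgpApiServer(g, s)
    return nil
}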

My thought is that it would be great if the server package were broken into smaller sub-packages.

It might make sense, but I'm not sure until I see the code. However, I have no intention of exporting such code to external applications; it would go under internal/pkg/ instead.

If you started by just separating the packages at first, and only refactored what was needed to keep things running identically to how they were before

Yeah, that's the plan. I'll try after the next release.

cstockton commented 6 years ago

I see; I think that is fair, since it will give you more freedom while you're refactoring everything. One thing to consider when designing the internal server API is that anyone who uses the gRPC API will need to test their code. Right now it takes a pretty hefty docker-compose file to run unit tests for my GoBGP-related code, and it's kind of difficult to set up and tear down each unit test since the state is all shared. The test code needs test code to ensure it is correct.

So maybe once the private version solidifies, a public version could be exported and cmd/gobgp{,d} would vendor that public version. I'm looking forward to seeing how everything evolves; it sounds great so far. Let me know if you want any help; I am in the GoBGP Slack as well. Thanks for the work you're doing.

fujita commented 6 years ago

https://github.com/osrg/gobgp/pull/1774

Finished the following.

There is no real refactoring yet; I tried to finish this with minimal code changes. Let me know soon if you have any concerns. I'll push it in a couple of days and start the real work.

cstockton commented 6 years ago

@fujita I somehow missed this issue's update, but I just looked at the api package and wanted to say: fantastic work on changing the direction of the dependency graph. Thanks a lot for the work you did here; I appreciate it.

fujita commented 5 years ago

@cstockton all the native APIs for Go use structures in the api package, so API breakage should not happen. One thing I'm not sure about is how to design the native APIs corresponding to the gRPC streaming APIs. The current design simply lets you receive events from a channel: https://github.com/osrg/gobgp/blob/master/docs/sources/lib.md. We could instead design an API that registers a handler for an event. Any opinions?

cstockton commented 5 years ago

@fujita I try to avoid returning channels from "watch" APIs, since it forces you to define a bullet-point list of semantics around cancellation and draining the channels. One thing that comes to mind in your examples is that I don't see any context, leaving no way for the caller to signal they are finished other than a synchronous call to pm.Close(), which makes it difficult to stop while blocked in a receive. I imagine part of your design is influenced by the gRPC API you are using? I had a discussion with the gRPC authors about the difficult ergonomics of the streaming API in https://github.com/grpc/grpc-go/issues/2159 - unfortunately I failed to convey the challenges properly. To summarize: it is very difficult to ensure message delivery in real-world systems where requests can be canceled.

In general I think there are two types of streaming APIs:

1) Streams over a bounded range of keys, i.e. getting all paths from the global rib
2) Streams that watch a range of keys as long as the connection remains, i.e. watching neighbor or path changes

I've come to prefer a contextual iterator (a func(T) bool callback, or rarely a visitor-style interface), with subtle variations for both, because it allows a common pattern that covers all the edge cases. For the first scenario I prefer the pattern below, because the common case is a sequential stream of values, so defining the semantics is as simple as:

// EachGlobalRIB blocks the caller and sends the current global rib until the given
// context is done or fn returns false, whichever comes first.
//
// Note: you don't really need to spell out that, if iteration is halted by
// ctx being done, ctx.Err() is returned; it's just assumed. When fn returns false
// it's fair to just return nil, since control flow was the signal.
func (c *Client) EachGlobalRIB(ctx context.Context, fn func(*api.Path) bool) error {}

Since EachGlobalRIB is sequential and terminal, users will typically call it with their current request- or worker-scoped context. They don't need to create a second context that they cancel() just to halt iteration, because func(*api.Path) bool serves this purpose in the common case; but in the rare event of a connection error, the context still covers cancellation. Note: you could drop the bool return value, but I usually complement full-table ranges like the above with some form of ranged iteration with start or end keys (i.e. find a group of longer prefixes), where I may want to return false at the end boundary of a shorter prefix.
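
A usage sketch for that first pattern (Client and EachGlobalRIB are the hypothetical names from the example above, not an existing GoBGP API):

// Count paths, stopping early via control flow rather than a second context;
// ctx is the caller's existing request- or worker-scoped context.
var count int
err := c.EachGlobalRIB(ctx, func(p *api.Path) bool {
    count++
    return count < 1000
})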

For the second case the above doesn't feel as nice, because it may be a very long time between updates and there is no fixed boundary, so halting a search range is never a user requirement. Any time I have used such an API, it has always been in some type of worker context; in fact that is what I do with GoBGP's peer monitor API. I have a worker that simply stays open as long as the program is running to update the health-check system that drives metrics and alerting. For these I believe it's perfectly reasonable to depend on the incoming context to signal completion, since in 99% of cases you're not exiting until the app has received a SIGTERM and the top-level context has been canceled anyway.

// MonitorPeers blocks and sends peer changes to the caller until the
// given context becomes done or a connection error occurs.
//
// Note: this function can only return non-nil errors, since it doesn't exit unless
// the connection dies or the ctx becomes done. This is what I expect, because
// I can always check whether the returned err == the current ctx.Err() if I have a case
// where I explicitly cancel (as long as ctx.Err() does not get wrapped).
func (c *Client) MonitorPeers(ctx context.Context, fn func(*api.Peer)) error {}

I've come to prefer plain functions over a type Visitor interface { Visit(...) }, because the interface form is less flexible and always seems to be accompanied by a VisitorFunc adapter (type VisitorFunc func(*T)) that places an unnecessary burden at every call site: c.MonitorPeers(pkgname.VisitorFunc(fn)). Functions are less verbose, and I can use method values to nicely encapsulate doing work, i.e.:

type healthMonitor struct {
    c *client.Client
    h *health.Manager
}

// peerMonitor matches the func(*api.Peer) callback expected by MonitorPeers and
// marks the peer up or down in the health system based on its BGP status.
func (hm *healthMonitor) peerMonitor(p *api.Peer) {
    if p.BGP.Status == "UP" {
        hm.h.up()
    } else {
        hm.h.down()
    }
}

func (hm *healthMonitor) Work(ctx context.Context) error {
    return hm.c.MonitorPeers(ctx, hm.peerMonitor)
}

To summarize, the two most important things to me are:

1) Aborting work when the context becomes done
2) Blocking the caller until no more events may occur (preferably after all request-related resources have been released, i.e. goroutines related to serving the request are terminated)

If either of those two things does not hold true, unpleasant ergonomics are generally guaranteed to emerge eventually to compensate for them.

fujita commented 5 years ago

@cstockton Thanks a lot!

About the second type of streaming API (i.e. watching), I think that your proposal is much better; it's consistent with the rest of the APIs and looks handier. I changed the APIs in the following way:

func (s *BgpServer) MonitorPeer(ctx context.Context, r *api.MonitorPeerRequest, fn func(*api.Peer)) error
func (s *BgpServer) MonitorTable(ctx context.Context, r *api.MonitorTableRequest, fn func(*api.Path)) error
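
A quick usage sketch of the new watch signature; the empty request and the logging are illustrative only.

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

// Blocks until ctx is canceled or the connection fails; each peer change is
// delivered to the callback.
err := s.MonitorPeer(ctx, &api.MonitorPeerRequest{}, func(p *api.Peer) {
    log.Printf("peer update: %v", p)
})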

About the first type of streaming API, are you talking about replacing the current ListSomething APIs, such as https://godoc.org/github.com/osrg/gobgp/pkg/server#BgpServer.ListPath ?

Currently, these APIs return everything in one shot, but if you prefer the streaming style, I'll change all of them.

cstockton commented 5 years ago

@fujita Glad to help, the changes look great.

For the List* APIs I wasn't suggesting any changes, but I definitely prefer the callback style for the type of operations those perform in general. It can get expensive to allocate and copy into slice backings that callers often just iterate and discard, but since things like policies are not something users query often, it's not a big deal. I would just avoid any kind of ListPath() API if any even exist; if they do, you could always add a few EachPath functions and leave the current API.

Small nit: if you decide to leave the list functions, it may be worth considering deferring the slice backing allocation until you have the size:

func (s *BgpServer) ListPolicy(ctx context.Context, r *api.ListPolicyRequest) ([]*api.Policy, error) {
    var l []*api.Policy
    err := s.mgmtOperation(func() error {
        v := s.policy.GetPolicy(r.Name)
        // Allocate the backing array only once the final length is known.
        l = make([]*api.Policy, len(v))
        copy(l, v)
        return nil
    }, false)
    return l, err
}

fujita commented 5 years ago

@cstockton How about the following? https://github.com/osrg/gobgp/pull/1889

I feel it's nicer to support the context instead of ignoring it, and to stay consistent with the gRPC streaming API, rather than returning everything in one shot.

Thanks for pointing out the slice allocation issue. I put some fixes in the above PR, but there are still lots more, I suppose. After the next major release, I'll try to improve that internal stuff.

cstockton commented 5 years ago

API-wise it looks fantastic, @fujita; I'm impressed by the great work you've done over the past several months. Thanks for being so receptive to feedback from your users; we all appreciate it.