Add support to manage a cache server with Gfsh

PedroAlvarado commented 6 years ago

It'd be great if a gfsh "stop server" command could also shut down the hosting spring-boot process when using spring-data-geode. I understand that this is not something we'd want as the default behavior, but it can be convenient.

Perusing through the code, I was unable to find a way to accomplish this. Is this possible with the current APIs?

jxblum commented 6 years ago

Perusing through the code, I was unable to find a way to accomplish this. Is this possible with the current APIs?

The short answer is, it is not (currently/easily) possible.

The long answer is rather complicated and involves the way in which Apache Geode's Shell tool (Gfsh) launches and manages servers in the cluster. Gfsh is not "aware" of other external services running or whether the Geode Server itself is part of an application. This actually matters.

Under the hood, Gfsh's stop server command is based on the o.a.g.distributed.ServerLauncher class. Internally, Apache Geode's JMX-based Management infrastructure creates and registers MBeans with the JRE's PlatformMBeanServer, and creates an internal, hidden system Region, in order to federate the management of servers across the cluster from the Manager. The MBeans that start and stop (or sometimes pause/resume) Geode processes and/or services use other (sometimes "internal") classes to ascertain the state of these servers/services. The stop server command relies on state encapsulated in an instance of the o.a.g.distributed.ServerLauncher class, which only Gfsh uses when launching a server, specifically when using the start server command.
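To make the MBean part of this concrete, here is a minimal sketch using only the JDK's JMX API, not Geode's actual management classes. ServerControlMBean, ServerControl and the ObjectName are hypothetical names invented for illustration; the point is only that a "stop" operation is reachable through the PlatformMBeanServer when something in the process registered an MBean exposing it.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

final class StopOperationMBeanSketch {

    /** Hypothetical management interface; not an Apache Geode type. */
    public interface ServerControlMBean {
        boolean isRunning();
        void stop();
    }

    /** Hypothetical stand-in for the state a launcher would encapsulate. */
    static final class ServerControl implements ServerControlMBean {
        private volatile boolean running = true;

        @Override public boolean isRunning() { return running; }
        @Override public void stop() { running = false; }
    }

    /** Registers the MBean, invokes "stop" through JMX, and returns the "Running" attribute afterwards. */
    static boolean stopThroughJmx(String objectName) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(new StandardMBean(new ServerControl(), ServerControlMBean.class), name);
            try {
                // A management client only sees what was registered under this name.
                server.invoke(name, "stop", new Object[0], new String[0]);
                return (Boolean) server.getAttribute(name, "Running");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("running after stop: " + stopThroughJmx("example:type=ServerControl"));
    }
}
```

If no such MBean was ever registered, or it is not backed by the state the operation expects, the management side has nothing to act on, which matches the situation described for servers that were not started through Gfsh.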

Since a Spring Boot application may "embed" an Apache Geode Server (Cache and, optionally, CacheServer) in the application JVM process, the Geode Server, or rather the Spring Boot application in this case, would not be launched using an instance of the o.a.g.distributed.ServerLauncher class. You would probably be running $ java -jar /path/to/your/spring/boot/application.jar. Besides, o.a.g.distributed.ServerLauncher is proprietary and specific to the way Gfsh and Geode's Management infrastructure functions, and as such, has no utility inside SDG.

You should also keep in mind that the Spring container is the one thing bootstrapping all the services that your application requires to run.

Spring Boot starts by configuring and initializing the Spring container. In turn, Spring Boot may bootstrap an "embedded" Tomcat/Jetty HTTP server, particularly if your application is a Web application (i.e., you declared the spring-boot-starter-web dependency on your application's classpath). Additionally, you might be running an "embedded" message server or be interfacing with an "embedded" database (e.g. H2 or HSQL). All of these services must be properly stopped when your Spring Boot application is shut down.

Having Gfsh stop the Spring Boot application just because it embeds an Apache Geode Server is a bit short-sighted in the scenario above, which is not at all uncommon. Gfsh is not going to be "aware" of the Spring container in your application, much less these other services. Though Spring will register a JVM shutdown hook to properly clean up the context, it all depends on what bootstraps what and what each component expects to be shut down on stop.
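The ordering concern can be sketched in plain Java. This is not Spring's implementation; MiniContainer and the service names are hypothetical, and stopping in reverse start order is an assumption made for the illustration. It shows why the thing that started the services is the natural place to stop them, and where a JVM shutdown hook fits in.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ReverseOrderShutdownSketch {

    /** Hypothetical container: remembers what it started and stops it in reverse order. */
    static final class MiniContainer implements AutoCloseable {
        private final Deque<String> started = new ArrayDeque<>();
        private final List<String> stopLog = new ArrayList<>();

        void start(String serviceName) {
            started.push(serviceName); // the most recently started service ends up on top
        }

        @Override
        public synchronized void close() {
            while (!started.isEmpty()) {
                stopLog.add(started.pop()); // last started, first stopped
            }
        }

        List<String> stopLog() {
            return stopLog;
        }
    }

    /** Starts the given services, closes the container, and returns the order in which they stopped. */
    static List<String> startAndClose(String... services) {
        MiniContainer container = new MiniContainer();
        for (String service : services) {
            container.start(service);
        }
        container.close();
        return container.stopLog();
    }

    public static void main(String[] args) {
        MiniContainer container = new MiniContainer();
        container.start("geode-cache");
        container.start("embedded-database");
        container.start("embedded-web-server");
        // Comparable in spirit to the shutdown hook mentioned above: the container,
        // not an external tool, decides what is stopped and in which order.
        Runtime.getRuntime().addShutdownHook(new Thread(container::close));
    }
}
```

An external tool that stops only one of these services has no view of the others, which is the gap described above.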

By way of example, you probably would not use Apache Tomcat's Manager App to stop your Spring Boot application just because it embeds an Apache Tomcat server when your application is a Web application. An embedded Tomcat instance is not exactly the same thing as a standalone Tomcat server.

Furthermore, it is much more common that your Spring Boot application would be an Apache Geode ClientCache (a.k.a. cache client) application rather than an actual "peer" Cache application that will (possibly) become a member of a cluster of servers. While running a "peer" Cache application is possible, it is generally not recommended, especially for production deployments. The Client/Server Topology is your friend for production.

Though, I would argue a cluster of Spring Boot, Apache Geode Servers is useful for testing and debugging locally, inside your IDE. But then, as a developer, I really don't want to bounce between my IDE and Gfsh just to interface with Apache Geode, hence SDG's new Annotation-based configuration. The new Annotation-based configuration model was designed and influenced by the powerful concepts popularized by Spring Boot. When these Annotations are further combined with the foundation of Spring Boot's auto-configuration, then the full power of "convention over configuration" will finally be realized. Also see this.

Still, despite the lack of support for stop server, it is possible to run describe member (and most other Gfsh commands, minus the lifecycle commands (i.e. start, status and stop)) on a Spring Boot, Apache Geode-based Server application. I have examples along with documentation for this here and here.

Now, on the flip-side, it is possible to start an Apache Geode Server with Gfsh bootstrapped with Spring by using "start server --name=SpringConfiguredApacheGeodeServer --spring-xml-location=/classpath/to/spring/application/context.xml --classpath=/absolute/file/system/path/to/spring-data-geode-2.0.3.RELEASE.jar:...".

In this case, the JVM process is an Apache Geode server first, which then bootstraps a Spring container that can configure and initialize the Geode server (i.e. Cache and (optionally) CacheServer instances), all started from Gfsh. You only need to ensure that SDG is on your server's classpath when starting the server. When the Geode Server is bootstrapped and subsequently configured/initialized with Spring in this manner, it is then possible to check the server status (i.e. status server) and stop the server (using stop server) in Gfsh. However, it will no longer be a Spring Boot application.

See the corresponding SDG docs for more details.

Anyway, sorry for the long-winded comment, but I hope this sheds some light on what is currently available/possible as well as my direction forward.

Cheers!

PedroAlvarado commented 6 years ago

Thanks John. I understand. That said, it is important that we highlight in the SDG documentation, in a way akin to @EnableLocator and @EnableManager, that currently the usage of the @CacheServerApplication and @PeerCacheApplication annotations is not a recommended way to add members to a server cluster in production environments. Similarly, the documentation should stress that SDG, under a Client/Server topology and in a production environment, should be used as a client only. For a novice user, this is something that can easily be missed. Let me know if this makes sense to you.

Moreover, I'll take the liberty to add here some of the considerations to bear in mind when adding a member to a server cluster in a production environment using @CacheServerApplication and @PeerCacheApplication.

* There are two lifecycles at play to be managed: the spring-boot app and the embedded geode server.
* Per above, no complete GFSH support for server lifecycle management is available (start, stop, status, etc).
* Per above, member instrumentation requires a combination of GFSH and custom code (e.g. stop/start server scripts, spring-boot health indicators, etc).
* Member default configuration can differ from the documented Geode configuration defaults (e.g. disable-auto-reconnect is set to false in vanilla Geode, whereas in SDG it is effectively true).
* No support for the Cluster Configuration Service, as it depends on GFSH-based commands.

jxblum commented 6 years ago

Hi Pedro,

Yes, all very valid points. I can definitely see how @EnableLocator and @EnableManager combined with the starting Annotations, either @PeerCacheApplication or @CacheServerApplication, would lead users to think that using these to configure a Spring Boot, Apache Geode Server is the "recommended" way of starting servers. Truthfully, it really has everything to do with "preference", your development process and your dev-ops culture.

I have, fairly recently in fact, taken a very client-centric approach for SDG. This has to do, in no small part, with Pivotal Cloud Foundry (PCF). When using PCF, the application developer really does not need to worry about the cluster of servers, other than managing and monitoring. Provisioning is handled by the PCF PaaS environment and platform.

Having said that, I would say it also depends.

In fact, some users/customers do actually start and manage their clusters with Spring Boot, or more generally, with a Spring-first approach. Other users/customers have strict requirements from their administrative/operations teams, who restrict how developers are allowed to start/stop and manage servers in their Geode cluster and require them to use the tools provided with Apache Geode. And for Apache Geode, or more specifically, Pivotal GemFire, that usually means developers and ops teams must use Gfsh and Pulse, despite both tools not being entirely complete or all inclusive...

Unfortunately, even though the Spring team makes a best effort to provide interoperability and a seamless experience, it is not always reciprocal. That is, the Apache community does not always do things that are in the best interests of the "extended" community, e.g. making things work in other contexts, like a Spring context. In many cases, I have really gone to great lengths in order to make Apache Geode play nice in a Spring context. But sometimes, I just need a certain amount of support from Apache Geode to do certain things and make life easier for me.

Anyway, that is no excuse for the lack of clearer documentation and examples, and by no means is either complete. I definitely appreciate your feedback.

Now, to quickly address some of your other points. Regarding...

There are two lifecycles at play to be managed: the spring-boot app and the embedded geode server

So, when the Spring container is the one doing the "bootstrapping" (i.e. you are using the embedded approach, and to be clear, that is a perfectly valid approach as well, just not something we "generally" recommend), then you can rest assured that Spring (Data Geode) will do the right thing to manage the Apache Geode lifecycle correctly in a Spring context. That is SDG's guarantee and anything less is a bug.

As you know, Spring provides many lifecycle hooks (i.e. callbacks) to ensure that external services, like Apache Geode, are managed appropriately in a Spring context.

SDG is no exception; it leverages the core Spring Framework and container lifecycle mechanisms to ensure Apache Geode is configured/initialized and shut down at the appropriate times throughout the lifecycle of the Spring container.

For instance, SDG's o.s.d.g.CacheFactoryBean implements the core Spring Framework's FactoryBean interface along with InitializingBean and DisposableBean, which ties the Apache Geode's Cache lifecycle to the Spring container.
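As a rough illustration of that pattern, the sketch below mimics the three callbacks in plain Java. It is not SDG's CacheFactoryBean and does not use Spring; FakeCache and FakeCacheFactory are hypothetical stand-ins, and only the method names afterPropertiesSet(), getObject() and destroy() are borrowed from the Spring interfaces named above.

```java
final class CacheLifecycleFactorySketch {

    /** Hypothetical stand-in for a cache; not the Geode Cache interface. */
    static final class FakeCache {
        private boolean closed;

        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Hypothetical factory exposing the same three lifecycle points as the Spring interfaces. */
    static final class FakeCacheFactory {
        private FakeCache cache;

        // Counterpart of InitializingBean.afterPropertiesSet(): create the resource.
        void afterPropertiesSet() {
            cache = new FakeCache();
        }

        // Counterpart of FactoryBean.getObject(): hand out the managed instance.
        FakeCache getObject() {
            if (cache == null) {
                throw new IllegalStateException("factory has not been initialized");
            }
            return cache;
        }

        // Counterpart of DisposableBean.destroy(): release the resource.
        void destroy() {
            if (cache != null) {
                cache.close();
            }
        }
    }

    /** Drives the factory in the order a container would: initialize, use, destroy. */
    static boolean closedAfterContainerShutdown() {
        FakeCacheFactory factory = new FakeCacheFactory();
        factory.afterPropertiesSet();
        FakeCache cache = factory.getObject();
        boolean openWhileRunning = !cache.isClosed();
        factory.destroy();
        return openWhileRunning && cache.isClosed();
    }
}
```

Because the container calls these hooks, the cache is opened and closed together with the container and needs no separate lifecycle management.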

* Per above, no complete GFSH support for server lifecycle management is available(start, stop, status, etc).

* Per above, member instrumentation requires a combination of GFSH and custom code(e.g stop/start server scripts, spring-boot health indicators, etc)

Currently, that is correct.

Although, this requires a certain amount of support from the Apache Geode team. I am currently working with them to get support without resorting to ugly, hard-to-maintain, brittle framework code and the use of "internal" Geode classes (for example) that are subject to change.

Additionally, my SDG roadmap for 2018 includes plans to add support in Spring Boot's Actuator to at least provide monitoring endpoints for Apache Geode/Pivotal GemFire; for more details and to watch my progress, see SGF-671.

* Member default configuration can be different than documented Geode configuration defaults. (e.g. disable-auto-reconnect is set to false in vanilla Geode however in SDG its practically true)

Not necessarily and not in all cases.

For the most part, the new SDG Annotation-based configuration attempts to use or set the Apache Geode configuration defaults OOTB.

There are (or should be only) 2 exceptions to this rule; 1 is "auto-reconnect". The other is "Cluster Configuration". I will address the latter (CC) below.

Enabling "auto-reconnect" really does not make sense for Spring Boot, Apache Geode Server (i.e. embedded "peer" Cache) "applications"; this has everything to do with the way Apache Geode's auto-reconnect behavior is currently implemented.

If your Spring Boot application is just used for the purposes of bootstrapping (configuring/initializing) an Apache Geode Server with no "application" components like @Service or DAOs, etc, then by all means, you can enable this feature. See here.

However, if your Spring Boot, Apache Geode peer Cache server "application" is a full-fledged application (e.g. Web application), then you should definitely not use "auto-reconnect" functionality.

When a member gets disconnected from the cluster (or "Distributed System" (DS)), either because the member is sick (e.g. almost out-of-memory, overloaded, has a hardware failure, etc) or because it gets separated due to a network failure (i.e. network partition), the member is kicked out of the DS. The member, by Geode defaults, immediately enters an auto-reconnect state.

If the member successfully reconnects, then all the old Geode object references (i.e. Cache, Regions, etc) immediately become "stale". As you can imagine, this is problematic for Spring (Boot) applications which may have auto-wired (or injected) references to these Geode objects in the developer's application components (i.e. @Service, @Repository objects, etc), either directly or indirectly if the developer is taking advantage of some of SDG's abstractions... o.s.d.g.GemfireTemplate or perhaps the SD Repository abstraction, etc.
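The stale-reference problem can be sketched without Geode at all. Everything below is hypothetical (FakeCache, FakeMember, DirectService, LookupService); it assumes only what is described above, namely that a reconnect discards the old cache object and creates a new one. The lookup-based service is included purely as a contrast and is not something this thread proposes as a fix.

```java
import java.util.function.Supplier;

final class StaleReferenceSketch {

    /** Hypothetical stand-in for a cache that is discarded on reconnect. */
    static final class FakeCache {
        private volatile boolean closed;

        boolean isClosed() { return closed; }
        void close() { closed = true; }
    }

    /** Hypothetical member that replaces its cache object when it reconnects. */
    static final class FakeMember {
        private volatile FakeCache current = new FakeCache();

        FakeCache cache() { return current; }

        void reconnect() {
            current.close();           // the old object becomes unusable
            current = new FakeCache(); // a brand-new object takes its place
        }
    }

    /** Holds the injected instance, as an auto-wired field would. */
    static final class DirectService {
        private final FakeCache cache;

        DirectService(FakeCache cache) { this.cache = cache; }

        boolean usable() { return !cache.isClosed(); }
    }

    /** Resolves the cache on every call instead of holding on to it. */
    static final class LookupService {
        private final Supplier<FakeCache> cacheSupplier;

        LookupService(Supplier<FakeCache> cacheSupplier) { this.cacheSupplier = cacheSupplier; }

        boolean usable() { return !cacheSupplier.get().isClosed(); }
    }

    /** Returns { direct service usable, lookup service usable } after a reconnect. */
    static boolean[] usableAfterReconnect() {
        FakeMember member = new FakeMember();
        DirectService direct = new DirectService(member.cache());
        LookupService lookup = new LookupService(member::cache);
        member.reconnect();
        return new boolean[] { direct.usable(), lookup.usable() };
    }
}
```

The service that captured the object at injection time is left holding the closed instance, which is exactly the position an auto-wired application component would be in.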

Unfortunately, it turns out the Geode team did not need to invalidate all the old Geode object references, like a Region. They simply could have just replaced the o.a.g.distributed.DistributedSystem reference held onto and managed by the single "peer" Cache instance when the "peer" Cache instance is created (then this, this and this) and when the member reconnects. The DS is responsible for all member relations, and Geode enters a "fence-and-merge" process to re-sync the data in the cache when the member reconnects anyhow. Rarely, if ever, should the application or even a framework like SDG need a reference to the DS. It especially does not need to hold onto a reference.
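A minimal sketch of the alternative described in this paragraph, under the assumption that only the connection object needs to change on reconnect. FakeCache and FakeConnection are hypothetical and say nothing about how Geode is actually structured internally; the sketch only shows that swapping an inner reference leaves outer references valid.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SwapConnectionSketch {

    /** Hypothetical stand-in for the distributed-system connection. */
    static final class FakeConnection {
        final int generation;

        FakeConnection(int generation) { this.generation = generation; }
    }

    /** Hypothetical cache that keeps its identity and swaps only its connection. */
    static final class FakeCache {
        private final AtomicReference<FakeConnection> connection =
            new AtomicReference<>(new FakeConnection(1));

        int connectionGeneration() {
            return connection.get().generation;
        }

        void reconnect() {
            // Replace the inner connection; the cache object itself is untouched.
            connection.updateAndGet(old -> new FakeConnection(old.generation + 1));
        }
    }

    /** True when a reference taken before the reconnect is still the live cache afterwards. */
    static boolean sameCacheNewConnection() {
        FakeCache cache = new FakeCache();
        FakeCache heldByApplication = cache; // reference injected before the disconnect
        cache.reconnect();
        return heldByApplication == cache && heldByApplication.connectionGeneration() == 2;
    }
}
```

With this shape, application components that hold the cache never observe a stale object, because nothing they hold is ever replaced.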

* No support for the Cluster Configuration Service as it depends on GFSH based commands.

Well, this is not entirely accurate or true either.

I agree that there needs to be better documentation and examples in this regard, but it is possible to use Apache Geode's Cluster Configuration Service with Spring. Meaning, a Spring Boot configured and bootstrapped Apache Geode Server can pick up Cluster Configuration from the Manager on start and a Spring Boot/Spring Data Geode application will effectively apply the cluster configuration, even on restarts.

The problem is, in the past, Apache Geode's Cluster Configuration would not recognize any Spring config (whether expressed in XML or JavaConfig) and record it in the cluster configuration, unlike when you are issuing commands via Gfsh.

However, I would argue that this is a problem even when you are using Apache Geode's own public Java API or the cache.xml format without Spring, too! Only when you effect schema changes via Gfsh are a user's actions actually recorded. Of course, you can "import" cache.xml if you have a bunch of legacy cache.xml, unlike Spring config, but still.

IMO, this spells out one of the major deficiencies with Apache Geode... no consistency across their various configuration options.

While this remains largely true for Spring configured Apache Geode Servers wrt Spring XML or JavaConfig, I have taken steps to rectify some of this from a Spring (Boot/SDG) configured client cache application. See here.

NOTE: yes, I have overloaded the meaning of Cluster Configuration from a Spring ClientCache context.

Finally, I will add that not all is lost when using Spring on the server with Cluster Configuration. While Apache Geode will not recognize and record Spring config (XML, JavaConfig or otherwise), as mentioned above, Spring does accept cluster configuration from the Manager and applies it. Any Spring-specific configuration "augments" what comes in from either a local cache.xml file or Cluster Config.

This behavior is due in part to the fact that Spring (Data Geode) allows you to configure your Geode instances with cache.xml (e.g. <gfe:[client-]cache cache-xml-location="..">) in addition to Spring config (see here).

Also, you can do interesting combinations (along with this) of Spring config with Apache Geode's config (e.g. cache.xml) which equally applies to configuration coming from cluster config, since, effectively, SDG treats cluster config and cache.xml the same way.

I have even done some POC work for Pivotal GemFire customers demonstrating some of these techniques in this Repo. Unfortunately, it is not well-documented. A WIP, :(

Anyway, hope this helps!

jxblum commented 6 years ago

Closing as "won't-fix". This Issue will morph into providing more complete documentation and examples instead.

spring-projects / spring-data-geode

Add support to manage a cache server with Gfsh #3