jhipster / generator-jhipster

JHipster is a development platform to quickly generate, develop, & deploy modern web applications & microservice architectures.
https://www.jhipster.tech
Apache License 2.0

Hazelcast Discovery on Startup Not Adequate #10916

Closed: AlexandreCassagne closed this issue 1 year ago

AlexandreCassagne commented 4 years ago
Overview of the issue

Sometimes, when instances are started simultaneously, services fail to discover each other, so several Hazelcast clusters are created and never merge. One piece of evidence is that every instance's Hazelcast log includes:

Members {size:1, ver:1} [
    Member [10.68.6.109]:5701 - 693df0fa-63a9-4015-97ec-48feea1cf483 this
]

All of the other instances show a similar cluster size of 1, and it never gets updated. Another hint that this happens at discovery time is that if I restart 3 out of 4 instances, after about a minute they all suddenly join the older instance (the cluster size goes to 4).

I have done a bit of investigation, and the critical code path seems to be in CacheConfiguration, where all the discovery happens at startup:

config.getNetworkConfig().setPort(5701);
config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(true);
for (ServiceInstance instance : discoveryClient.getInstances(serviceId)) {
    String clusterMember = instance.getHost() + ":5701";
    log.debug("Adding Hazelcast (prod) cluster member {}", clusterMember);
    config.getNetworkConfig().getJoin().getTcpIpConfig().addMember(clusterMember);
}

This code sets the instance's own Hazelcast network configuration, then iterates over the instances currently visible to the discovery client. Unfortunately, because this runs quite early in the microservice's lifecycle, other services may not have registered yet, so the clusters never join.
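
To make the race concrete, here is a minimal sketch (not JHipster's actual code) that waits with a bounded retry loop until the registry reports at least one peer before the member list is built. The class name, retry count, and delay are illustrative assumptions:

import java.util.List;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;

// Sketch only (hypothetical helper, not generated by JHipster): block briefly until
// the service registry reports at least one peer, so the TCP/IP member list is not
// built before the other replicas have registered.
class PeerWaiter {

    List<ServiceInstance> waitForPeers(DiscoveryClient discoveryClient, String serviceId) {
        for (int attempt = 0; attempt < 30; attempt++) {
            List<ServiceInstance> instances = discoveryClient.getInstances(serviceId);
            if (instances.size() > 1) { // more than just this instance (assumes self-registration is visible)
                return instances;
            }
            try {
                Thread.sleep(2000); // give the other replicas time to register with Eureka
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        // Fall back to whatever is visible, i.e. the original single-shot behaviour.
        return discoveryClient.getInstances(serviceId);
    }
}

This only mitigates the race for the initial join; it does nothing for instances that come up later, which is why the plugin-based discovery approaches discussed further down the thread are more robust.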

Motivation for or Use Case

For those using a second-level cache, instances that do not join mean cache inconsistencies, possibly even security-critical ones. For example, a user's role may be removed, but the eviction of the user's authorities only happens on the 'cluster' that processed the change.

Reproduce the error

For me (perhaps my configuration is specific), this happens any time I launch many instances simultaneously. I am using Kubernetes, so starting a deployment with several replicas is simple.

If just one instance is visible when the other instances are started, the problem goes away (e.g. start one instance, wait for it to be properly started, then increase replica count).

Related issues
Suggest a Fix

If others agree that this is an issue, we should decide on a way forward; there are several considerations to weigh.

Once a decision is made, I'll be happy to help implement any changes.

JHipster Version(s)

v6.5.1

JHipster configuration
.yo-rc.json file
{
  "generator-jhipster": {
    "promptValues": {
      "packageName": "redacted"
    },
    "jhipsterVersion": "6.5.1",
    "applicationType": "microservice",
    "baseName": "simplifier",
    "packageName": "redacted",
    "packageFolder": "redacted",
    "serverPort": "9050",
    "authenticationType": "uaa",
    "uaaBaseName": "uaa",
    "cacheProvider": "hazelcast",
    "enableHibernateCache": true,
    "websocket": false,
    "databaseType": "no",
    "devDatabaseType": "no",
    "prodDatabaseType": "no",
    "searchEngine": false,
    "messageBroker": "kafka",
    "serviceDiscoveryType": "eureka",
    "buildTool": "gradle",
    "enableSwaggerCodegen": false,
    "testFrameworks": [],
    "jhiPrefix": "jhi",
    "entitySuffix": "",
    "dtoSuffix": "DTO",
    "otherModules": [],
    "enableTranslation": false,
    "clientPackageManager": "npm",
    "skipClient": true,
    "skipUserManagement": true,
    "blueprints": [],
    "embeddableLaunchScript": false,
    "directoryPath": "../",
    "dockerRepositoryName": "",
    "dockerPushCommand": "docker push",
    "kubernetesNamespace": "test",
    "kubernetesServiceType": "LoadBalancer",
    "ingressDomain": "",
    "monitoring": "no",
    "istio": false
  }
}
ruddell commented 4 years ago

There is a related issue, https://github.com/jhipster/generator-jhipster/issues/9521, but not enough info was provided there.

Here are steps to reproduce locally through Docker with just the Registry and a monolith (no database or client). I wrote these up a while ago but then suddenly couldn't reproduce the issue. It seems to happen more often with a freshly created Registry container.

.yo-rc.json file
{
  "generator-jhipster": {
    "promptValues": {
      "packageName": "com.mycompany.myapp"
    },
    "jhipsterVersion": "6.5.1",
    "applicationType": "monolith",
    "baseName": "jhipster",
    "packageName": "com.mycompany.myapp",
    "packageFolder": "com/mycompany/myapp",
    "serverPort": "8080",
    "authenticationType": "jwt",
    "cacheProvider": "hazelcast",
    "enableHibernateCache": false,
    "websocket": false,
    "databaseType": "no",
    "devDatabaseType": "no",
    "prodDatabaseType": "no",
    "searchEngine": false,
    "messageBroker": false,
    "serviceDiscoveryType": "eureka",
    "buildTool": "maven",
    "enableSwaggerCodegen": false,
    "jwtSecretKey": "Y2I3MzA3ZjIyM2MzOTI1YzZjNGUwODYxY2U5NjkzNGQ3NDc5YTUyNjBiYzlmZTY3YmRmMDVkMWVhNjg1NjM3ZGMzZWY1MWJhYmU2YzEzZjg3ZTgyZmI3NDYzYTEwMWRiZGQyN2JiYWUxMTE3MThhMDUxODM5YmU2NTU1MDY1ZGE=",
    "embeddableLaunchScript": false,
    "useSass": true,
    "clientPackageManager": "npm",
    "skipClient": true,
    "creationTimestamp": 1576104597610,
    "testFrameworks": [],
    "jhiPrefix": "jhi",
    "entitySuffix": "",
    "dtoSuffix": "DTO",
    "otherModules": [],
    "enableTranslation": false,
    "blueprints": []
  }
}

AlexandreCassagne commented 4 years ago

@ruddell Thanks for the minimal example. When you say it happens more often with a new Registry container, what do you mean?

ruddell commented 4 years ago

If I recreate the entire stack, this issue happens more consistently. When restarting just the monolith containers, they sometimes find each other on startup.

AlexandreCassagne commented 4 years ago

Interesting. I haven't noticed that; for me it's consistent even without touching the registries.

gmarziou commented 4 years ago

I experienced a similar issue with Hazelcast and Eureka service discovery. I got it working by replacing JHipster's code, which uses Eureka to join the Hazelcast cluster, with code based on the hazelcast-eureka-one plugin.

So using hazelcast-kubernetes might help in your case, too (see the sketch below).
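
For illustration, a minimal sketch of what that could look like with the hazelcast-kubernetes discovery plugin (Hazelcast 3.x discovery SPI); the class name, service name, and namespace are placeholder assumptions:

import com.hazelcast.config.Config;
import com.hazelcast.config.DiscoveryStrategyConfig;
import com.hazelcast.kubernetes.HazelcastKubernetesDiscoveryStrategyFactory;

// Sketch only: let the Kubernetes discovery strategy resolve members instead of
// the static TCP/IP member list built from the Spring DiscoveryClient at startup.
class KubernetesDiscoverySketch {

    Config configure(Config config) {
        // The discovery SPI requires the other join mechanisms to be disabled.
        config.setProperty("hazelcast.discovery.enabled", "true");
        config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(false);

        DiscoveryStrategyConfig strategy =
            new DiscoveryStrategyConfig(new HazelcastKubernetesDiscoveryStrategyFactory());
        // Placeholder values: a (headless) service exposing the Hazelcast port.
        strategy.addProperty("service-name", "my-app-hazelcast");
        strategy.addProperty("namespace", "test");
        config.getNetworkConfig().getJoin().getDiscoveryConfig().addDiscoveryStrategyConfig(strategy);
        return config;
    }
}

Because members are resolved by querying the Kubernetes API rather than by a one-shot loop at startup, pods that register late can still be found.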

pascalgrimaud commented 4 years ago

I'm adding a bounty on this, as it seems to be a real bug.

jdubois commented 4 years ago

Yes, this is important, and it's logical that we have this kind of issue. I'm currently working with the Hazelcast team (for my "real" work!). For the record, I'm having a look at https://github.com/hazelcast/hazelcast-azure, so maybe I can find some ideas there.

I'm not sure we can easily find a generic solution.

gmarziou commented 4 years ago

In the case of Eureka, I am using metadata for the host and port, and I think the main difference is in the way the cluster join is hooked into Spring events. I followed the Hazelcast sample that injects an EurekaClient, while JHipster's code uses a generic DiscoveryClient; it's supposed to make no difference, but maybe the issue comes from Spring Cloud bootstrapping. My project is still on Spring Boot 1.5, so maybe this no longer applies.

I tend to think that JHipster's configuration code should follow the code samples provided by Hazelcast, even though that requires adding a specific dependency on hazelcast-eureka, hazelcast-azure, or hazelcast-kubernetes. This configuration could easily be isolated in dedicated classes (a sketch follows below).
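
As a rough illustration of such an isolated class, a hedged sketch of the Eureka variant, assuming the hazelcast-eureka-one plugin is on the classpath; the wrapper class is hypothetical and the property values are examples:

import com.hazelcast.config.Config;
import com.hazelcast.config.DiscoveryStrategyConfig;
import com.hazelcast.eureka.one.EurekaOneDiscoveryStrategyFactory;

// Sketch only: let the hazelcast-eureka-one plugin resolve cluster members from
// Eureka (including host/port published as instance metadata) instead of the
// static TCP/IP member list.
class EurekaOneDiscoverySketch {

    Config configure(Config config) {
        config.setProperty("hazelcast.discovery.enabled", "true");
        config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        config.getNetworkConfig().getJoin().getTcpIpConfig().setEnabled(false);

        DiscoveryStrategyConfig strategy =
            new DiscoveryStrategyConfig(new EurekaOneDiscoveryStrategyFactory());
        strategy.addProperty("self-registration", "true");               // register this member itself
        strategy.addProperty("use-metadata-for-host-and-port", "true");  // read host/port from Eureka metadata
        strategy.addProperty("namespace", "hazelcast");                  // example namespace
        config.getNetworkConfig().getJoin().getDiscoveryConfig().addDiscoveryStrategyConfig(strategy);
        return config;
    }
}

Depending on how the Eureka client is configured, the plugin may also need its Eureka client properties supplied (for example via a classpath eureka-client.properties); the plugin's own documentation covers those options.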

PierreBesson commented 4 years ago

For me this looks like a potentially huge bug that thankfully happens rarely (I have never seen it). Indeed, there is no failover for cluster discovery if the initial registration fails at startup time. We should definitely change our code to use the Hazelcast-recommended way, even if it uses technology-specific dependencies. Those can be conditionally enabled in the application by combining Maven/Gradle with a Spring profile, as we did for Zipkin support (see the sketch below).
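
A hedged sketch of that conditional wiring, assuming a hypothetical "k8s" Spring profile and reusing the Kubernetes discovery sketch shown earlier in the thread; this is not how JHipster currently generates CacheConfiguration:

import com.hazelcast.config.Config;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

// Sketch only: keep the platform-specific Hazelcast discovery in its own
// configuration class, activated by a dedicated (hypothetical) Spring profile,
// with the matching plugin dependency added by the Maven/Gradle build.
@Configuration
@Profile("k8s")
class KubernetesCacheDiscoveryConfiguration {

    @Bean
    Config hazelcastConfig() {
        Config config = new Config();
        config.getNetworkConfig().setPort(5701);
        // Delegate member resolution to the Kubernetes discovery strategy
        // (see the earlier sketch) instead of a startup-time TCP/IP list.
        return new KubernetesDiscoverySketch().configure(config);
    }
}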

AlexandreCassagne commented 4 years ago

Sadly, it is easy to reproduce for me; in fact, it happens any time I launch all instances simultaneously (e.g. kill all instances, wait, and start them again).

Perhaps my configuration is slightly different, though.

antarus commented 4 years ago

I have the same problem with Kubernetes, even after adding the hazelcast-kubernetes dependency and configuring it. My current 'solution': I launch one replica and, once it has started, I scale up...

I plan to test with Redis...

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Our core developers tend to be more verbose on denying. If there is no negative comment, possibly this feature will be accepted. We are accepting PRs :smiley:. Comment or this will be closed in 7 days

AlexandreCassagne commented 4 years ago

I believe the issue is related to the readiness probes. Let's revisit this when the Spring Boot 2.3 release of JHipster is published.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Our core developers tend to be more verbose on denying. If there is no negative comment, possibly this feature will be accepted. We are accepting PRs :smiley:. Comment or this will be closed in 7 days

Tcharl commented 2 years ago

@AlexandreCassagne time to dig into this again ;-)

AlexandreCassagne commented 2 years ago

Haha, perhaps so. I will look into it; I think this should be a rather quick issue to resolve.

DanielFran commented 2 years ago

@AlexandreCassagne do you have any news? Does the issue still persist?

AlexandreCassagne commented 2 years ago

Sorry @DanielFran, I dropped the ball on this one.

@jdubois did you get inspiration from working with Azure and the Hazelcast team?

AlexandreCassagne commented 2 years ago

@DanielFran I have some work going on, but I want to close this once and for all ;-)

@jdubois is it OK to completely rewrite the cache configuration portion and build a Kubernetes-only version? I want to migrate away from the JHipster Registry; I think Eureka is no longer needed, but that requires a lot of configuration changes (config maps? the Kubernetes discovery API? cache configuration? etc.).

jdubois commented 2 years ago

@AlexandreCassagne for Azure + Hazelcast it's a very specific implementation that uses something called Azure Ping, so it would only work on Azure VMs (quite a limited use case). Concerning Eureka, I'm afraid it's the only way to get the nodes to know about each other if you follow the usual JHipster/Spring architecture. You can probably rewrite this in a k8s-specific way, but then it would only work on k8s, and I'm afraid that would be another limitation, as many people don't use k8s.

DanielFran commented 1 year ago

@jdubois @gmarziou @PierreBesson @AlexandreCassagne This issue has been stale for a long time. Closing for now. If the problem persists, please open a new issue.