Closed. AlexandreCassagne closed this issue 1 year ago.
There is a related issue https://github.com/jhipster/generator-jhipster/issues/9521 but not enough info was provided there.
Here are steps to reproduce locally through Docker with just the Registry and a monolith (no database or client). I wrote these up a while ago but then suddenly couldn't reproduce it; it seems to happen more often with a fresh Registry container.
1. Generate a monolith with the following `.yo-rc.json`:

```json
{
  "generator-jhipster": {
    "promptValues": {
      "packageName": "com.mycompany.myapp"
    },
    "jhipsterVersion": "6.5.1",
    "applicationType": "monolith",
    "baseName": "jhipster",
    "packageName": "com.mycompany.myapp",
    "packageFolder": "com/mycompany/myapp",
    "serverPort": "8080",
    "authenticationType": "jwt",
    "cacheProvider": "hazelcast",
    "enableHibernateCache": false,
    "websocket": false,
    "databaseType": "no",
    "devDatabaseType": "no",
    "prodDatabaseType": "no",
    "searchEngine": false,
    "messageBroker": false,
    "serviceDiscoveryType": "eureka",
    "buildTool": "maven",
    "enableSwaggerCodegen": false,
    "jwtSecretKey": "Y2I3MzA3ZjIyM2MzOTI1YzZjNGUwODYxY2U5NjkzNGQ3NDc5YTUyNjBiYzlmZTY3YmRmMDVkMWVhNjg1NjM3ZGMzZWY1MWJhYmU2YzEzZjg3ZTgyZmI3NDYzYTEwMWRiZGQyN2JiYWUxMTE3MThhMDUxODM5YmU2NTU1MDY1ZGE=",
    "embeddableLaunchScript": false,
    "useSass": true,
    "clientPackageManager": "npm",
    "skipClient": true,
    "creationTimestamp": 1576104597610,
    "testFrameworks": [],
    "jhiPrefix": "jhi",
    "entitySuffix": "",
    "dtoSuffix": "DTO",
    "otherModules": [],
    "enableTranslation": false,
    "blueprints": []
  }
}
```

2. Remove the `ports` key and value from `src/main/docker/app.yml` (a fixed host-port mapping prevents scaling to two containers).
3. Build the Docker image: `./mvnw package -Pprod verify jib:dockerBuild -DskipTests`
4. Start the stack with two application instances: `docker-compose -f src/main/docker/app.yml up -d --scale jhipster-app=2`
5. Tail the logs: `docker logs -f docker_jhipster-app_1`
Each instance then reports a singleton cluster that never grows:

```
Members {size:1, ver:1} [
	Member [172.18.0.4]:5701 - 07a5e6c6-0c9c-48cb-a8a2-bac717295c56 this
]
```
@ruddell Thanks for the minimal example. When you say it happens more often with a new Registry container, what do you mean?
If I recreate the entire stack, this issue happens more consistently. When restarting just the monolith containers, they sometimes find each other on startup.
Interesting. I haven’t noticed that, it’s consistent even without touching the registries.
I experienced a similar issue with Hazelcast and Eureka service discovery. I got it working by replacing JHipster's code, which uses Eureka to join the Hazelcast cluster, with code based on the hazelcast-eureka-one plugin.
So using hazelcast-kubernetes might help in your case too.
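For illustration, a minimal sketch of what delegating the join to the hazelcast-eureka-one plugin can look like, assuming a Hazelcast version with aliased discovery (3.12+); the exact property names are plugin- and version-dependent, so treat them as assumptions rather than a definitive configuration:

```yaml
# hazelcast.yaml -- illustrative only; requires the hazelcast-eureka-one
# plugin on the classpath, and property names vary across plugin versions.
hazelcast:
  network:
    join:
      multicast:
        enabled: false
      eureka:
        enabled: true
        self-registration: true
        namespace: hazelcast
        use-metadata-for-host-and-port: true
```

With this approach, Hazelcast itself queries Eureka during its discovery rounds instead of relying on a member list computed once at application startup.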
I'm adding a bounty on this, as it seems to be a real bug
Yes, this is important, and it's logical that we have this kind of issue. I'm currently working with the Hazelcast team (for my "real" work!). For the record, I'm having a look at https://github.com/hazelcast/hazelcast-azure, so maybe I can find some ideas there.
I'm not sure we can easily find a generic solution.
In the case of Eureka, I use metadata for host and port, and I think the main difference is in how the cluster join is hooked into Spring events. I followed the Hazelcast sample, which injects an EurekaClient, while JHipster's code uses a generic DiscoveryClient; that's supposed to make no difference, but maybe the issue comes from Spring Cloud bootstrapping. My project is still on Spring Boot 1.5, so maybe it no longer applies.
I tend to think that JHipster's configuration code should follow the code samples provided by Hazelcast, even though that requires adding a specific dependency on hazelcast-eureka, hazelcast-azure, or hazelcast-kubernetes. This configuration could easily be isolated in specific classes.
For me this looks like a potentially huge bug that thankfully happens rarely (I have never seen it). Indeed, there is no failover for cluster discovery if the initial registration fails at startup time. We should definitely change our code to use the Hazelcast-recommended way, even if it requires technology-specific dependencies. Those can be conditionally enabled in the application by combining Maven/Gradle profiles with a Spring profile, as we did for Zipkin support.
Sadly, it is easy to reproduce for me; in fact, it happens any time I launch all instances simultaneously (e.g. kill all instances, wait, and start again).
Perhaps my configuration is slightly different, though.
I have the same problem with Kubernetes, even after adding and configuring the hazelcast-kubernetes dependency. My current 'solution': I launch one replica and, once it has started, I scale up.
I plan to test with Redis.
This issue is stale because it has been open 30 days with no activity. Our core developers tend to be more verbose on denying. If there is no negative comment, possibly this feature will be accepted. We are accepting PRs :smiley:. Comment or this will be closed in 7 days
I believe the issue is related to the readiness probes. Let's revisit this when the Spring Boot 2.3 release of JHipster is published.
@AlexandreCassagne time to dig into this again ;-)
Haha, perhaps so. I will look into it, I think this should be a rather quick issue to resolve.
@AlexandreCassagne any news? Does the issue still persist?
Sorry @DanielFran, I dropped the ball on this one.
@jdubois did you get inspiration from working with Azure and the Hazelcast team?
@DanielFran I have some work going on but I want to close this once and for all ;-)
@jdubois is it OK to completely rewrite the cache configuration portion and build a Kubernetes-only version? I want to migrate away from the JHipster Registry. I think Eureka is no longer needed, but that requires a lot of configuration changes (config maps? the Kubernetes discovery API? cache configuration? etc.)
@AlexandreCassagne for Azure + Hazelcast it's a very specific implementation that uses something called Azure Ping, so it would only work on Azure VMs (quite a limited use case). Concerning Eureka, I'm afraid it's the only way to let the nodes know about each other if you follow the usual JHipster/Spring architecture. You can probably rewrite this in a k8s-specific way, but then it would only work on k8s, and I'm afraid that would be another limitation, as many people don't use k8s.
@jdubois @gmarziou @PierreBesson @AlexandreCassagne This issue has been stale for a long time, so I am closing it for now. If the problem persists, please open a new issue.
Overview of the issue
Sometimes, when instances start simultaneously, services fail to discover each other, so several Hazelcast clusters are created and never merge. One piece of evidence is that every instance's Hazelcast log includes a singleton member list such as `Members {size:1, ver:1}`.
All of the other instances show a similar cluster size, and it never gets updated. Another hint that this is happening at discovery time: if I restart 3 out of 4 instances, after about a minute they all suddenly join the older instance (the cluster size goes to 4).
I have done a bit of investigation, and the critical code path seems to be in CacheConfiguration, where all of the discovery happens at startup: this code sets the instance's own Hazelcast network configuration, then iterates over the other instances visible to the discovery client. Unfortunately, since this runs quite early in the microservice's lifecycle, other services may not have registered yet, so the clusters never join.
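To make the failure mode concrete, here is a small stdlib-only Java simulation of that startup sequence. All names are hypothetical and it deliberately avoids Hazelcast itself; it only models "snapshot the registry once, then join whoever was visible":

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Stdlib-only simulation of the startup race described above (hypothetical
 * names; no Hazelcast or Spring Cloud involved). Each node reads the registry
 * exactly once at startup and only joins peers it saw in that snapshot,
 * mirroring how the member list is built from the DiscoveryClient at
 * bean-creation time and never refreshed.
 */
class DiscoveryRaceDemo {

    final Set<String> registry = new HashSet<>();               // simulated service registry
    final Map<String, Set<String>> clusterOf = new HashMap<>(); // node -> cluster it ended up in

    /** One-shot view of the registry, as taken during cache configuration. */
    Set<String> snapshot() {
        return new HashSet<>(registry);
    }

    /** Registration with the discovery service happens after the snapshot. */
    void register(String node) {
        registry.add(node);
    }

    /** Join the cluster of any peer seen in the snapshot, else form a new one. */
    void joinFromSnapshot(String node, Set<String> visiblePeers) {
        Set<String> cluster = visiblePeers.isEmpty()
                ? new HashSet<>()
                : clusterOf.get(visiblePeers.iterator().next());
        cluster.add(node);
        clusterOf.put(node, cluster);
    }

    public static void main(String[] args) {
        // Simultaneous start: both snapshots are taken before either instance
        // registers, so each sees nobody and forms its own singleton cluster.
        // With no retry, the two clusters never merge.
        DiscoveryRaceDemo race = new DiscoveryRaceDemo();
        Set<String> seenBy1 = race.snapshot();
        Set<String> seenBy2 = race.snapshot();
        race.register("app-1");
        race.register("app-2");
        race.joinFromSnapshot("app-1", seenBy1);
        race.joinFromSnapshot("app-2", seenBy2);
        System.out.println(race.clusterOf.get("app-1").size()); // 1
        System.out.println(race.clusterOf.get("app-2").size()); // 1

        // Staggered start: app-2 snapshots after app-1 has registered, finds
        // it, and joins its cluster, which is the workaround noted below.
        DiscoveryRaceDemo staggered = new DiscoveryRaceDemo();
        Set<String> first = staggered.snapshot();
        staggered.register("app-1");
        staggered.joinFromSnapshot("app-1", first);
        Set<String> second = staggered.snapshot();
        staggered.register("app-2");
        staggered.joinFromSnapshot("app-2", second);
        System.out.println(staggered.clusterOf.get("app-1").size()); // 2
    }
}
```

The simulation reproduces both behaviors reported in this thread: two permanent singleton clusters on a simultaneous start, and a correctly merged cluster when the starts are staggered.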
Motivation for or Use Case
For those using a second-level cache, instances that fail to join the cluster cause cache inconsistencies, possibly even security-critical ones. For instance, a user's role may be removed, but the eviction of the user's cached authorities only happens on that instance's 'cluster'.
Reproduce the error
For me (perhaps my configuration is specific) this happens anytime I launch many instances simultaneously. I am using Kubernetes, so starting a deployment with several replicas is simple.
If just one instance is visible when the other instances are started, the problem goes away (e.g. start one instance, wait for it to be properly started, then increase replica count).
Related issues
Suggest a Fix
If others agree that this is an issue, we should decide on a way forward. There are several considerations:
Once a decision is made, I'll be happy to help implement any changes.
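One possible direction, sketched below under assumptions (this is not JHipster's code, the class and wiring names are invented, and it will not compile without Hazelcast 3.x and spring-cloud-commons on the classpath): instead of building a static TCP/IP member list once at startup, implement Hazelcast's Discovery SPI and re-query Spring's DiscoveryClient on every discovery round, so peers that register later are still found, including during Hazelcast's periodic split-brain merge checks. The service id and port are placeholders.

```java
import com.hazelcast.logging.ILogger;
import com.hazelcast.nio.Address;
import com.hazelcast.spi.discovery.AbstractDiscoveryStrategy;
import com.hazelcast.spi.discovery.DiscoveryNode;
import com.hazelcast.spi.discovery.SimpleDiscoveryNode;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;

import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a Hazelcast DiscoveryStrategy backed by Spring Cloud's
 * DiscoveryClient (assumes Hazelcast 3.x; not JHipster's actual code).
 * Hazelcast calls discoverNodes() on each discovery round, so peers that
 * registered after this instance started are still picked up.
 */
class SpringCloudDiscoveryStrategy extends AbstractDiscoveryStrategy {

    private static final int HAZELCAST_PORT = 5701; // assumed cluster port

    private final DiscoveryClient discoveryClient;
    private final String serviceId;

    SpringCloudDiscoveryStrategy(ILogger logger, Map<String, Comparable> properties,
                                 DiscoveryClient discoveryClient, String serviceId) {
        super(logger, properties);
        this.discoveryClient = discoveryClient;
        this.serviceId = serviceId;
    }

    @Override
    public Iterable<DiscoveryNode> discoverNodes() {
        // Re-query the registry each round instead of using a startup snapshot.
        List<DiscoveryNode> nodes = new ArrayList<>();
        for (ServiceInstance instance : discoveryClient.getInstances(serviceId)) {
            try {
                nodes.add(new SimpleDiscoveryNode(new Address(instance.getHost(), HAZELCAST_PORT)));
            } catch (UnknownHostException e) {
                getLogger().warning("Cannot resolve host " + instance.getHost(), e);
            }
        }
        return nodes;
    }
}
```

A strategy like this would still need a matching DiscoveryStrategyFactory registered in the Hazelcast DiscoveryConfig, and the right way to resolve each peer's Hazelcast port (hard-coded above) is itself one of the open considerations.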
JHipster Version(s)
v6.5.1
JHipster configuration
.yo-rc.json file