mcmar closed this issue 2 years ago.
+1, encountering this problem as well
Hey @mcmar,
thanks for bringing this up. This is an area where we are lacking documentation. The Prisma server can be started either with the management API enabled or without it. If the management API is enabled, the server will try to acquire the agent lock on startup. This is to ensure that only one Prisma server at a time writes into the management tables. Your second server also has the management API enabled, which is why you are seeing this log message.
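A minimal Python sketch of the startup behaviour described above, under loud assumptions: the agent lock is modelled with an in-process `threading.Lock`, whereas the real lock has to be shared between separate server processes (presumably through the database), and `start_server` is a hypothetical helper, not Prisma code. Only servers with the management API enabled contend for the lock, so a second management-enabled server is the one that waits and gives up.

```python
import threading


def start_server(name: str, management_api_enabled: bool,
                 lock: threading.Lock, timeout_s: float = 0.1) -> str:
    """Hypothetical model of server startup; not Prisma's implementation."""
    if not management_api_enabled:
        # Servers without the management API never contend for the lock.
        return f"{name}: started without management API"
    # Lock.acquire(timeout=...) returns False if the lock is still held
    # when timeout_s has elapsed.
    if lock.acquire(timeout=timeout_s):
        return f"{name}: obtained exclusive agent lock"
    return f"{name}: timed out waiting for agent lock"


if __name__ == "__main__":
    # Stand-in for the shared agent lock (assumption: in-process only).
    agent_lock = threading.Lock()
    print(start_server("primary", True, agent_lock))
    # -> primary: obtained exclusive agent lock
    print(start_server("second-primary", True, agent_lock))
    # -> second-primary: timed out waiting for agent lock
    print(start_server("secondary", False, agent_lock))
    # -> secondary: started without management API
```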
The management API can be enabled or disabled in the Prisma server config, e.g.:
port: 60000
managementApiSecret: my-secret
rabbitUri: amqp://my-rabbitmq-server
enableManagementApi: true|false
databases:
  default:
    connector: mysql
    ...
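Since primary and secondary servers differ only in `enableManagementApi`, generating both configs from one place keeps them from drifting apart. A minimal Python sketch; `render_prisma_config` is a hypothetical helper, the values are the placeholders from the example above, and the database settings beyond `connector` are left for you to supply.

```python
def render_prisma_config(rabbit_uri: str, enable_management_api: bool,
                         db_settings: dict, port: int = 60000,
                         management_api_secret: str = "my-secret") -> str:
    """Hypothetical helper: render a PRISMA_CONFIG document as YAML text."""
    lines = [
        f"port: {port}",
        f"managementApiSecret: {management_api_secret}",
        # Every server must share the same RabbitMQ instance.
        f"rabbitUri: {rabbit_uri}",
        # Write lowercase YAML booleans (str(True) would give "True").
        f"enableManagementApi: {'true' if enable_management_api else 'false'}",
        "databases:",
        "  default:",
    ]
    # Database settings are nested two levels under "databases".
    for key, value in db_settings.items():
        lines.append(f"    {key}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Only "connector" is filled in; add your real connection settings.
    db = {"connector": "mysql"}
    primary = render_prisma_config("amqp://my-rabbitmq-server", True, db)
    secondary = render_prisma_config("amqp://my-rabbitmq-server", False, db)
    print(primary)
    print(secondary)
```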
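In Prisma Cloud we are running Prisma like this for horizontal scalability:
1) Requests to /management must be routed to the primary server (the one with the management API enabled).
2) In addition to this, you will also need a RabbitMQ server, which we use for PubSub (see rabbitUri above). If you don't want to run a RabbitMQ server on your own, I recommend CloudAMQP as a hoster.
Does that help?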
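In a real deployment this rule lives in the load balancer or ingress in front of Prisma. The Python sketch below only illustrates the routing decision itself; `PrismaRouter`, the host names and the service paths are hypothetical, and whether the primary also joins the pool for regular traffic is a deployment choice.

```python
from itertools import cycle
from typing import Iterable


class PrismaRouter:
    """Hypothetical path-based routing rule, as a load balancer would apply it."""

    def __init__(self, primary: str, pool: Iterable[str]):
        pool = list(pool)
        if not pool:
            raise ValueError("need at least one server for regular traffic")
        self.primary = primary
        # Round-robin over the servers that handle ordinary API traffic.
        self._pool = cycle(pool)

    def upstream_for(self, path: str) -> str:
        # /management (and anything under it) must hit the primary server.
        if path == "/management" or path.startswith("/management/"):
            return self.primary
        return next(self._pool)


if __name__ == "__main__":
    router = PrismaRouter("prisma-primary:4466",
                          ["prisma-secondary-1:4466", "prisma-secondary-2:4466"])
    for p in ["/management", "/my-service/dev", "/my-service/dev"]:
        print(p, "->", router.upstream_for(p))
```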
in addition to this you will also need a RabbitMQ server, which we use for PubSub
Uh, is that a requirement or is it only necessary in case you use subscriptions? @mavilein Could you clarify that please?
@emmenko: Right now it is required. We need RabbitMQ for these reasons:
I guess you are fine with point 1. Point 2 is a tradeoff you need to decide on for your own use case. Point 3 is currently a blocker.
If we can find a solution for point 3 we could repackage the Prisma server without the RabbitMQ dependency. We could enable this through a separate Docker image or configuration flag.
I see. So if I want multiple replicas I also need to have a rabbitmq cluster on the side.
Do you plan to make the pubsub system configurable, or is RabbitMQ the only option? For example, I'm running my services on GCP and it would be easier to use Google PubSub.
Thanks anyway for the explanation!
@mavilein Is Prisma using AMQP 0.9.1 or 1.0? 1.0 will work with Apache ActiveMQ and Amazon MQ, which would make my life much easier. 0.9.1 would require me to spin up my own RabbitMQ service with its own ELB.
@emmenko: RabbitMQ is currently the only option, but we have encapsulated our pubsub code into a neat interface. We could provide additional implementations, e.g. for Google PubSub. I just added a feature request for this.
@mcmar: We are using the RabbitMQ Java client, so I think it's 0.9.1 then.
@mcmar @emmenko: Would you be happier if we supported Apache Kafka? We are considering adding it for another feature anyway.
Thanks. For now I'm trying the stable/rabbitmq helm chart. I think it should be fine.
In our specific case, we run our services on K8s on Google Cloud, so for us an integration with Google PubSub would be perfect, so that we don't have to manage that on our own.
@mavilein I'm trying to deploy prisma with 1 primary and 2 secondary. The primary has the managementApi enabled, the secondaries do not.
However, after starting, the primary and one of the secondaries keep crashing with the error
Fatal error during deployment worker initialization: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://single-server/user/$b#-844349262]] after [300000 ms]. Sender[null] sent message of type "com.prisma.deploy.migration.migrator.DeploymentProtocol$Initialize$".
I noticed that all 3 Prisma containers are trying to "obtain the agent lock". From what you wrote before, I thought that only the primary is supposed to get the lock. Or should all of them do it? In that case, any idea why I'm still getting errors?
I'm using prisma:1.13.4.
Now the primary and one of the secondaries are running, but the 2nd secondary keeps crashing (no errors in the logs, only the following):
Obtaining exclusive agent lock...
Initializing workers...
Successfully started 1 workers.
Server running on :4466
Version is up to date.
Am I doing something wrong?
I also tried with 1 primary and 1 secondary. After a couple of minutes, the secondary crashed with the same error.
I have a feeling that the management API is still enabled in both. I checked, and I'm passing enableManagementApi: true to the primary and enableManagementApi: false to the secondary.
In case it helps, here are the logs for the pods (for the timings)
$ kubectl get pods -w | grep prisma
prisma-primary-c6f64d69d-ckkbn 2/2 Running 0 3m
prisma-secondary-75b857b766-xx8jz 2/2 Running 0 3m
prisma-secondary-75b857b766-xx8jz 1/2 Error 0 6m
prisma-secondary-75b857b766-xx8jz 1/2 Running 1 6m
prisma-secondary-75b857b766-xx8jz 2/2 Running 1 8m
And here are the logs for the primary:
Obtaining exclusive agent lock...
Obtaining exclusive agent lock... Successful.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
Deployment worker initialization complete.
Initializing workers...
Successfully started 1 workers.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
Server running on :4466
Version is up to date.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
[Metrics] No Prisma Cloud secret is set. Metrics collection is disabled.
And here are the logs for the secondary:
Obtaining exclusive agent lock...
Initializing workers...
Successfully started 1 workers.
Server running on :4466
Version is up to date.
Fatal error during deployment worker initialization: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://single-server/user/$b#-1603503801]] after [300000 ms]. Sender[null] sent message of type "com.prisma.deploy.migration.migrator.DeploymentProtocol$Initialize$".
I noticed that when I access http://localhost:4466/management on both containers, I get the GraphQL Playground. Is this supposed to work even if enableManagementApi is set to false?
@emmenko: Oh, my bad. I forgot to say that you need to use the prisma-prod image. Only this one contains the necessary checks. We should really improve this experience. Thanks for continuing to dig!
I forgot to say that you need to use prisma-prod image
Ooooh, thanks!
I'll try that right away.
Btw, I'm happy to contribute documentation feedback based on the experience I had so far. Let me know in case you need that.
Works!
$ kubectl get pods -w | grep prisma
prisma-rabbitmq-0 1/1 Running 0 4h
prisma-rabbitmq-1 1/1 Running 0 4h
prisma-rabbitmq-2 1/1 Running 0 4h
prisma-primary-56d5699664-sn2sj 2/2 Running 0 35m
prisma-secondary-cbb5c87b8-c8qdn 2/2 Running 0 27m
prisma-secondary-cbb5c87b8-z8zgh 2/2 Running 0 23m
@emmenko: Nice. I have opened an issue to unify our 2 Docker images, as having two does not seem necessary to me and just causes confusion. I'm happy to come back to you for feedback on the docs when we have a first version ready!
@mavilein I'm attempting to implement the pattern you described, in which you route /management to prisma-primary and all other routes (*) to prisma-primary and prisma-secondary.
I'm unable to implement that pattern in AWS using Application Load Balancers because ECS services can only register themselves in one target group.
I'm working off of the fargate.yml template in the prisma-templates repo.
How does Prisma host its own servers in AWS? Do you use ECS? Do you have 2 separate Prisma services? Do you use Application Load Balancers as opposed to Classic Load Balancers? I can't find a way to implement it using ECS with ALBs.
Here's the issue in ECS: https://github.com/aws/amazon-ecs-agent/issues/1351#issuecomment-412377706
Hey @emmenko if you have a fairly generic kubernetes template for prisma with support for horizontal scaling, would you mind posting it or submitting a PR against https://github.com/prismagraphql/prisma-templates ? The current docs only show the single-server setup. I'm currently working on adding a second Cloudformation template for horizontal scaling. It'd be good to get something in there for Kubernetes too. I thought it'd be cool if we could all contribute back what we're learning and grow the OSS community around Prisma.
@mcmar Hey, sure thing! I'll start working on that in the next few days.
Handy thread. Thank you for the information. I'd like to cast my vote for Google PubSub also as we currently have a cloud function consuming Prisma subscriptions over HTTP and passing them into GCP PubSub. This would reduce the latency.
@emmenko @mavilein Hi, I'm in the same situation and a bit confused. I'm stuck at the prisma-prod image step: when I try to run the container with this image, I get an error saying I'm missing some SQL_INTERNAL_PASSWORD env var. I'm using Cloud SQL Postgres, and I can't figure out what I missed or what this var is.
Where are you running the containers? Kubernetes?
@emmenko kubernetes engine yes
How do you pass the PRISMA_CONFIG?
here is my config
- name: PRISMA_CONFIG
  value: |
    port: 4466
    rabbitUri: amqp://...
    managementApiEnabled: false
    databases:
      default:
        connector: postgres
        host: 127.0.0.1
        port: 5432
        user: "$(PG_USERNAME)"
        password: "$(PG_PASSWORD)"
        migrations: true
        connectionLimit: 4
With the 'prismagraphql/prisma:1.14' image the container runs, but I have the exclusive agent lock problem and the container restarts every 5 minutes. With the 'prismagraphql/prisma-prod' image the container doesn't even start and fails with the missing SQL_INTERNAL_PASSWORD variable error.
By the way, thanks for your help.
Hmm, the config looks good. I'm using these images and things work for me:
images:
  prisma:
    repository: prismagraphql/prisma-prod
    tag: 1.14
    pullPolicy: IfNotPresent
  cloudsql:
    repository: gcr.io/cloudsql-docker/gce-proxy
    tag: 1.11
    pullPolicy: IfNotPresent
Btw: I have one deployment for the "normal" prisma replicas which are connected to the LB, plus a deployment for the "management" prisma (1 replica only) that is not served by the LB (it's only used by port-forwarding).
Hopefully I manage to share my chart in the next weeks, in case it helps others ;)
Thanks, you just made me realize that I forgot the tag on the prisma-prod image! Yes, I plan to do the same for management and replicas.
Is there any plan to update documentation about Horizontal Scaling, rabbitUri, etc.?
Hi @emmenko, would you be able to post your Kubernetes templates that you're using with prisma-prod and horizontal scaling? It looks like the current Kubernetes instructions for Prisma don't include RabbitMQ or horizontal scaling. https://www.prisma.io/docs/tutorials/cluster-deployment/kubernetes-aiqu8ahgha
@mavilein @divyenduz Can anyone on the Prisma team please provide a working template with horizontal scaling? Could be Cloudformation or K8s or anything. I've been trying for months to get this working, but I've got nothing. I documented the issue I'm having with ECS. I'd appreciate any help.
UPDATE:
Used @dpetrick's repo and it worked. I'll see what I can do to make another chart by combining @dpetrick's solution with the helm-prisma repo. Perhaps a prisma-prod helm chart?
I really like that @dpetrick's version includes the ingress config to use both the primary and secondary servers via the same port. Using port-forwarding will work for local deploys, but means that you can't use prisma-cloud.
I whipped together a quick guide based on experiments I did myself with Kubernetes. Give it a try and tell me if it works for you. https://github.com/dpetrick/prisma-k8s-example.
Edit: I should mention that the example is derived from an actual working setup.
Hi @emmenko, I wonder how you are progressing with the GCP Kubernetes templates?
Hey guys, I ended up writing an article on how we did it in my team because it's a bit difficult to come up with something generic that fits all use cases.
https://techblog.commercetools.com/prisma-horizontal-scaling-a-practical-guide-3a05833d4fc3
Have a read, and hopefully it's helpful to a lot of people.
Scaling Prisma horizontally right now seems to require quite a bit of DevOps knowledge. For DevOps novices such as myself, is there any hope that this will be made easier in Prisma sometime soonish? I notice that this issue is tagged with docs and not with feature.
@jhalborg: As a first step we will improve the docs and then follow up with some changes to make it significantly easier to run Prisma in production. For example, we will make RabbitMQ optional if you don't use subscriptions. Thanks for letting us know!
@mavilein - Thanks! But what about simply scaling Prisma servers horizontally with no subscriptions - is that possible to do somewhat easily?
As far as I understand, if I, e.g., deploy to Heroku and scale up more dynos, that won't work, since it needs a master/slave setup for propagating changes from the /management endpoint, correct?
As far as I understand, if I i.e. deploy to Heroku and scale up more dynos
Well, it's not really a master/slave (primary/secondaries) setup.
You can deploy and scale up the servers that are configured without the /management endpoint.
Then, separately, have a single server with the /management API enabled.
Without the RabbitMQ dependency coming up, things are going to be a bit easier to set up and manage hopefully. Looking forward to that!
To be honest, I'm still very confused. Perhaps I'll just have to wait on new docs and see if that helps.
As it stands now, Prisma seems to be the weakest link in our setup. The API scales horizontally automagically, but that doesn't help much when it needs to hit a single Prisma server to access the DB.
@jhalborg For Heroku, you would do 3 things:
1) Use a RabbitMQ Heroku add-on.
2) Change the prisma docker image to prisma-prod.
3) Add the rabbitUri: amqp://... and enableManagementApi: false props to your PRISMA_CONFIG env var.
Thanks @mcmar - I might be slow, but I'm still unsure, I haven't worked with RMQ before. If I set those two variables in my config, where would the management API (primary server) then be hosted? And will it "just work" if I setup that addon and refer to it in the config?
I've searched, but can't find any guides on the topic except for the one @emmenko wrote - which I'm sure is pretty awesome, but once again requires learning Kubernetes and RabbitMQ
I'm not familiar enough with Heroku to help you out further. Maybe someone who is can jump in? Have you also tried asking for help in the Slack channels?
@jhalborg I didn't provide instructions for the primary server. If you also want to host that in Heroku, you would clone your Heroku environment and go through these steps:
1) Copy your CLOUDAMQP_URL (or similarly named) env var from above. DO NOT create a new RabbitMQ add-on. Both environments need to point to the same server.
2) Change the prisma docker image to prisma-prod (same as above).
3) Add the rabbitUri: amqp://... and enableManagementApi: true props to your PRISMA_CONFIG env var.
4) Only spin up one server for the primary.
You'll end up with 2 URLs: one for the primary and one for the secondary servers.
Use the primary URL for deployments and prisma-cloud.
Use the secondary URL for your graphql-yoga or apollo-server server.
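The two Heroku environments above must agree on two points: exactly one server runs with the management API enabled, and every server points at the same RabbitMQ server. A minimal Python sketch of a pre-deploy sanity check, assuming you have already collected each environment's settings into a list; `PrismaServer` and `check_topology` are hypothetical names, not part of Prisma or Heroku.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrismaServer:
    """Hypothetical record of one deployed server's relevant settings."""
    name: str
    rabbit_uri: str
    enable_management_api: bool


def check_topology(servers: List[PrismaServer]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks sane."""
    problems = []
    primaries = [s.name for s in servers if s.enable_management_api]
    # Exactly one server may run the management API (and take the agent lock).
    if len(primaries) != 1:
        problems.append(
            f"expected exactly 1 management-enabled server, found {len(primaries)}")
    # Primary and secondaries must share one RabbitMQ server.
    uris = {s.rabbit_uri for s in servers}
    if len(uris) > 1:
        problems.append(f"servers use different rabbitUri values: {sorted(uris)}")
    return problems


if __name__ == "__main__":
    servers = [
        PrismaServer("primary", "amqp://shared-host/vhost", True),
        PrismaServer("secondary-1", "amqp://shared-host/vhost", False),
        PrismaServer("secondary-2", "amqp://other-host/vhost", False),
    ]
    for problem in check_topology(servers):
        print(problem)
```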
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.
I tried to reproduce horizontal scaling on localhost, but it seems no messages are sent to RabbitMQ. Here is the docker-compose.yml file: https://github.com/xcv58/Prisma-Horizontal-Example/blob/master/docker-compose.yml
Could you please point out what's wrong with the configuration? Thanks!
Fixed in https://github.com/xcv58/Prisma-Horizontal-Example/commit/19de8480ab7aa30c2fb98d3d46b6614414e3abf0: I should have used enableManagementApi instead of managementApiEnabled.
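As far as this thread shows, the misnamed key does not produce an obvious error; the server simply does not behave as configured. A minimal Python sketch of a lint step, assuming the PRISMA_CONFIG YAML has already been parsed into a dict (for example with a YAML library of your choice); `find_misnamed_keys` and its lookup table are hypothetical, and only the one mix-up from this thread is listed.

```python
# Hypothetical table of key-name mistakes; only the one from this thread is listed.
KNOWN_MISNAMED_KEYS = {"managementApiEnabled": "enableManagementApi"}


def find_misnamed_keys(config: dict) -> list:
    """Flag top-level keys of a parsed PRISMA_CONFIG that look like known typos."""
    return [
        f"found '{wrong}'; did you mean '{right}'?"
        for wrong, right in KNOWN_MISNAMED_KEYS.items()
        if wrong in config
    ]


if __name__ == "__main__":
    # Example: the top level of a config that has already been parsed from YAML.
    parsed = {"port": 4466, "rabbitUri": "amqp://...", "managementApiEnabled": False}
    print(find_misnamed_keys(parsed))
    # -> ["found 'managementApiEnabled'; did you mean 'enableManagementApi'?"]
```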
Do people (@emmenko?) still see issues when doing Prisma version updates? Most K8s/ECS setups will provision the new "management enabled" instance in parallel before turning off the old one. Won't that cause a lock conflict and an error?
Describe the bug
When I try to scale Prisma horizontally by adding a second server, the first server logs:
then the second server logs just this and crashes:
The reason appears to be this line of code: https://github.com/prismagraphql/prisma/blob/d5c97fe8f1c1ee223ec1392ebdf16f2545b2f763/server/servers/deploy/src/main/scala/com/prisma/deploy/migration/migrator/DeploymentSchedulerActor.scala#L52
It seems that Prisma explicitly ensures that only 1 cluster/server (Prisma terminology changes) can ever run against a DB.
To Reproduce
Steps to reproduce the behavior:
1) In the Prisma service, change the Number of tasks from 1 to 2.
Expected behavior
2 Prisma instances would run against 1 DB.
Screenshots
None
Versions (please complete the following information):
prisma CLI: prisma/1.11.1 (darwin-x64) node-v8.11.1
Prisma server: 1.11.0 (per Cloudformation template)
Additional context
Already reported in the Slack channel. I was told to create a bug here. @divyenduz