terraform-community-providers / terraform-provider-railway

Terraform provider for railway.app
https://registry.terraform.io/providers/terraform-community-providers/railway/latest/docs
Mozilla Public License 2.0
22 stars 1 forks source link

Environment variables and build race condition #22

Closed wertlex closed 1 month ago

wertlex commented 6 months ago

Hey!

I'm currently trying to deploy rather big project in the Railway and having a trouble dealing with environment variables. The thing is that I need some of the env vars to be defined in build time (say, NPM_TOKEN as an example, in order to access private libraries on NPM, but there are really way more than that scattered over 20+ services). However, since env vars are managed as a separate resource (railway_variable) which is instantiated after full service provisioning, I'm having pretty often a state when service build is already started in Railway, but API call for creating env vars still not performed.

I am considering to spend some time to find out how to fix that (at least serviceCreate allows input.variables), but before writing any code I'd like to ask if there any ideas on that or somebody else was suffering from the same issue.

Funding

Fund with Polar

pksunkara commented 6 months ago

I do have some ideas on improving this, but let's first discuss the problem a bit more.

I'm having pretty often a state when service build is already started in Railway, but API call for creating env vars still not performed.

That's fine because the service will restart building every time an env var is set. Are you having any issues with it not starting?

wertlex commented 6 months ago

That's fine because the service will restart building every time an env var is set. Are you having any issues with it not starting?

If only this would work 😅

So I did couple experiments just to make sure I didn't really missed something.

For all of them:

However, doing the same thing manually using UI triggers redeploy. My guess is that they changed this behavior with this "Apply 5 changes" functionality being introduced in UI.

P.S. just to be on the same page Since Railway not allows to exec into container, I found following docker image pretty useful to debug the behavior:

Dockerfile:

FROM ubuntu

WORKDIR /usr/src/app
COPY ./printenv_loop.sh .
RUN chmod +x printenv_loop.sh
CMD ["./printenv_loop.sh"]

printenv_loop.sh:

#!/bin/bash

while true; do
  printenv
  sleep 30
done
pksunkara commented 6 months ago

So I did couple experiments just to make sure I didn't really missed something.

To confirm, are you saying that changing the env vars using Terraform is not restarting the services? Or did you do the experiments using UI?

IIRC, changing the env vars in UI need confirmation currently.

wertlex commented 6 months ago

To confirm, are you saying that changing the env vars using Terraform is not restarting the services? Or did you do the experiments using UI?

Minimal reproducible repo: https://github.com/wertlex/terraform-railway-repro

wertlex commented 6 months ago

JFYI: I also checked if changing restartPolicyType in railway.json from ON_FAILURE to ALWAYS will affect this behavior. No, it is not.

pksunkara commented 6 months ago

I won't be able to triage this with the repo you provided until the upcoming weekend. But my current thinking is to ask the railway people whether they have a bug.

wertlex commented 6 months ago

Do you have any connections with them besides their public Discord channel?

If no, I could give it a try in api channel

pksunkara commented 6 months ago

No. I always use their discord API channel.

EggsLeggs commented 6 months ago

I'm also experiencing the exact same issue with env variables being set but the service not redeploying so they're added to the running service.

In my case, this was noticable when setting up a postgres instance which threw errors of env variables not being set and a manual redeploy fixing the issues. https://gist.github.com/EggsLeggs/71d4b409512cdf99640e5ec47c05ef0f

I'll keep an eye on this issue. @pksunkara thank you for all your work on this provider and @wertlex thanks for the fix you put out attaching source images.

wertlex commented 6 months ago

So, just got a response in Discord:

You should be making a separate call to redeploy the service.

So, it turns out it has to be handled on TF provider side.

pksunkara commented 6 months ago

I think it should be fine adding that to every environment variable resource write call, but I wonder if terraform framework supports some kind of mechanism to do it only once after checking if any variables changed

wertlex commented 6 months ago

I wondering how it will work in a situation when 50+ env variables are introduced one by one. But it is rather easy to try.

I wonder if terraform framework supports some kind of mechanism to do it only once after checking if any variables changed I also wondering about this.

Anyway, will try to experiment with all this a bit

pksunkara commented 5 months ago

@wertlex Hey, did you get to try any of the strategies?

wertlex commented 5 months ago

@pksunkara still have this on my todo list, but unfortunately I drown in other ongoing tasks. Though I still have an intension to take a look on that

pksunkara commented 5 months ago

One thing I am thinking of is to create a new railways_variables resource that has a map of all variables and then we simply restart only once.

wertlex commented 5 months ago

It looks like the most reasonable solution. Initially I was thinking about an option of embedding environment variables in service declaration, which at glance makes sense. However, it rises two tricky questions:

  1. how to properly deal with multiple environments in this case. It is resolvable, but most probably will require significant refactoring of well... everything
  2. what exactly to do if/when service creation operation will be completed successfully, but env var creation will fail. Especially considering, that sometimes this kind of things are really happening with Railway
virus2016 commented 4 months ago

This is a must! If you spin up a resource, envs are not applied until post deployment. There is no redeploy on adding of vars so they are not applied. The idea of a new railways_variables is a great shout!

wertlex commented 4 months ago

Hey guys! I'm pretty close to start implementing this, but unfortunately my schedule constantly slips.

Also, I was able to communicate with Railway team, that it would be nice to have an option to provision private docker images (and credentials for private registries) over public API. And I'm also going to take a look on these changes.

pksunkara commented 4 months ago

I think the easiest thing we can do is restart the service whenever we update/create/delete an environment variable.

railways_variables might create more problems regarding the CRUD logic of individual variables.

wertlex commented 4 months ago

My current considerations are the following (though I didn't tried it in practice, though I have some relevant experience):

In my eyes there are actually two problems here which could be killed with one stone:

  1. service restarting/redeploying
  2. performance and rate limiting

Speaking about latter, I have to say that if "performance" part is discussable matter, rate limiting is a real disaster. I'm having a deployment which has ~200 env vars, and I'm able to perform ~2.5 terraform apply within an hour per project, and then I'm facing quota limitation. My hope here is that if I'll be able to provision whole group of ~10 variables (an average we have per service) in 2-3 HTTP requests, whole situation would become way better.

But again, all this is pure theory and my considerations. I'm going to make an experiment with this though. Will keep you updated once there will be any results.

pksunkara commented 4 months ago

I'm having a deployment which has ~200 env vars, and I'm able to perform ~2.5 terraform apply within an hour per project, and then I'm facing quota limitation

Are you hitting the quota just by reading them through the GET endpoint? Because not all of them would be changed and would not hit POST or DELETE

As far as I recalling there is a method to setting multiple vars at one call, though I cannot recall anything about fetching multiple vars in a single call.

There is only one endpoint to read all the variables which is what I use, https://github.com/terraform-community-providers/terraform-provider-railway/issues/8

introduce railway_variables or railway_variable_group which would be well.. just a group of variables.

The tricky part here is that this provider would need to calculate the diff correctly and apply it.

wertlex commented 4 months ago

Are you hitting the quota just by reading them through the GET endpoint? Because not all of them would be changed and would not hit POST or DELETE

Yeah, I believe, it is all due to GETs. terraform apply --refresh=false helps a bit, but not seems to be really futureproof option

There is only one endpoint to read all the variables which is what I use, #8

Thank you, I'll take a look on it

The tricky part here is that this provider would need to calculate the diff correctly and apply it.

I'm almost sure it'll be tricky, but I'd like to give it a try

wertlex commented 4 months ago

Some updates: I've played a bit with API and barely started with provider, but for now it seems doable. Will share a draft as soon as it will be ready

virus2016 commented 4 months ago

Any news?

wertlex commented 4 months ago

Well, yes I have working prototype which does create/read/update/delete/import. But it does not have tests right away and requires a little bit of housekeeping before making it a real PR.

Not much things really left there, but I constantly have other things popping up, so it takes quite a long time.

pksunkara commented 4 months ago

Feel free to create the PR and I can polish it up, add tests and merge.

wertlex commented 4 months ago

Here we go: https://github.com/terraform-community-providers/terraform-provider-railway/pull/25