Closed adolski closed 11 months ago
They are--there are three main steps to this:
The rails-container-scripts submodule used to build only x86 images until I changed it to dual-build x86 and ARM images. We would want to change it again to build only ARM images, which would basically just require reverting those changes, I think. This step is really optional, but if we aren't using x86 images anymore then we shouldn't build them, because it makes the build take longer.
rails-container-scripts, we would need to change its task definitions as well and rebuild/redeploy it.
This issue would be a good way to learn more about how ECS and Terraform work. It's small, but deep.
Thanks. I found this resource for working with ARM workloads in AWS, and it lays out several ways to configure ARM CPU architecture for ECS task definitions (including using aws cli).
So I'd essentially want to include something like:
{
  "runtimePlatform": {
    "operatingSystemFamily": "LINUX",
    "cpuArchitecture": "ARM64"
  },
  ...
}
But instead of using aws cli or a different interface to configure the task definitions, I'd want to change/add a resource via Terraform script(s) in the repos you linked. Do I have that right?
That is correct. I believe the section of the terraform script that needs to be changed is the aws_ecs_task_definition in main.tf: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/ecs_task_definition#cpu_architecture
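As a rough illustration of the Terraform side of this, the JSON runtimePlatform key corresponds to a runtime_platform block inside the aws_ecs_task_definition resource. The sketch below writes such a block to a temporary scratch file so it can be compared with the real main.tf. The resource name "example" and the elided arguments are placeholders, not taken from the actual repositories.

```shell
# Write a sketch of the Terraform change to a temporary scratch file.
# The resource name "example" is hypothetical; the real main.tf will differ.
sketch=$(mktemp)
cat > "$sketch" <<'EOF'
resource "aws_ecs_task_definition" "example" {
  # ... existing arguments (family, container_definitions, cpu, memory) unchanged ...

  # Terraform counterpart of the JSON "runtimePlatform" key.
  runtime_platform {
    operating_system_family = "LINUX"
    cpu_architecture        = "ARM64"
  }
}
EOF
echo "Sketch written to $sketch"
```

Only the runtime_platform block is new here; everything else in the resource would stay as it already is in the repo.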
I cloned both repos and installed terraform via homebrew on my machine. Do I need to run any commands before I update any code? For example:
$ terraform init
Right now if I run that I get the following error:
Error: validating provider credentials: retrieving caller identity from STS: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: 6dff0062-21a9-407b-8ccf-6bdaa6fe4b46, api error ExpiredToken: The security token included in the request is expired
Are you logged in via aws login?
I shared a Box folder containing secrets.tfvars files you'll need.
I haven't used Terraform myself in a long time, but terraform init sounds right. After that, terraform plan will show you what it's going to do, and terraform apply will do it.
If terraform plan shows a lot of changes, that probably means that the scripts are out of sync with the resources in AWS. But hopefully that isn't the case.
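The init/plan/apply sequence described above could be wrapped roughly as follows. This is only a sketch: the function name tf_workflow and the DRY_RUN switch are made up for illustration, and passing secrets.tfvars through -var-file is an assumption about how these repos consume that file.

```shell
# Hypothetical helper: run (or, with DRY_RUN=1, just print) the usual
# Terraform sequence. Assumes the variables file is passed via -var-file
# and that its path contains no spaces.
tf_workflow() {
  var_file=${1:-secrets.tfvars}
  for step in \
    "terraform init" \
    "terraform plan -var-file=$var_file -out=tfplan" \
    "terraform apply tfplan"
  do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$step"            # show what would be run
    else
      $step || return 1       # stop at the first failing step
    fi
  done
}

# Print the commands without touching any infrastructure.
DRY_RUN=1 tf_workflow secrets.tfvars
```

Saving the plan with -out and applying that saved file means apply does exactly what plan displayed, which helps when checking whether the scripts have drifted from what is deployed.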
Edited to add: I was able to get terraform init to work by running rm -f .terraform.lock.hcl and then running terraform init again.
After making sure I was logged in with aws and had included the secrets.tfvars file, when I run terraform init I get:
Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Reusing previous version of hashicorp/template from the dependency lock file
- Installing hashicorp/aws v4.52.0...
- Installed hashicorp/aws v4.52.0 (signed by HashiCorp)
Error: Incompatible provider version
Provider registry.terraform.io/hashicorp/template v2.2.0 does not have a package available for your current platform, darwin_arm64.
Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of
this provider may have different platforms supported.
I looked into this issue and found a solution that worked for others who received the same error, and ran the following commands:
arch -arm64 brew install kreuzwerker/taps/m1-terraform-provider-helper
m1-terraform-provider-helper activate
m1-terraform-provider-helper install hashicorp/template -v v2.2.0
which then gave me:
Successfully installed hashicorp/template v2.2.0
When I run terraform init again, I get:
Error: Failed to install provider
Error while installing hashicorp/template v2.2.0: the local package for registry.terraform.io/hashicorp/template 2.2.0 doesn't match any of the checksums previously recorded in the dependency lock file (this might be because the available checksums are for packages targeting different platforms)
So then I ran this command:
terraform providers lock -platform=linux_amd64 -platform=darwin_amd64
This gave me a successful message:
- Fetching hashicorp/template 2.2.0 for linux_amd64...
- Retrieved hashicorp/template 2.2.0 for linux_amd64 (signed by HashiCorp)
- Fetching hashicorp/aws 5.21.0 for linux_amd64...
- Retrieved hashicorp/aws 5.21.0 for linux_amd64 (signed by HashiCorp)
- Fetching hashicorp/aws 5.21.0 for darwin_amd64...
- Retrieved hashicorp/aws 5.21.0 for darwin_amd64 (signed by HashiCorp)
- Fetching hashicorp/template 2.2.0 for darwin_amd64...
- Retrieved hashicorp/template 2.2.0 for darwin_amd64 (signed by HashiCorp)
- Obtained hashicorp/template checksums for linux_amd64; This was a new provider and the checksums for this platform are now tracked in the lock file
- Obtained hashicorp/template checksums for darwin_amd64; This was a new provider and the checksums for this platform are now tracked in the lock file
- Obtained hashicorp/aws checksums for linux_amd64; This was a new provider and the checksums for this platform are now tracked in the lock file
- Obtained hashicorp/aws checksums for darwin_amd64; This was a new provider and the checksums for this platform are now tracked in the lock file
Success! Terraform has updated the lock file.
Review the changes in .terraform.lock.hcl and then commit to your
version control system to retain the new checksums.
But after committing and running terraform init again, I get the same error as before (Failed to install provider).
I'll keep digging on how to resolve this, but I welcome any suggestions!
Two updates:
I sent up a PR for updating the aws demo service using Terraform. If everything looks fine to you, I'll do the same thing for the prod service.
I pushed up a new commit to the master branch of the rails-container-scripts submodule that removes 'linux/amd64' in the buildx instruction. If this is not what you had in mind I'll revert back to what it was before and make any necessary changes.
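For context, building only the ARM image with buildx comes down to the value passed to --platform. The sketch below is not the actual docker-build.sh. The function name, the image tag, and the DRY_RUN switch are hypothetical and only show the shape of the command.

```shell
# Hypothetical helper: build and push an image for the given platform(s).
# With DRY_RUN=1 the docker command is printed instead of executed.
build_image() {
  image=$1
  platforms=${2:-linux/arm64}   # previously something like linux/amd64,linux/arm64
  set -- docker buildx build --platform "$platforms" --tag "$image" --push .
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the ARM-only build command for a made-up image tag.
DRY_RUN=1 build_image example/book-tracker:demo
```

Passing a comma-separated list as the second argument would restore the dual-architecture build if the change has to be reverted.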
Looking good so far! Have you tried a build & deploy of an ARM Book Tracker image yet?
Let's get the demo environment fully migrated before moving onto production.
Not yet! I'll give it a go and get back to you.
Demo service keeps failing and starting/stopping.
I first ran the redeploy.sh demo script and saw the following errors in the aws logs:
I thought maybe I needed to build first and then deploy, so I ran docker-build.sh demo followed by ecs-deploy-webapp.sh demo. Everything looked fine on my machine (no errors with building image/pushing to aws). But the deploy failed again and the logs showed:
Okay, I think I know what's wrong. The Book Tracker's task definition actually defines two containers:
The Book Tracker container is probably fine, but the other one is still x86, thus the "exec format error."
Unfortunately the architecture part of the task definition applies to all of its containers and can't be applied to just one.
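Because one architecture setting covers every container in the task, each image has to match it. A small check along these lines could flag a mismatch before deploying. It is a sketch only: the function name and the image names in the usage comment are hypothetical.

```shell
# Return 0 when an image platform string (e.g. "linux/arm64") matches the
# architecture named in the task definition (ECS spells them X86_64 / ARM64).
# Returns 2 for an architecture value it does not recognize.
matches_task_arch() {
  image_platform=$1
  task_arch=$2
  case "$task_arch" in
    ARM64)  want=linux/arm64 ;;
    X86_64) want=linux/amd64 ;;
    *)      return 2 ;;
  esac
  [ "$image_platform" = "$want" ]
}

# Usage (requires docker and locally pulled images; names are made up):
#   for image in book-tracker:latest shib-frontend:latest; do
#     platform=$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$image")
#     matches_task_arch "$platform" ARM64 ||
#       echo "$image is $platform; expect an exec format error on ARM64"
#   done
```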
So I guess we have a few options:
Replace omniauth-shibboleth in the Book Tracker with omniauth-saml (will require coordinating with iTrust)
I think (2) is the best long-term option and I'm sure that Library IT (who wrote the Apache image builder tool) would appreciate it.
But whether or not we attempt (2) right now, we need to do (1) and revert the changes made thus far.
Ahh, okay. I was wondering why I was seeing different error messages than previously in the logs for the shib-frontend container. This makes more sense now.
Confirming I've reverted the changes, so rails-container-scripts/docker-build.sh includes x86_64 again, and the Terraform script in aws-book-tracker-demo-service no longer specifies arm64 in main.tf (I also re-ran terraform apply).
Demo service is back up and running again:
Great! I will hopefully be able to look into the omniauth stuff soon. I've created #30 to track it.
@gaurijo Neither of the Book Trackers are using that x86 container anymore, so you should be able to proceed now.
(Make sure to pull the latest terraform code in aws-book-tracker-demo-service and aws-book-tracker-prod-service)
I also pulled down the latest code from the develop branch. When I run the tests I'm seeing some errors I hadn't seen before. Are these expected, or did something go wrong on my end?
I haven't seen those errors before. They would be stemming from this change: https://github.com/medusa-project/book-tracker/issues/32#issuecomment-1779900140
I think I found a bug in TempStore.client_options(). Try pulling the latest code and try again.
I pulled down the latest code but I'm still getting the same errors.
On the other hand, I implemented changes to the aws_book_tracker_demo_service script, updated the rails-container-scripts docker buildx command, and the redeploy of demo was successful.
Excellent. You can probably move onto production now.
I don't know what that error is. Can you do either of these:
% bin/rails console
Loading development environment (Rails 7.1.1)
irb(main):001> TempStore.instance.bucket_exists?
=> true
% bin/rails console -e test
Loading test environment (Rails 7.1.1)
irb(main):001> TempStore.instance.bucket_exists?
=> true
Is minio running and is there anything interesting in the log? Are your config/credentials/development.yml and test.yml files correct, particularly the storage section?
I'm able to get a true output with both rails console environments. My config/credentials all seem correct as well.
When I try accessing minio however, I get blocked with the following error:
I'll keep digging into why these tests are failing on my end, but I'm going to close this issue since everything is migrated now.
Are the docker containers currently running on x86_64?
I'm happy to get started on this issue. Please let me know if there is additional context I should have first.