unity-sds / unity-cs

Unity Common Services
Apache License 2.0

Rework SPS Marketplace installation to use EKS/httpd #351

Open galenatjpl opened 4 months ago

galenatjpl commented 4 months ago

The current state of the SPS marketplace deployment is very out of date. The following needs to be updated:

https://github.com/unity-sds/unity-sps

Other random notes:

Run the TF script with a var that defines the Airflow port.

Then Airflow will be reachable from MCdomainname/sps/airflow. Got to testing the httpd proxy on Ryan's own laptop, reaching a host Airflow that he had deployed. The lambda gets triggered, but it had an error.

Next step for Brad would be to talk to Drew about getting an Airflow deployment, take the load balancer name/port and run the httpd proxy terraform script to put the config on the httpd proxy, then see if it's reachable.
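For illustration, here is a rough sketch of the kind of config the httpd proxy step would need to end up with. It is not the actual terraform script from the repo; the function name, the example load balancer host, and the port are all hypothetical. It assumes Airflow is served under the same /sps/airflow prefix on the backend.

```python
def render_airflow_proxy_conf(lb_host: str, lb_port: int,
                              prefix: str = "/sps/airflow") -> str:
    """Render an Apache httpd snippet that proxies `prefix` to Airflow.

    Assumption: the backend serves Airflow under the same path prefix,
    so the prefix is kept in the backend URL.
    """
    # Normalise the prefix to exactly one leading slash and no trailing slash.
    prefix = "/" + prefix.strip("/")
    backend = f"http://{lb_host}:{lb_port}{prefix}"
    lines = [
        f'<Location "{prefix}">',
        # Inside <Location>, ProxyPass takes only the backend URL.
        f'    ProxyPass "{backend}"',
        f'    ProxyPassReverse "{backend}"',
        "</Location>",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical load balancer name and port, for illustration only.
    print(render_airflow_proxy_conf("example-airflow-lb.internal", 5000))
```

Keeping the prefix on both sides avoids path rewriting in the proxy, at the cost of requiring Airflow itself to know about the prefix.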

The other thing that may have gotten lost in the repo transition: Airflow has a config option called BASEURL that can be passed in at config time. It needs to be set to the URL that's serving the Airflow instance.
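A minimal sketch of what setting that option could look like. In Airflow 2.x the setting is `base_url` under `[webserver]`, which can be supplied through the `AIRFLOW__WEBSERVER__BASE_URL` environment variable. The helper names and the example domain below are hypothetical, and how the value actually gets passed into the deployment is not covered here.

```python
def airflow_base_url(domain: str, prefix: str = "/sps/airflow",
                     scheme: str = "https") -> str:
    """Build the public URL that the proxy serves Airflow from."""
    # Normalise the prefix to exactly one leading slash and no trailing slash.
    prefix = "/" + prefix.strip("/")
    return f"{scheme}://{domain}{prefix}"


def airflow_env(domain: str) -> dict:
    """Return the environment variable that sets [webserver] base_url."""
    return {"AIRFLOW__WEBSERVER__BASE_URL": airflow_base_url(domain)}


if __name__ == "__main__":
    # "mc.example.org" is a placeholder for the management console domain.
    print(airflow_env("mc.example.org"))
```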

Dependency for:

jpl-btlunsfo commented 3 months ago

Blocked by a mismatch between the terraform version in the unity-management-console (here) and the version requirements in the unity-sps repo (here and here).

Looking into updating unity-management-console's cloudformation templates to pull a newer terraform.

galenatjpl commented 3 months ago

@jpl-btlunsfo We should probably just update all the places to use the highest (most recent stable) version of terraform. That's probably what you are already doing. I think this will become an interesting maintenance issue going forward, and we should discuss ways to mitigate it. Perhaps I'll bring it up to the whole team.
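One possible way to keep an eye on this going forward is a small check that compares the terraform versions pinned in each place and flags the ones that lag behind. This is only a sketch: the place names and version numbers are made up, and it assumes plain numeric `X.Y.Z` pins rather than constraint expressions.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.10.0' or 'v1.10.0' into (1, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def highest_version(pins: dict) -> str:
    """Return the highest pinned version string."""
    return max(pins.values(), key=parse_version)


def outdated(pins: dict) -> list:
    """Return the names of the places pinned below the highest version."""
    target = parse_version(highest_version(pins))
    return sorted(name for name, version in pins.items()
                  if parse_version(version) < target)


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    pins = {"management-console": "1.5.7", "unity-sps": "1.8.2"}
    print(highest_version(pins), outdated(pins))
```

Comparing parsed tuples rather than raw strings matters because "1.10.0" sorts before "1.9.0" as text.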

jpl-btlunsfo commented 2 months ago

Using this branch for now as a workaround (overriding the cloudformation branch during test runs in nightly_test/run.sh): https://github.com/unity-sds/cfn-ps-jpl-unity-sds/tree/btlunsfo-update-terraform-test

jpl-btlunsfo commented 2 months ago

Just an update on the sps-eks work: I've finally cleaned things up enough that the management console can run the terraform configuration for the sps-eks module and create the EKS cluster. However, after creation, later terraform steps (helm references, etc.) are having trouble connecting to the cluster and configuring it further. Still investigating.

[ERROR] Error: Kubernetes cluster unreachable: Get "https://7AE228734033540E79AA2552C14882D4.sk1.us-west-2.eks.amazonaws.com/version": dial tcp 10.6.51.219:443: i/o timeout
[ERROR] 
[ERROR]   with module.spseks-qcdUP.module.unity-eks.helm_release.aws-load-balancer-controller,
[ERROR]   on .terraform/modules/spseks-qcdUP.unity-eks/terraform-unity-eks_module/main.tf line 423, in resource "helm_release" "aws-load-balancer-controller":
[ERROR]  423: resource "helm_release" "aws-load-balancer-controller" {
[ERROR] 
[ERROR] 
[ERROR] Error: Post "https://7AE228734033540E79AA2552C14882D4.sk1.us-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/serviceaccounts": context deadline exceeded
[ERROR] 
[ERROR]   with module.spseks-qcdUP.module.unity-eks.kubernetes_service_account.aws-load-balancer-controller-service-account,
[ERROR]   on .terraform/modules/spseks-qcdUP.unity-eks/terraform-unity-eks_module/main.tf line 445, in resource "kubernetes_service_account" "aws-load-balancer-controller-service-account":
[ERROR]  445: resource "kubernetes_service_account" "aws-load-balancer-controller-service-account" {
[ERROR]
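
The "dial tcp 10.6.51.219:443: i/o timeout" part may point to a network-level problem: the endpoint resolves to a private address, and the machine running terraform might not be able to open a TCP connection to it. A quick way to check that, independent of terraform and helm, could be a plain TCP probe like the sketch below. The function name is made up, and this only tests reachability, not authentication.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


if __name__ == "__main__":
    # Cluster endpoint taken from the error log above.
    endpoint = "7AE228734033540E79AA2552C14882D4.sk1.us-west-2.eks.amazonaws.com"
    print(can_reach(endpoint))
```

If this returns False from the management console host, the likely suspects would be security groups, subnets, or the cluster's endpoint access settings rather than the helm or kubernetes provider config.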

galenatjpl commented 1 month ago

re-assigning to sprint 24.2.2

GodwinShen commented 1 month ago

@galenatjpl and @jpl-btlunsfo ping for status

galenatjpl commented 1 month ago

@GodwinShen, @jpl-btlunsfo is trying to work around some Helm/cluster issues. Terraform is currently failing, and Brad is looking into it. Brad has a good idea of what to investigate/try next. SPS requires a much newer EKS than is in the Marketplace. The hope is that by the end of this week the EKS issues will be sorted out, but NOT the full SPS deployment. The SPS deployment needs a webserver password for Airflow, which is currently driven by a terraform variable. This needs to be reworked so that the password is fetched from SSM. Hopefully the SPS part can be finished next sprint (24.2.3).
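As a rough idea of what fetching the password from SSM could look like on the consuming side: the sketch below uses the boto3 SSM client's `get_parameter` call. The parameter name is hypothetical, and the real rework may well do this lookup inside terraform instead.

```python
def fetch_webserver_password(ssm_client, parameter_name: str) -> str:
    """Read a (possibly SecureString) parameter value from SSM Parameter Store.

    `ssm_client` is expected to behave like boto3.client("ssm").
    """
    response = ssm_client.get_parameter(Name=parameter_name, WithDecryption=True)
    return response["Parameter"]["Value"]


# Usage sketch (requires boto3 and AWS credentials; parameter name is made up):
#   import boto3
#   password = fetch_webserver_password(
#       boto3.client("ssm"), "/unity/sps/airflow/webserver-password")
```

Passing the client in, rather than creating it inside the function, keeps the lookup easy to exercise without AWS access.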

GodwinShen commented 1 month ago

@galenatjpl and @jpl-btlunsfo ping for status

galenatjpl commented 1 month ago

Hi @GodwinShen. @jpl-btlunsfo and I met yesterday and decided that pushing some of the HTTPD work forward is the top priority in the short term. As such, for the next few working days @jpl-btlunsfo will be focusing on https://github.com/unity-sds/unity-cs/issues/403. I think good progress has been made on this ticket (https://github.com/unity-sds/unity-cs/issues/351), but I'll let @jpl-btlunsfo weigh in with any updates since his message above from last week. I think we should be back to work on this ticket in early June.

galenatjpl commented 1 month ago

@GodwinShen @mike-gangl @jpl-btlunsfo per my comment above, I've moved this to 24.2.4

galenatjpl commented 1 week ago

Moving to 24.3