sostheim / krak8s

API Service for Kraken and Kubernetes Commands
Apache License 2.0

clusters not deploying on in-cluster krak8s API #5

Closed by jshimko 7 years ago

jshimko commented 7 years ago

As mentioned in Slack last night, the API currently does not work. Just opening an issue to track progress.

I’ve followed the readme examples for the API and I don’t seem to be able to actually create a new cluster. Here are my exact requests/responses (run from inside the Launchdock pod within the cluster). The created cluster never leaves the create_requested state and no new nodes have been created.

# create project
curl -X POST -H "Content-Type: application/json" -d '{"name":"acme"}' http://10.37.8.171:8080/v1/projects

{
  "created_at": "2017-08-29T00:03:31.043193048Z",
  "id": "f614ab6e",
  "name": "acme",
  "namespaces": null,
  "type": "project"
}

# create namespace
curl -X POST -H "Content-Type: application/json" -d '{"name":"acme-prod"}' http://10.37.8.171:8080/v1/projects/f614ab6e/namespaces

{
  "applications": null,
  "created_at": "2017-08-29T00:06:07.85514329Z",
  "id": "298f91ef",
  "name": "acme-prod",
  "resources": null,
  "type": "namespace"
}

# create cluster
curl -X POST -H "Content-Type: application/json" -d '{"namespace_id":"298f91ef", "nodePoolSize": 5}' http://10.37.8.171:8080/v1/projects/f614ab6e/cluster

{
  "created_at": "2017-08-29T01:07:28.912574145Z",
  "id": "b0ef89bd",
  "namespace_id": "298f91ef",
  "nodePoolSize": 5,
  "state": "create_requested",
  "type": "Resource",
  "updated_at": "0001-01-01T00:00:00Z"
}

# note that cluster is sitting in "create_requested" state 30 mins later
curl http://10.37.8.171:8080/v1/projects/f614ab6e/cluster/b0ef89bd

{
  "created_at": "2017-08-29T01:07:28.912574145Z",
  "id": "b0ef89bd",
  "namespace_id": "298f91ef",
  "nodePoolSize": 5,
  "state": "create_requested",
  "type": "Resource",
  "updated_at": "0001-01-01T00:00:00Z"
}
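
Checking the state by hand every few minutes gets tedious, so here is a minimal polling sketch based on the response shape shown above. The function names `cluster_state` and `wait_for_cluster` are hypothetical helpers, not part of krak8s, and the only state value assumed is the `create_requested` one seen in the responses.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "state" field from a cluster
# JSON document read on stdin. Assumes the response shape shown above.
cluster_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: poll until the state is no longer "create_requested".
#   $1  shell command that prints the cluster JSON (for example a curl call)
#   $2  maximum number of attempts (default 30)
#   $3  delay between attempts in seconds (default 60)
# Prints the new state and returns 0, or returns 1 if the state never changed.
wait_for_cluster() {
  fetch_cmd=$1
  attempts=${2:-30}
  delay=${3:-60}
  i=0
  state=
  while [ "$i" -lt "$attempts" ]; do
    state=$(sh -c "$fetch_cmd" | cluster_state)
    if [ -n "$state" ] && [ "$state" != "create_requested" ]; then
      printf '%s\n' "$state"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  printf 'still %s after %s attempts\n' "${state:-unknown}" "$attempts" >&2
  return 1
}
```

With the IDs from this issue, the call would look like: wait_for_cluster "curl -s http://10.37.8.171:8080/v1/projects/f614ab6e/cluster/b0ef89bd" 30 60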

sostheim commented 7 years ago

@jshimko - there are a couple of things that we are working to resolve that affect the functioning of the API service. 1) The API requires that the config.yaml file for the cluster be annotated with a pair of marker strings that indicate to the service where the automation entries belong in the file. 2) The container image for the API service requires that several environment variables be passed into the container so that it can access the cluster state resources (local) and the required AWS resources (cloud based).
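
For point 2), a preflight check along these lines could make a missing variable fail loudly at container start rather than leaving a cluster stuck in create_requested. This is only a sketch: `require_env` is a hypothetical helper, and the variable names in the usage line are the standard AWS SDK/CLI names, assumed here rather than taken from the krak8s chart.

```shell
#!/bin/sh
# Hypothetical helper: verify that every named environment variable is set
# and non-empty. Reports each missing name on stderr and returns 1 if any
# are missing, 0 otherwise. Names are expected to be trusted identifiers.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf 'missing environment variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example use in an entrypoint script (variable names assumed): require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION || exit 1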

sostheim commented 7 years ago

With respect to issue 1), the procedure for annotating the config.yaml is documented here: https://github.com/samsung-cnct/krak8s#kraken-configuration-file-integration, but we will work through this to validate that the configuration file is set up correctly.

sostheim commented 7 years ago

With respect to 2), we have created a new AWS user for the Reaction Commerce account, krakenbot, that we will use as the credentialed user for the AWS interaction. @joejulian is currently working on getting the rest of the environment into the appropriate Helm chart configuration for re-deployment to your cluster.

sostheim commented 7 years ago

PR Merged: https://github.com/reactioncommerce/infrastructure/pull/5

sostheim commented 7 years ago

@joejulian PR Merged: https://github.com/reactioncommerce/chart-krak8s-api/pull/1/

jshimko commented 7 years ago

👍 Let me know when things are working and I'll give it another test.

sostheim commented 7 years ago

We are working on reinstalling the chart with the required values now and will provide an update as soon as we have something testable.

sostheim commented 7 years ago

Commit: https://git.launchdock.io/reactioncommerce/krak8s/commit/d2d2f43e0a60ff13ca0f4bd6ba4ccc37b175be96

sostheim commented 7 years ago

Several changes to: 1) allow git CLI commands to behave correctly without a global config (https://github.com/samsung-cnct/git-archivist/commit/d07a35d3bb0927681b51deb0b2d1b51d70a229b1); 2) separate git-archivist functions into two logical components.
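
For context on change 1): inside a container there is usually no ~/.gitconfig, so `git commit` refuses to run until an identity is configured. One common way around that is to pass the identity per invocation with `git -c`. The sketch below illustrates that general technique and is not necessarily what the linked commit does; the function name, the `GIT_BOT_NAME` / `GIT_BOT_EMAIL` variables, and their defaults are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create a commit in repository $1 with message $2
# without relying on any global or system git configuration. The identity
# is supplied per invocation through `git -c`.
commit_without_global_config() {
  repo=$1
  message=$2
  git -C "$repo" \
    -c user.name="${GIT_BOT_NAME:-krak8s-bot}" \
    -c user.email="${GIT_BOT_EMAIL:-krak8s-bot@example.invalid}" \
    commit --quiet --allow-empty -m "$message"
}
```

The `--allow-empty` flag is only there so the sketch works in a fresh repository; a real caller would stage changes first and drop it.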

sostheim commented 7 years ago

The error turned out to be in the Terraform state managed by Kraken in the cluster manifest. The issue was resolved in a series of issues and PR'ed commits listed here: https://github.com/samsung-cnct/k2/issues/793, https://github.com/samsung-cnct/k2/pull/795, https://github.com/samsung-cnct/k2/pull/796