openshift / ocm-container

Containerized environment for accessing OpenShift v4 clusters, packing necessary tools/scripts
Apache License 2.0
10 stars 63 forks source link

Reorder bashrc.d by type; skip automation if launched without bin; fix env and login #300

Closed clcollins closed 1 month ago

clcollins commented 1 month ago

This PR reorders and reorganizes the bashrc.d files, placing pure aliases and functions low (0-9) and labeling them with "libs", exports and sourcing followign that (10-19) and code execution and automations after that (20-99). It also removes the shebang (#!) shell declaration for sourced files, per convention, and sets shellcheck's shell declaration to allow for continued linting.

This also skips automations (ocm and sre logins) if launching without the ocm-container binary, to allow the SRE to decide what to do.

Additionally, this fixes sourcing the AWS cli completions, which are currently broken in this image.

This also updates the way the OCM authentication is handled, moving authentication to the ocm-container binary. OCM Container will authenticate with OCM using browser-based auth outside the container if the user is not logged in, storing the config in ~/.config/ocm/ocm.json as ocm-cli would. If the user is already logged in and the tokens are not expired, OCM Container will use that. The ocm.json file continues to be mouted read-only, but the OCM_URL environment variable is set inside the container to override the OCM URL config setting in the config file, allowing for OCM Container to be logged into multiple environments at once, and preventing it from overwriting environments outside of the container/in other containers. Environment will change based on the --ocm-url provided (with production being the default).

Finally, this removes to util scripts that have been broken and out of date since backplane was introduced, and who's functions have largely been replaced by OCM and Backplane commands, and fixes a single makefile target that had incorrect syntax.

In practice this should have little impact directly on ocm-contianer users, other than they will now be able to switch environments just by setting the config value after login. Login still defaults to production, and can be overridden by the binary with the OCMC_OCM_URL env variable that is passed in if the user specifies the --ocm-url flag.

Users using the ocm-contianer binary will stay logged into OCM if their ocm.json file exists outside the container, so device auth is only required as their token expires, and would be done manually by the SRE, if that were to happen.

This is partially in support of https://issues.redhat.com/browse/OSD-15847, to allow usage of the ocm-container image elsewhere, and in preparation for changes to the binary to make device auth login easier.

Signed-off-by: Chris Collins collins.christopher@gmail.com

clcollins commented 1 month ago

/hold

For some further discussion

clcollins commented 1 month ago

/unhold

@iamkirkbater @rendhalver @samanthajayasinghe - I think I've fixed, or explained, all the issues here. This is ready for a re-review, if you have some time.

I've updated the PR description with how things exist in the latest commit, not the original commit. Let me know if you have any concerns.

It's worth noting that we should cut a new release after this merges, I think, and encourage everyone to use the latest build image, as the internals of the image changed as well as the external binary.

rendhalver commented 1 month ago

@iamkirkbater @rendhalver @samanthajayasinghe - I think I've fixed, or explained, all the issues here. This is ready for a re-review, if you have some time.

Having a look now. )

It's worth noting that we should cut a new release after this merges, I think, and encourage everyone to use the latest build image, as the internals of the image changed as well as the external binary.

Yep I totally agree with cutting a new release after we merge.

rendhalver commented 1 month ago

Testing notes:: I built the image and binary locally and it looks to be all working if I pass in a cluster-id.

$ ./build/ocm-container -t latest --registry "" --repository localhost --cluster-id c03103eb-1571-498d-b1fd-70587b445faa
Using config file: /home/pebrown/.config/ocm-container/ocm-container.yaml
[snip]
Checking the context on 15uq6splkva07rae2hgih6eph4vs3p8m
Encountered Errors during data collection. Displayed data may be incomplete: 
    cluster is not a HCP/Management Cluster
=================================================
osd-v4stg-aws -- 15uq6splkva07rae2hgih6eph4vs3p8m
=================================================
[snip]
[~ {https://api.openshift.com} (osd-v4stg-aws.z9a9.p1:default)]$ echo $CLUSTER_ID
15uq6splkva07rae2hgih6eph4vs3p8m
[~ {https://api.openshift.com} (osd-v4stg-aws.z9a9.p1:default)]$ echo $CLUSTER_UUID
c03103eb-1571-498d-b1fd-70587b445faa
[~ {https://api.openshift.com} (osd-v4stg-aws.z9a9.p1:default)]$ echo $CLUSTER_NAME
osd-v4stg-aws

I did notice that if we use the cluster name for osd-v4stg-aws it doesn't do all of that but that's because we are searching on the display name in utils/bashrc.d/06-sre-login-libs.bashrc and osd-v4stg-aws has it set to this: Display Name: SRE long lived cluster in production: osd-v4stg-aws So it's a bit of a unique case.

$ ./build/ocm-container -t latest --registry "" --repository localhost --cluster-id osd-v4stg-aws
Using config file: /home/pebrown/.config/ocm-container/ocm-container.yaml
Error: Can't save config file: can't write file '/root/.config/ocm/ocm.json': open /root/.config/ocm/ocm.json: read-only file system
[~ {https://api.openshift.com} (osd-v4stg-aws.z9a9.p1:default)]$ echo $CLUSTER_ID

[~ {https://api.openshift.com} (osd-v4stg-aws.z9a9.p1:default)]$ echo $CLUSTER_UUID

[~ {https://api.openshift.com} (osd-v4stg-aws.z9a9.p1:default)]$ echo $CLUSTER_NAME

[~ {https://api.openshift.com} (osd-v4stg-aws.z9a9.p1:default)]$ 

I am not sure how to fix the prompt if we aren't using an ENV var set to the short url. Kirk's idea of setting the short-url before we set the prompt is a nice solution.

rendhalver commented 1 month ago

I am going to put a hold on this and lgtm it. Feel free to unhold other people are happy with it.

/hold /lgtm

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clcollins, rendhalver

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openshift/ocm-container/blob/master/OWNERS)~~ [clcollins,rendhalver] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
clcollins commented 1 month ago

I did notice that if we use the cluster name for osd-v4stg-aws it doesn't do all of that but that's because we are searching on the display name in utils/bashrc.d/06-sre-login-libs.bashrc and osd-v4stg-aws has it set to this: Display Name: SRE long lived cluster in production: osd-v4stg-aws

Ah, good catch @rendhalver ! We're using display_name in that query, but most clusters have a longer description in that field. It looks like ocm-cli actually uses name when it does the same lookup:

https://github.com/openshift-online/ocm-cli/blob/main/pkg/cluster/cluster.go#L247

Using "name" instead returns the info for osd-v4stg-aws as expected:

ocm list clusters '--parameter=search=((id like '\''osd-v4stg-aws'\'' or external_id like '\''osd-v4stg-aws'\'' or name like '\''%osd-v4stg-aws%'\''))' --columns 'id, external_id, name' --no-headers
REDACTED  REDACTED  osd-v4stg-aws 

That has probably always been broken. I'm happy to put in a bug fix in a follow up PR, if that's cool with you.

clcollins commented 1 month ago

Actually, @rendhalver, I just went ahead and fixed that: https://github.com/openshift/ocm-container/pull/301

rendhalver commented 1 month ago

That has probably always been broken. I'm happy to put in a bug fix in a follow up PR, if that's cool with you.

Oh I am totally happy getting that sorted after this one is merged. :)

rendhalver commented 1 month ago

/unhold