m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Remote debugging of job launching #912

Open · phraenquex opened 2 years ago

phraenquex commented 2 years ago

Needs the handling of tokens (via Keycloak etc.) to be worked out.

ag-m2ms commented 2 years ago

From my playing around, I think the issue is the combination of Keycloak configurations, especially the client_id and redirect_uri, which seem to be provided to the backend as environment variables. I believe the data a logged-in user sees is based on the client_id. There appears to be a list of client IDs, and for each ID in that list a set of allowed redirect URIs is specified. What could help is allowing each deployment (prod, boris, tibor, etc.) to redirect to 127.0.0.1:8080, if that's possible. This is all speculation on my part, though, so input from @tdudgeon would be helpful.

Since, on login, the FE redirects to the BE, which then composes the login URL and redirects to Keycloak, we need a mechanism either to tell the BE to use a redirect_uri provided by the FE, or to have the BE supply it automatically when a flag (it could be a URL param) is specified.
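
A minimal sketch of that mechanism, assuming the BE composes the Keycloak login URL itself. The endpoint, realm and function names here are illustrative assumptions, not the actual Fragalysis code:

```python
# Hypothetical sketch: the BE builds the Keycloak login URL, preferring a
# redirect_uri supplied by the FE (e.g. via a URL param) over its own default.
from urllib.parse import urlencode

# Assumed Keycloak host and realm - purely illustrative.
AUTH_ENDPOINT = "https://keycloak.example.com/auth/realms/xchem/protocol/openid-connect/auth"

def build_login_url(client_id, default_redirect, fe_redirect=None):
    # Keycloak will reject the request unless the chosen redirect_uri matches
    # one of the client's registered redirect URI patterns.
    params = {
        "client_id": client_id,
        "redirect_uri": fe_redirect or default_redirect,
        "response_type": "code",
        "scope": "openid",
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)
```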

tdudgeon commented 2 years ago

Yes, in Keycloak we register a set of clients, each of which has a client_id. When you request a token you specify the appropriate client_id. When a browser makes the authentication request it specifies a redirect_uri to which Keycloak will redirect the browser once the user is logged in. If this URI is incorrect, things will not work correctly. Each client also has a list of URI patterns that it considers valid redirect URIs, to ensure that only accepted sites can authenticate through this mechanism (though a wildcard can be used to open this up).

In the DEV keycloak we currently have clients for each of the developer stacks, plus one called fragalysis-local. That one has 127.0.0.1 (localhost) as a valid redirect_uri pattern. But if a front end running locally (e.g. on localhost) is using a particular stack backend (e.g. the boris stack), then the backend will authenticate assuming that its corresponding frontend (e.g. https://fragalysis-boris-default.xchem-dev.diamond.ac.uk/) is being used and should be the redirect_uri.
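
For illustration, a fragment of a Keycloak client definition (in the style of a realm export) with a localhost redirect pattern of the kind described; the exact values are assumptions:

```json
{
  "clientId": "fragalysis-local",
  "publicClient": true,
  "redirectUris": [
    "http://127.0.0.1:8080/*",
    "http://localhost:8080/*"
  ]
}
```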

The answer is probably, as Anton says, to allow a front end to specify the redirect_uri to use when it makes the authentication request to the backend, or maybe this can be inferred from the request. Either way, this is not particularly secure, and it certainly should not be enabled on the staging and prod stacks.

Also, it's probably time to rationalise all these stacks and clients. I imagine we don't need the tibor and rachael ones, and maybe not the duncan one either now.

Probably we need to have a brief chat to go over all this and make sure we understand what the backend is doing during this process (Duncan implemented this, so I don't think anyone knows exactly what's going on).

phraenquex commented 2 years ago

New ticket #914 for rationalising stacks - so not relevant to this ticket. For the token stuff, @ag-m2ms, @alanbchristie and @tdudgeon will discuss offline.

tdudgeon commented 2 years ago

@ag-m2ms @boriskovar-m2ms Can you detail exactly what is broken here so that we solve the right problem?

You are working with a front-end deployment on your laptop, accessing a backend running in the cluster? Are you able to authenticate at all in this scenario, or is this a problem specific to wanting to run Squonk jobs?

boriskovar-m2ms commented 2 years ago

@tdudgeon @alanbchristie [image] The image above describes the situation where I try to connect the local FE to the remote BE while authenticating as a local client. In this case I'm authenticated and have a user ID of something like 6, but when I log in directly in the FE deployed to my stack I have a user ID of something like 30. I can create a new project, create snapshots and do everything an authenticated user can do, but when I want to trigger jobs I get a 403, because Squonk doesn't know the user with ID 6, nor is that user added to the Squonk project.

[image] The image above describes the situation where, after a small code change, I can authenticate as the remote client, but I'm then redirected to the remote FE, so I'm unable to debug the local FE.

So maybe it would be enough to just add my user from the local client to Squonk and to the Squonk project?

alanbchristie commented 2 years ago

wrt sketch 1, the suspicion is that the local stack, launched via the project docker-compose.yml file, is forced to use the backend's "default" Keycloak client ID. This is because the compose file does not define a suitable OIDC_RP_CLIENT_ID. Under these conditions the backend uses fragalysis-local as the ID. Any access token obtained for this (local) client is not valid for the remote stack.

The first step, for sketch 1, is to define values in the docker-compose.yml for OIDC_RP_CLIENT_ID, along with a correspondingly suitable OIDC_RP_CLIENT_SECRET.
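
A sketch of what that might look like in the compose file. The service and image names are assumptions; only the two OIDC_* variable names come from the discussion above:

```yaml
# Illustrative docker-compose.yml fragment - not the project's actual file.
services:
  stack:
    image: xchem/fragalysis-stack:latest          # assumed image name
    environment:
      # Use the remote stack's client ID instead of the fragalysis-local default...
      OIDC_RP_CLIENT_ID: fragalysis-boris         # example value
      # ...and inject the matching secret from the shell, keeping it out of the file.
      OIDC_RP_CLIENT_SECRET: ${OIDC_RP_CLIENT_SECRET}
```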

I'll keep the secret out of this conversation, but I worry that the Squonk Job status callbacks will not execute successfully, as Squonk will not be able to make calls into your local stack (it's localhost). Anyway, the first step is to try setting the appropriate ENV variables in the compose file.

alanbchristie commented 2 years ago

And, after further testing, I suspect there are fundamental issues with this approach that mean it cannot work. The following diagram helps illustrate one problem...

[image: diagram of the LOCAL (docker-compose) and REMOTE (kubernetes) environments]

The LOCAL development environment (instantiated via the docker-compose YAML file) consists of a stack container and a database. Crucially, the db holds all the stack/django objects. The REMOTE environment (kubernetes) consists of the stack under test, its database, and a keycloak service.

What's happening?

A

  1. The LOCAL stack authenticates with the REMOTE keycloak (as user "Bob") using a client ID that's shared with the REMOTE stack. (That's fine - there's nothing immediately wrong with this.)
  2. In return, keycloak issues an Access TOKEN for the user

B

  1. The LOCAL stack keeps the token and the USER object (in the local DB) is updated to indicate the user is AUTHENTICATED

C

  1. The LOCAL stack now calls an API endpoint on the REMOTE stack (backend) using the TOKEN provided. That's initially OK ... the TOKEN is valid for the REMOTE stack as they use the same keycloak and client ID
  2. The REMOTE backend (django) decodes the TOKEN, checking that the USER (in its database) is AUTHENTICATED. It does this because you're using an endpoint that can only be used by "logged in" users.
  3. The REMOTE backend discovers that the USER is not AUTHENTICATED (according to the REMOTE db), and so django issues a Forbidden/403 response (see the sketch after this list)
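
A minimal sketch of why step C.3 produces the 403, assuming a standard Django REST framework view (the view itself is hypothetical, not the actual Fragalysis endpoint):

```python
# The TOKEN may decode correctly, but the permission check runs against the
# REMOTE stack's own user table - a user only "logged in" in the LOCAL db fails it.
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.views import APIView

class LaunchJobView(APIView):  # hypothetical endpoint
    permission_classes = [IsAuthenticated]

    def post(self, request):
        # Only reached when the REMOTE db agrees the user is authenticated;
        # otherwise DRF responds with 403 Forbidden before this runs.
        return Response({"status": "job launched"})
```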

How do we fix it?

At the moment, I'm not sure you can. If you think about it ... you are logging in using a LOCAL stack and expecting a REMOTE stack's state (database) to mirror yours. It won't. You can call unauthenticated endpoints, but not endpoints that check a user's logged-in state.

Why not disable the authentication?

You can, but that's a significant change and how do you control it? Via an environment variable and a reboot of the stack?

You also open up the entire functionality of the REMOTE stack to anyone and everyone.

Remove the LOCAL backend/stack/database?

Can you run the local code without the backend container or a LOCAL database?

If you can you might be nearly there ...

... but ... how does the REMOTE backend call back / redirect to your local stack? It can't make any calls to localhost, so you will (at the very least) need a public IP address or hostname.

Even then it might not work.

Debug from within the REMOTE stack?

Mmmm - could this be the only "practical" solution? Not sure.

boriskovar-m2ms commented 2 years ago

Just a very random thought: what if I use a local front-end and a local back-end, but with the ability to connect to the remote db and the remote graph?

tdudgeon commented 2 years ago

In theory that should work, but access to those DBs would need to be provided.

Either: open up the ports (including the STFC firewall), which would be cumbersome as well as unwise.

Or: use K8S port forwarding (e.g. using kubectl), which would be easier and more secure, but those connections do get dropped after some time, so this might not be so stable a setup.
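
For the port-forwarding option, something along these lines (the namespace and service name are assumptions):

```shell
# Forward the cluster's PostgreSQL service to localhost:5432.
# The tunnel lives only as long as the kubectl process (and can drop, as noted).
kubectl -n fragalysis-dev port-forward svc/database 5432:5432
```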

Either way:

  1. you would need to manage target loading into your local back end
  2. you would need to work out how the local back end stays in sync with the PostgreSQL db. This probably requires a dedicated db; otherwise the db would be serving multiple backends (its real one and any local developer backends) and would quickly become inconsistent
  3. the performance profile would be different - the slow connection is now between the backend and the db rather than between the front end and the backend. But this is probably not a problem

It might be simpler to also run the dbs locally - just extra services in your docker-compose file (see the sketch below).
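
Running the dbs locally might look something like the following additions to the compose file; images, versions and credentials are illustrative only:

```yaml
# Extra local services - an illustrative sketch, not the project's actual file.
services:
  database:
    image: postgres:12                  # assumed version
    environment:
      POSTGRES_USER: fragalysis
      POSTGRES_PASSWORD: fragalysis     # local-only credentials
      POSTGRES_DB: frag
  graph:
    image: neo4j:4.4                    # assumed graph database image/version
    ports:
      - "7474:7474"                     # HTTP browser
      - "7687:7687"                     # Bolt protocol
```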

Also, Squonk job execution would not work as the callback mechanism is to the backend, and a backend of localhost would not work. You would need your laptop to have a public IP and be accessible from the internet.

alanbchristie commented 2 years ago

The Fragalysis application relies on the built-in authentication provided by Django. This is expected; it is simple and it is built in. But the remote-debug feature exposes the fact that the f/e and b/e use separate databases (where the user records are also managed). We could inspect the inbound token rather than rely on django: if the token's valid, let the action complete.
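
A sketch of that "trust the token, not the local user row" idea using PyJWT; the key handling and claim names are assumptions, not the actual implementation:

```python
# If the Keycloak-issued JWT verifies, let the action proceed without
# consulting Django's local session/user state.
import jwt  # PyJWT

def username_from_token(raw_token, keycloak_public_key, client_id):
    # Raises jwt.InvalidTokenError if the signature, expiry or audience is bad.
    claims = jwt.decode(
        raw_token,
        keycloak_public_key,
        algorithms=["RS256"],
        audience=client_id,
    )
    # "preferred_username" is Keycloak's standard OIDC username claim.
    return claims["preferred_username"]
```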

But ... I still fear that the User IDs in the databases will be different, and this may cause other problems. Objects created in the remote django app will be created with ownership that differs from the local app (even though it's the same user name, their internal django IDs will be different). I suspect that the local and remote stacks will probably need access to the same DB, requiring us to expose the internal DB in the remote stack to the local backend.

phraenquex commented 2 years ago

After discussion today:

phraenquex commented 2 years ago

For release, needs documentation (and review of docs) of all the things needed. @Waztom

phraenquex commented 1 year ago

@boriskovar-m2ms feels it works; now it's just a matter of documenting it, which is probably an @alanbchristie thing.