owncloud / ocis

:atom_symbol: ownCloud Infinite Scale Stack
https://doc.owncloud.com/ocis/next/
Apache License 2.0

OCIS fails to start after upgrade from 5.0.0-rc.5 to 5.0.0-rc.6 #8669

Closed mcarbonne closed 3 months ago

mcarbonne commented 6 months ago

Describe the bug

After upgrading from 5.0.0-rc.5 to 5.0.0-rc.6, ocis wasn't starting anymore:

ocis[2406802]: The service account id has not been configured for ocm. Make sure your /etc/ocis config contains the proper values (e.g. by running ocis init or setting it ma....

First, I tried to generate a new ocis.yaml, but I ran into errors (unable to log in anymore, see below).

Then I managed to start 5.0.0-rc.6 by manually adding this (to my original config):

ocm:
  service_account:
    service_account_id: XXXXXXXX
    service_account_secret: XXXXX

(I copy-pasted this section from the newly generated ocis.yaml)

Note: rolling back to 5.0.0-rc.5 also worked. Also, my config file was generated a long time ago (most likely with 5.0.0-alpha1).

Steps to reproduce

  1. Run ocis 5.0.0-rc.5
  2. Update image to ocis 5.0.0-rc.6
  3. The container does not start anymore

Expected behavior

I've tested almost all 5.0 alpha/beta/rc versions without any issues. I was expecting two things:

  1. First, when removing ocis.yaml, a newly generated ocis.yaml should be OK. I'm running ocis with Keycloak, and all OCIS parameters are set up using environment variables.

  2. Secondly, when upgrading ocis, I was expecting automatic migration of the configuration file. Is automatic config file migration a feature? How do I trigger it? (ocis init seems to do nothing if a configuration file already exists.)

But in my setup, I ran into a lot of trouble after regenerating ocis.yaml:

2024-03-16T21:43:16Z ERR failed to add user error="LDAP Result Code 49 \"Invalid Credentials\": " request-id=XXXXX-000099 service=graph
2024-03-16T21:43:16Z ERR could not create user: backend error error="generalException: failed to add user" request-id=XXXXX-000099 service=graph
2024-03-16T21:43:16Z ERR Error creating user error="500 Internal Server Error" service=proxy
2024-03-16T21:43:16Z ERR Autoprovisioning user failed error="500 Internal Server Error" service=proxy
2024-03-16T21:43:16Z ERR invalid credentials bind_dn=uid=libregraph,ou=sysusers,o=libregraph-idm op=bind remote_addr=127.0.0.1:41032 service=idm
2024-03-16T21:43:16Z ERR Bind failed error="LDAP Result Code 49 \"Invalid Credentials\": " service=graph
2024-03-16T21:43:16Z ERR failed to add user error="LDAP Result Code 49 \"Invalid Credentials\": " request-id=XXX-000100 service=graph
2024-03-16T21:43:16Z ERR could not create user: backend error error="generalException: failed to add user" request-id=XXX-000100 service=graph
2024-03-16T21:43:16Z ERR Error creating user error="500 Internal Server Error" service=proxy
2024-03-16T21:43:16Z ERR Autoprovisioning user failed error="500 Internal Server Error" service=proxy

The only solution was to roll back and re-use the previous one. What is the underlying limitation? Why does regenerating a new ocis.yaml not work?

Note: to be sure this wasn't related to my setup, I tried to do this:

Setup

I'm running the latest ocis 5.0.0-rc.6 (Podman) with Keycloak SSO. My setup is very similar to https://owncloud.dev/ocis/deployment/ocis_keycloak/
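For reference, a rough compose-style sketch of such a setup might look like the following. The image tag, hostnames, realm path and volume locations are placeholders, and the environment variable names are the ones commonly used in the ocis documentation; this is not the actual deployment in question.

# Rough sketch only - hostnames, realm and paths are placeholders, and the
# variable names are assumed from the public ocis docs, not from this setup.
services:
  ocis:
    image: owncloud/ocis:5.0.0-rc.6
    environment:
      OCIS_URL: https://ocis.example.com
      # external IdP (Keycloak) instead of the built-in IDP
      OCIS_OIDC_ISSUER: https://keycloak.example.com/realms/ocis
      # create accounts on first login, matching the "Autoprovisioning" log lines above
      PROXY_AUTOPROVISION_ACCOUNTS: "true"
    volumes:
      - ./ocis-config:/etc/ocis
      - ./ocis-data:/var/lib/ocis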

micbar commented 6 months ago

But in my setup, I ran into a lot of trouble after regenerating ocis.yaml:

Oh! This is not supported. You cannot just destroy the existing ocis.yaml: it contains randomized secrets, and regenerating them will break the system.
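To illustrate what would be lost, here is an abbreviated, approximate sketch of the kind of randomized values a generated ocis.yaml contains (the key names resemble a typical generated file, not this installation, and the values are placeholders):

# Abbreviated, approximate sketch of randomized values written by ocis init.
token_manager:
  jwt_secret: <random>          # signs internal tokens
machine_auth_api_key: <random>
transfer_secret: <random>
idm:
  service_user_passwords:
    admin_password: <random>
    idm_password: <random>      # used for the libregraph LDAP bind that fails in the logs above
    reva_password: <random>
    idp_password: <random>

Some of these values (e.g. the IDM service user passwords) are provisioned into existing data on first start, so a regenerated file with new random values no longer matches what is already stored, which is consistent with the "Invalid Credentials" bind errors shown above.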

Please check your process against the upgrade docs: https://doc.owncloud.com/ocis/5.0/migration/upgrading-ocis.html

Secondly, when upgrading ocis, I was expecting automatic migration of the configuration file.

The situation here is not so clear-cut. It is a known DevOps paradigm that the installed application should not change config files "automagically".

In the future we could also add a maintenance command that does an incremental upgrade "on demand".

prohtex commented 5 months ago

Same here, using the WOPI deployment. I am stuck on rc5.

micbar commented 5 months ago

Please check the upgrade guide in the docs.

You need to add the service account ID and secret to your config.
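For example, if you'd rather provide them via the environment than in ocis.yaml, a compose-style sketch could look like the following. The names OCIS_SERVICE_ACCOUNT_ID and OCIS_SERVICE_ACCOUNT_SECRET follow the global OCIS_ pattern discussed further down and are an assumption here; the values are placeholders, generate your own.

# Sketch: supplying the service account once, globally, via environment
# variables (e.g. in a compose file). Values are placeholders.
services:
  ocis:
    environment:
      OCIS_SERVICE_ACCOUNT_ID: "00000000-0000-0000-0000-000000000000"
      OCIS_SERVICE_ACCOUNT_SECRET: "change-me-to-a-long-random-secret"

Because they use the OCIS_ prefix, both values apply at once to every service that needs the service account.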

jacobgkau commented 5 months ago

My instance won't start because the service account isn't defined after upgrading from 4.x to 5.x. I tried adding a section like this to my ocis.yaml like @mcarbonne did:

ocm:
  service_account:
    service_account_id: XXXXXXXX
    service_account_secret: XXXXX

...but it didn't work. The documentation page here suggests simply putting it in a top-level section (although it seems to be for a master.yaml and not ocis.yaml):

service_account:
  service_account_id: ""
  service_account_secret: ""

...which also doesn't work. I'm running OCIS as a single Podman container using the official Docker image, and I don't want to have to specify secrets as environment variables in my systemd service that starts the container. What is the proper format to configure this on-disk in ocis.yaml? If it's not going to be upgraded "automagically," then it ought to at least be documented how to configure it, instead of just saying "configure it."

Edit: I also tried ocis: instead of ocm: for the top level, and I tried naming the bottom-level keys id: and secret:, none of which made any difference; using or not using quotation marks around the UUID and the string also made no difference. The instance started fine when I added the environment variables in all-caps format to my systemd service unit file, but I'd still like to move them out of there and into the file with all of the other OCIS configuration.

kulmann commented 5 months ago

@jacobgkau the (identical) service account configuration is required for all services that need a service account. Grepping a fresh ocis.yaml file, I get the following occurrences (see the code block below); the file was generated by ocis init. I have no clue whether a service account config on the root level is possible. If it is, I'd expect it to start with service_account:, not ocis:, as the root-level key in the ocis.yaml file. Again, no idea if that is supposed to work. The global env variables (starting with OCIS_) are just a shorthand for setting the env vars of all services at once.

I'm including the id and secret here because this is only a local dev instance that gets killed on a regular basis anyway. Please don't use these two values in a production instance; generate your own.

graph:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
proxy:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
frontend:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
search:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
storage_users:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
notifications:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
userlog:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
auth_service:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
--
clientlog:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6

And one more service account related config that I found in the ocis.yaml file:

settings:
  service_account_ids:
  - 19dff10e-6694-4a48-a698-5c7c8a4a5008

micbar commented 5 months ago

@kulmann Thanks! That explains it very well.

More Info

Env variables in ocis are very convenient, especially if there are OCIS_ prefixed vars which are used in more than one service. The service account id and secret are a good example.

I think we can work on an ocis init --upgrade command to achieve a migration of an old ocis.yaml to the new format.

Relates to #3645

jacobgkau commented 5 months ago

Env variables in ocis are very convenient, especially if there are OCIS_ prefixed vars which are used in more than one service. The service account id and secret are a good example.

I fail to see why global configuration in the configuration file isn't possible when it is possible via environment variables. It can't be a security best-practices thing (e.g. giving that id/secret to services that don't need it), because if it was that, then the environment variables would have the same issue.

kulmann commented 5 months ago

Env variables in ocis are very convenient, especially if there are OCIS_ prefixed vars which are used in more than one service. The service account id and secret are a good example.

I fail to see why global configuration in the configuration file isn't possible when it is possible via environment variables. It can't be a security best-practices thing (e.g. giving that id/secret to services that don't need it), because if it was that, then the environment variables would have the same issue.

I just tried it myself: in the config file, the service account needs to be configured on each and every service individually. There is no shorthand in the config file (i.e. defining a value on the root level to make it apply to all services, as an equivalent of the OCIS_ env var pattern). I also see no reason for that, other than "nobody thought of it yet". But maybe I'm wrong and it's somewhat hard to achieve in the config files... @micbar?

the-hotmann commented 5 months ago

the (identical) service account configuration is required for all services that need a service account.

I had this problem as well, but fixed it myself. Now that I see others have this problem too, and looking over the yaml file in general, I wondered why it is made so complicated.

Currently:

graph:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
proxy:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
frontend:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
search:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
storage_users:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
notifications:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
userlog:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
auth_service:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6
clientlog:
  service_account:
    service_account_id: 19dff10e-6694-4a48-a698-5c7c8a4a5008
    service_account_secret: sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6

How I think it should also work (according to these docs):

define: &service_account
  service_account_id: "19dff10e-6694-4a48-a698-5c7c8a4a5008"
  service_account_secret: "sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6"

graph:
  service_account: *service_account
proxy:
  service_account: *service_account
frontend:
  service_account: *service_account
search:
  service_account: *service_account
storage_users:
  service_account: *service_account
notifications:
  service_account: *service_account
userlog:
  service_account: *service_account
auth_service:
  service_account: *service_account
clientlog:
  service_account: *service_account
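Note: anchors and aliases are resolved by whatever YAML parser loads the file, so this only helps if the ocis config loader keeps standard YAML semantics, and the aliased mapping still has to end up under each service's service_account: key so that the resulting structure matches what the services expect.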

Or why are there not just two variables (env vars if you want) for ALL Services:

service_account_id "19dff10e-6694-4a48-a698-5c7c8a4a5008"
service_account_secret "sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6"

or

service_account:
  service_account_id: "19dff10e-6694-4a48-a698-5c7c8a4a5008"
  service_account_secret: "sKrUn-e@w6Mn9N0XL=gUHegt%nDQ9!C6"

And each service just reads the config variable it needs (all take both, but "settings" just needs the service_account_id).

But the way it currently is, is just repetitive config.

The last approach, with just the two variables for all services, is IMHO the best, as the maintainers just broke a lot of installations when they updated (not upgraded!) from rc5 to rc6. With the third approach, every new service would also get these two variables automatically and would not require the config to be altered/upgraded.

micbar commented 5 months ago

The reason it is that complicated is that we have a microservice architecture. Services are not aware of the internals of other services.

The ocis.yaml is just a hierarchical shortcut: it combines the yaml files of each individual service. There is no "shared" config for yaml files. The linked geoserver behavior is IMHO a custom implementation. We are marshalling that yaml file from Go data types.

Like I said, we will work on an "init-upgrade" command which takes care of an automatic update.

NOTE: The ocis.yaml is only relevant for the single-process deployment. In Kubernetes and the ocis Helm chart we are not using ocis init at all. We just use the capabilities of the orchestration system to create unique IDs and secrets and manage them from there.
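As a generic sketch of that pattern (plain Kubernetes resources, not the actual ocis Helm chart; the OCIS_* names are assumed from the global env var pattern):

# Generic Kubernetes sketch - not the ocis Helm chart. The orchestrator owns
# the secret; the container only sees it as environment variables.
apiVersion: v1
kind: Secret
metadata:
  name: ocis-service-account
stringData:
  service-account-id: 00000000-0000-0000-0000-000000000000
  service-account-secret: change-me-to-a-long-random-secret
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ocis
spec:
  selector:
    matchLabels:
      app: ocis
  template:
    metadata:
      labels:
        app: ocis
    spec:
      containers:
        - name: ocis
          image: owncloud/ocis:5.0.0
          env:
            # variable names assumed from the global OCIS_ env var pattern
            - name: OCIS_SERVICE_ACCOUNT_ID
              valueFrom:
                secretKeyRef:
                  name: ocis-service-account
                  key: service-account-id
            - name: OCIS_SERVICE_ACCOUNT_SECRET
              valueFrom:
                secretKeyRef:
                  name: ocis-service-account
                  key: service-account-secret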

micbar commented 3 months ago

Closing - should work now with the suggested additions.