Open labkode opened 2 years ago
@dragotin @michaelstingl @TheOneRing I think this is pretty much it, please let me know any comments. Note that we need to define how the spaces will be reflected in the client configuration. I imagine there will be more details to store than just a sync folder pair (like space id, space type, etc.) and that needs to be added to the payload request (PAYREQ).
I'm against jsonifying the local settings. The local path could contain sensitive information the user does not want to share with a server, and I don't think any decisions should be made based on the local path. Same for the other settings: why are those needed in addition to the targetPath?
@TheOneRing can you propose a request payload and expected response with the required attributes that you think will be needed? Thanks!
I think all we need is something like this:
Request: POST /space-migration?username=gonzalhu
Body:
{
"version": "2.11",
"folders": [
"/",
"/Documents",
"/eos/a/Alice",
"/Shares/"
]
}
Response: 400
{
"error": "Migration failed due to"
}
Response: 200
{
"folders": {
"/": [
{
"space_id": "PERSONAL SPACE ID OF gonzalhu",
"path": "/"
}
],
"/Documents": [
{
"space_id": "PERSONAL SPACE ID OF gonzalhu",
"path": "/Documents"
}
],
"/eos/a/Alice": [
{
"space_id": "PERSONAL SPACE ID OF Alice",
"path": "/"
}
],
"/Share": [
{
"error": "Shares can't be migrated"
}
]
}
}
The client will then map the folder sync pairs to the spaces using the space id and the new relative paths.
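The mapping step could be sketched roughly as follows. This is a hypothetical Python illustration: the SyncPair shape and the apply_migration helper are assumptions, and only the folders / space_id / path / error field names come from the example response above.

```python
from dataclasses import dataclass

@dataclass
class SyncPair:
    # Hypothetical, simplified model of a client sync folder pair.
    local_path: str
    remote_path: str          # legacy server-relative path
    space_id: str = ""        # filled in by the migration
    space_path: str = ""      # new path relative to the space root

def apply_migration(pairs, response):
    """Rewrite each sync pair using the folders map from the server response."""
    migrated, failed = [], []
    for pair in pairs:
        entries = response["folders"].get(pair.remote_path, [])
        entry = entries[0] if entries else {"error": "no mapping returned"}
        if "error" in entry:
            # Entries carrying "error" (e.g. shares) cannot be migrated.
            failed.append((pair, entry["error"]))
        else:
            pair.space_id = entry["space_id"]
            pair.space_path = entry["path"]
            migrated.append(pair)
    return migrated, failed
```

A pair whose remote path maps to an error entry would be reported rather than rewritten, matching the "/Shares/" case in the example.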
@labkode @TheOneRing
@TheOneRing asked me for feedback, so, with the iOS client's internals in mind, here's - in broad strokes - what I think should work to convert an OC10 (spaces-incapable) account to a spaces-backed account:
1) new spaces-migration-status capability, returning migration state:
- unavailable: the account can't be migrated because it is already spaces-backed
- forbidden: the account may not be migrated
- possible: the account can be migrated to become spaces-backed
- in-progress: the migration of the account is in progress
- completed: the account has been migrated to spaces
2) new migration endpoints:
- /migration/spaces/initiate-migration
- /migration/spaces/status (in that case, capabilities would not need to be changed/extended - and the existing capability indicating drive support would be the signal for clients to check the migration status for accounts that weren't drives-enabled before)
- /migration/spaces/map that the client can use to restructure/migrate its local data. The map would map the legacy path to drive-id + path (essentially pretty much what @TheOneRing already suggested, minus the error message):
{
"folders": {
"/": [
{
"space_id": "PERSONAL SPACE ID OF gonzalhu",
"path": "/"
}
],
"/Documents": [
{
"space_id": "PERSONAL SPACE ID OF gonzalhu",
"path": "/Documents"
}
],
"/eos/a/Alice": [
{
"space_id": "PERSONAL SPACE ID OF Alice",
"path": "/"
}
],
"/vanished/share": [
{
"error": "This share could not be migrated and has been removed.",
"removed": true
}
]
}
}
I omitted the error message when replicating @TheOneRing's example because the endpoint would not take any parameters or configuration. Instead, it would only return the map to use when mapping legacy paths to their migrated driveID + path pairs.
And each client would then leverage that map to translate/migrate its data set and settings.
Regarding shares: the map would also include the legacy root paths of all shares, mapped to the drive ID + path pairs they have been migrated to. Where shares can't be migrated (and would need to be removed), the share's respective root path would appear in the map with an error and an indication that it has been removed.
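Consuming such a map could look roughly like this. This is a hypothetical Python sketch; only the space_id, path, error and removed keys come from the example above, the partition_map helper is an assumption.

```python
def partition_map(folders):
    """Split the legacy-path map into migrated targets and removed shares.

    Entries carrying "removed": true are shares that no longer exist and
    must be dropped locally; everything else maps a legacy path to a
    (space_id, path) pair.
    """
    mappings, removed = {}, []
    for legacy_path, entries in folders.items():
        for entry in entries:
            if entry.get("removed"):
                removed.append((legacy_path, entry.get("error", "")))
            elif "space_id" in entry:
                mappings[legacy_path] = (entry["space_id"], entry["path"])
    return mappings, removed
```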
@felix-schwarz:
We discussed that the sync client should be as dumb as possible and not perform any complex logic; therefore we push it to the server, and hence we need this API. The sync client needs to send its configuration for the server to understand the relationship between paths and spaces.
We want migration to be progressive: per user, per account. There is no other way to make a production migration transparent. We only allow our users to have one account configured at a time, so migration could even be per user; that depends on your requirements.
Our sync clients query the following endpoint: /cernbox/desktop/ocs/v1.php/cloud/capabilities?format=json
This endpoint is public and does not contain any user-specific behaviour, so I'm against adding capabilities based on username here. The previous approach is needed: using a static non-user dependant capability to trigger the migration logic on the client.
So, I'm pretty much in favour of taking @TheOneRing's proposal to keep it simple and just have this static capability.
@labkode Thanks for sharing the context and thoughts behind this.
There are still a few things that aren't clear to me, however:
1) The iOS client builds and maintains a database of the whole account, not just specific shares or folders. Assuming it would only send /
as folder path then, would it only get back info on the user's personal space in return - or also for shares located below it?
I.e. would it have to also identify all share roots in the account's folder tree and also send those shares along to get info on them?
2) How does migration work for the 2nd, 3rd, etc. client of the same user? Especially if the client software has a different configuration / structure?
3) If a client sends its configuration to the server to get a mapping table from old path
to new path + drive ID
back, what's the benefit (or technical/server-side requirement/background) that makes this preferable to the server returning the full mapping table for the account - and the client simply picking from it what applies to it?
What additional information besides the user name and the old dav url is needed? Why do you need the trusted certificates, whether vfs is used, or the window geometry on the server?
I've already spent days (felt like years) trying to figure out how customers managed to break the owncloud.cfg by applying clever deployment tricks. The owncloud.cfg is not to be touched by any external process.
@felix-schwarz
- The iOS client builds and maintains a database of the whole account, not just specific shares or folders. Assuming it would only send / as folder path then, would it only get back info on the user's personal space in return - or also for shares located below it?
Shares will be exposed under a new endpoint outside of the current personal folder, i.e. they won't be mounted inside a personal home space anymore. The current remote path for default installations is / and that will map to a personal space. Once the migration happens for the sync client's remote folders, the sync client can then query the space discovery endpoint to discover other spaces that were not available before, like shares.
How does migration work for the 2nd, 3rd, etc. client of the same user? Especially if the client software has a different configuration / structure?
Right. We cannot have state on the server per user, and having state per user-client is even more difficult. I think we need to assume that when the static capability is enabled, all clients will try to perform a migration if their local state has not yet been migrated. Once the sysadmin decides that the migration is over, the capability is retired and sync clients will no longer need to perform the migration logic. Clients that missed the update will simply stop working (the sysadmin will disable old webdav paths, for example).
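The per-client decision described above can be reduced to a tiny rule: with no per-user or per-client state on the server, each client migrates itself exactly once, keyed off the static capability and a local "already migrated" flag. A hypothetical sketch:

```python
def should_migrate(capability_enabled: bool, local_state_migrated: bool) -> bool:
    """Decide whether this particular client should run the migration now."""
    # Capability retired by the sysadmin -> no client attempts migration.
    if not capability_enabled:
        return False
    # Local state already migrated -> nothing to do for this client.
    return not local_state_migrated
```

This makes the 2nd, 3rd, etc. client of the same user unproblematic: each one carries its own local flag and migrates independently while the capability is on.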
If a client sends its configuration to the server to get a mapping table from old path to new path + drive ID back, what's the benefit (or technical/server-side requirement/background) that makes this preferable to the server returning the full mapping table for the account - and the client simply picking from it what applies to it?
Because the server does not know what remote folders the user is querying.
In our deployment, a user will usually connect to a remote named /home, but can also connect to a remote named /home/MySubFolder. The server cannot simply create a map of arbitrary remote sync folder pairs. However, the server can understand the remote folder configured in the client and return the appropriate space id.
@TheOneRing @felix-schwarz any news on this?
Ok, let me try to summarize this, and make it actionable:
The client sends parts of the old configuration to an endpoint on the server side to get knowledge about the space ID and a path component, if applicable. If that call succeeds, the client will be able to compute whether it can re-use already synced folders by looking up the content of the me/drives/ endpoint and comparing the space ID and path.
For site administrators, it is a way to stay in control of how many migrations happen and whether they happened.
The migration step is a one time activity. If the client has once successfully received the information, it does not try to call the migration endpoint again.
There is a capability that indicates whether the migration endpoint should be called at all. Capabilities are not user specific, so this is the general switch for all users.
The client sends a JSON document of the following format to a specific migration endpoint /migration/spaces:
{
"version": "3.0.0",
"remotefolders": [
"/",
"/Documents",
"/eos/a/Alice",
"/Shares/"
]
}
Response: 200
{
"folders": {
"/": [
{
"space_id": "PERSONAL SPACE ID OF gonzalhu",
"path": "/"
}
],
"/Documents": [
{
"space_id": "PERSONAL SPACE ID OF gonzalhu",
"path": "/Documents"
}
],
"/eos/a/Alice": [
{
"space_id": "PERSONAL SPACE ID OF Alice",
"path": "/"
}
],
"/Share": [
{
"space_id": "VIRTUAL SHARE SPACE ID of Alice",
"path": "/"
}
]
}
}
In case the client should not yet be migrated, the server responds with 204 (No Content). In that case, the client continues to use the existing configuration.
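Putting the summarized flow together, a client-side sketch might look like the following. This is a hypothetical Python illustration: post_json stands in for the client's HTTP layer, and the endpoint, payload shape and status codes follow the summary above.

```python
def migrate(remote_folders, post_json, client_version="3.0.0"):
    """POST the configured remote folders and return the new mapping.

    Returns None when the server answers 204 (account not yet enabled
    for migration), otherwise a dict of legacy path -> (space_id, path).
    """
    payload = {"version": client_version, "remotefolders": remote_folders}
    status, body = post_json("/migration/spaces", payload)
    if status == 204:
        return None  # not yet enabled for this account: keep old config
    if status != 200:
        raise RuntimeError("migration request failed with status %d" % status)
    # Map each configured remote folder to its (space_id, path) target.
    return {
        legacy: (entry["space_id"], entry["path"])
        for legacy, entries in body["folders"].items()
        for entry in entries
        if "space_id" in entry
    }
```

The client would then compare the returned (space_id, path) pairs against the me/drives/ listing to decide which already synced folders it can re-use.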
Note: The local client paths are useless for the migration routine on the server because local paths are completely under the control of the different clients. If one user has two desktop clients, for example, the local paths can be different on each. The migration needs to work for both, however.
I'd suggest not migrating the Shares at all, to reduce complexity in the client implementations.
If a legacy user syncs the entire cloud with only one sync connection, the flow would look like:
{
"version": "3.0.0",
"remotefolders": [
"/",
]
}
Response: 200
{
"folders": {
"/": [
{
"space_id": "PERSONAL SPACE ID OF gonzalhu",
"path": "/"
}
]
}
}
and the sync client would only migrate the Personal space. In the first sync run, the /Shares directory would be removed on the client side.
The user would be forced to re-sync the shares using the Add-Shares Wizard and place the shares and spaces as desired.
@labkode How would that work with the CERN projects that you have in the legacy system?
@dragotin that is what Hannah proposed and I think it can work. The only part that is missing is that we won't enable this migration for all users at the same time, for obvious reasons, so we need to keep control over when the sync client triggers the migration. For that I proposed to have a different status code:
204: account not enabled to be migrated, nothing to do
Agreed, that is what @TheOneRing suggested, I just wanted to summarize the facts again so that we're all on the same page. I added your 204 suggestion to my summary above, thanks.
@dragotin any update on the implementation?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.
@fmoc @TheOneRing can you give an update?
Can be tested with the 3.0-pre-release builds. A branded build was sent to @labkode
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.
This issue describes a possible API to be implemented in the server that the sync client can use to translate existing sync folder pairs to spaces endpoints.
Follow-up of #3528
How to trigger space migration?
The sync client will read the usual capabilities endpoint:
If the migration > space_migration > enabled capability equals true, then the following logic is performed.
Configuration
Let's take the following Mac OS Desktop Sync client configuration as an example:
The sync client needs to extract the relevant information in a parseable common format; I use JSON as it is widespread. I suggest the sync clients send ALL the information available but redact or omit secrets. Sending all the configuration information is a safeguard against the case where we miss some field and then need another version of the sync client to handle it (and another round of desktop sync client updates).
Payload Request (PAYREQ)
This is an example of the desktop client whose config has not been migrated, i.e. all the sync folder pairs are in the old format.
Payload Response (PAYRES)
API
204: account not enabled to be migrated, nothing to do
200: configuration for account already migrated, nothing to do
201: client applies configuration to migrate to spaces
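A client could dispatch on these codes like this. This is a hypothetical Python helper; only the three status codes and their meanings come from the list above.

```python
def handle_status(status: int) -> str:
    """Translate the space-migration status code into a client action."""
    actions = {
        204: "skip",   # account not enabled to be migrated, nothing to do
        200: "skip",   # configuration for account already migrated, nothing to do
        201: "apply",  # client applies configuration to migrate to spaces
    }
    if status not in actions:
        raise ValueError("unexpected space-migration status: %d" % status)
    return actions[status]
```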
Request
curl -X POST remote.server/space-migration --data-binary @/tmp/request-payload.json
The workflow will be as follows for old and new clients, so that the migration does not break existing clients.
Old clients that do not know about new capability
New clients that know how to handle the new capability
FAQ
The logs from a user account gonzalhu will look like this:
Why send the account name as a query parameter?
Why send the verify parameter?
The verify is sent as a way to perform a double-commit on the sync client and to differentiate from the 200 response without the verify. It also helps the operator understand what is going on. For example, if the verify is not seen, that means the sync client crashed or quit and the migration couldn't be completed in a safe way. The last thing we want is to leave a sync client broken and have to perform manual investigations on the user's computer to fix it.
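The double-commit described above boils down to a three-step sequence whose ordering is the whole point: if the confirmation never arrives, the operator knows the client died between applying and confirming. A hypothetical sketch, with fetch / apply_locally / confirm standing in for the real client operations:

```python
def migrate_with_verify(fetch, apply_locally, confirm):
    """Run the migration as a double-commit.

    fetch         -> obtain the migrated configuration from the server
    apply_locally -> rewrite the local sync folder pairs
    confirm       -> send the verify request; its absence in the server
                     logs signals a crash mid-migration to the operator
    """
    config = fetch()
    apply_locally(config)
    confirm()
    return True
```

Only after the local state has been rewritten does the client confirm, so a 200-without-verify in the logs always means an incomplete migration.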