owncloud / data_exporter

Export/Import for ownCloud user data
GNU General Public License v2.0

improve export format #118

Open felixboehm opened 4 years ago

felixboehm commented 4 years ago

Milestone: Improvements on "export format" https://github.com/owncloud/data_exporter/milestone/2

Open issues

Spec

Simplified Structure

.
└── einstein
    ├── files
    │   ├── Documents
    │   │   └── Example.odt
    │   ├── Photos
    │   │   ├── Paris.jpg
    │   │   ├── San\ Francisco.jpg
    │   │   └── Squirrel.jpg
    │   └── ownCloud\ Manual.pdf
    ├── files_trashbin
    │   └── todo @butonic
    ├── files_versions
    │   └── todo @butonic
    ├── files.jsonl
    ├── shares.jsonl
    └── user.json

After development

IljaN commented 4 years ago

Any pointers on how to model versions in the export in a platform-independent manner (oCis)? @butonic @felixboehm

Possible avenues:

1. Add versions array to each entry in files.jsonl

This could bloat files.jsonl, even though each version carries only etag, owner and timestamp.

{
  "type": "file",
  "path": "/Rop/versioned.txt",
  "eTag": "3a1f4a6ab721bd13ae9abe79088d5a69",
  "permissions": 27,
  "mtime": 1573372163,
  "versions": {
    "1573372157": {
      "etag": "83cbf4a6423c1bf846650f50c987b135",
      "owner": "admin",
      "timestamp": 1573372157
    },
    "1573372158": {
      "etag": "befa0fe4cb4f672d9db9ca532059069d",
      "owner": "admin",
      "timestamp": 1573372158
    },
    "1573372159": {
      "etag": "f6c61a371083277bb3fe5583444da1f7",
      "owner": "admin",
      "timestamp": 1573372159
    },
    "1573372161": {
      "etag": "258067b818ff1633cec4fe6b244e4319",
      "owner": "admin",
      "timestamp": 1573372161
    },
    "1573372163": {
      "etag": "ba5d239ac8b84cb092ff5a0bd1ea9f3a",
      "owner": "admin", 
      "timestamp": 1573372163
    }
  }
}

Storage path and some other fields are omitted because I assume they are implementation details. The target system should know by itself where to put its versions. Please correct me if this assumption is wrong, but following this train of thought we could simplify even further:

{
  "type": "file",
  "path": "/Rop/versioned.txt",
  "eTag": "3a1f4a6ab721bd13ae9abe79088d5a69",
  "permissions": 27,
  "mtime": 1573372163,
  "versions": [
    "1573372157",
    "1573372158",
    "1573372159",
    "1573372161"
  ]
}

As the files_versions directory mirrors the user directory, with every file suffixed with .v$VERSION, the importer can recreate everything from the version string and the path of the file.
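A minimal sketch of that reconstruction, assuming version files sit under files_versions at the same relative path as the original and carry a .v<version> suffix. The helper name version_paths and the versions_root parameter are hypothetical and not part of data_exporter:

```python
import json
from pathlib import PurePosixPath


def version_paths(entry, versions_root="files_versions"):
    """Derive the expected version file paths for one files.jsonl entry.

    Assumes the layout described above: the versions directory mirrors the
    user's files directory and each version is stored as <name>.v<version>.
    """
    relative = entry["path"].lstrip("/")
    return [
        str(PurePosixPath(versions_root) / f"{relative}.v{version}")
        for version in entry.get("versions", [])
    ]


# One (shortened) line as it might appear in files.jsonl.
line = '{"type": "file", "path": "/Rop/versioned.txt", "versions": ["1573372157", "1573372158"]}'
paths = version_paths(json.loads(line))
print(paths)
```

An entry without a "versions" key simply yields an empty list, so unversioned files need no special handling.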

The downside of this approach is that the etag, mtime etc. of each version are lost (which matters for future clients with version sync!). It also won't work for systems that organize versions differently.

2. Don't link files to versions (like in ownCloud)

@butonic Will this even be possible in oCis, i.e. iterating over the file metadata and modifying it retroactively, knowing only the path and maybe the storage?

Any thoughts?

butonic commented 4 years ago

I'd go with a separate file. Then an import does not need to read all version information, and afaict we will need to bypass the CS3 API to add versions anyway. Same for trashbin.
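A rough sketch of what reading such a separate file could look like, assuming one JSON object per line with path, etag, owner and timestamp fields. The file layout, the field set and the function name load_versions are assumptions for illustration, nothing here is specified yet:

```python
import io
import json
from collections import defaultdict


def load_versions(lines):
    """Group version records by file path, oldest version first.

    `lines` is any iterable of JSONL lines, e.g. an open file handle for a
    hypothetical versions.jsonl next to files.jsonl.
    """
    by_path = defaultdict(list)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        by_path[record["path"]].append(record)
    for records in by_path.values():
        records.sort(key=lambda r: r["timestamp"])
    return dict(by_path)


# In-memory stand-in for the separate versions file.
sample = io.StringIO(
    '{"path": "/Rop/versioned.txt", "etag": "befa0fe4cb4f672d9db9ca532059069d", "owner": "admin", "timestamp": 1573372158}\n'
    '{"path": "/Rop/versioned.txt", "etag": "83cbf4a6423c1bf846650f50c987b135", "owner": "admin", "timestamp": 1573372157}\n'
)
versions = load_versions(sample)
```

An importer that does not handle versions can skip this file entirely and read only files.jsonl.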