orbitdb-archive / ipfs-log

Append-only log CRDT on IPFS
https://orbitdb.github.io/ipfs-log/
MIT License

Backward compatibility of the Log #211

Open vvp opened 5 years ago

vvp commented 5 years ago

As OrbitDB version 0.20.0 (orbitdb/orbit-db#524) is getting nearer and work on IPLD support (#200) has started, it would be a good time to discuss the backward compatibility of the log. Currently there is not much of it:

For example, in the next release there's a new identity field in the entry structure. The current version expects it to be there when entries are loaded from IPFS, and the access controller will actually fail if there's no identity information in entries to append. All the log entries created with previous versions will not have this information. Fortunately, this check is done only on new items appended/joined into the log, so appending new entries to old logs will still work after a version upgrade.
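That asymmetry (load works, append is checked) can be sketched as follows; the helper names and access-controller shape here are hypothetical, not ipfs-log's actual API:

```javascript
// Hypothetical sketch: entries written by older versions carry no
// `identity` field, so the identity check applies only when appending
// new entries, never when loading existing ones from IPFS.
function canLoad (entry) {
  // Loading is always allowed; old entries simply lack an identity.
  return entry !== null && typeof entry === 'object'
}

function canAppend (entry, accessController) {
  // New entries must carry identity information for the access controller.
  if (!entry.identity) return false
  return accessController.canAppend(entry)
}

// An old-format entry loads fine but would be rejected on append as-is.
const oldEntry = { payload: { op: 'ADD', value: 1 } } // no `identity` field
const allowAll = { canAppend: () => true }
```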

Some design aspects that I see:

Any thoughts, opinions? :slightly_smiling_face:

aphelionz commented 5 years ago

I'll kick off the discussion with a proposal that we use a monotonically increasing version field inside the entries themselves, the absence of which is to be treated as the value 1. The field's explicit value will start at 2.

This has the benefit of freeing the entry versions from having to be in lock-step with the package version, and gives us the added benefit of being able to join logs of different versions. If possible, it's best to leave the old entries where they are instead of recreating/duplicating them in a migration.
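As a rough sketch of that proposal (the helper names are hypothetical, not from ipfs-log):

```javascript
// Read an entry's version, treating a missing field as version 1,
// per the proposal above. Explicit values start at 2.
function entryVersion (entry) {
  return entry.version === undefined ? 1 : entry.version
}

// Joining logs of different versions could then normalize per entry,
// tagging legacy entries explicitly in memory without rewriting them.
function normalize (entry) {
  return entryVersion(entry) === 1
    ? { ...entry, version: 1 }
    : entry
}
```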

Entries without identities: leave as public?

haadcode commented 5 years ago

Lots of great thoughts here, thank you @vvp and @aphelionz!

monotonically increasing version field inside of the entries themselves

Agreed. We have this as the v field now, as @vvp mentioned, which is set to 0 atm. For starters, we should increase the version number to 1 :)

Entries without identities: leave as public?

Old versions also have a signature, under the .key field in the data structure, which maps to identity.publicKey in the new version.
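A minimal normalization sketch based on that field mapping (the function is illustrative, not part of ipfs-log):

```javascript
// Map the old entry shape to the new one: old entries store the signing
// key under `key`, which corresponds to `identity.publicKey` in the new
// structure. Entries with neither field are left untouched.
function normalizeIdentity (entry) {
  if (entry.identity) return entry // already in the new shape
  if (entry.key) {
    return { ...entry, identity: { publicKey: entry.key } }
  }
  return entry // a truly identity-less entry
}
```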

Should there be support for multiple versions at the code level, or a requirement that older log versions be migrated to the single code-supported version first? Supporting multiple log/entry versions can make development quite troublesome and error-prone, whereas requiring migrations makes the upgrade process more involved (especially with larger logs).

This is very true. I don't think we can "migrate" the logs in a way that the actual entries will be converted to the new structure due to the signatures in each entry. Which, I believe, leaves us with the second option of supporting multiple versions. However, as you say @vvp, this can make the code quite complex and highly error-prone, so it seems to me that the question is:

Do we want to or need to support multiple versions? If not, what are the consequences to users? If yes, what are the consequences to development and for maintainers (eg. do we commit to support all versions of logs from today all the way to the far future)?

Is there a way we could provide migration tools such that the end-user initiates and authorizes the migration (i.e. they re-sign all converted/transformed entries), instead of developers building on/with OrbitDB?
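Such a user-initiated migration might look roughly like this; `identity.sign` stands in for whatever signing API the identity provider exposes, and none of these names come from ipfs-log:

```javascript
// Hypothetical user-initiated migration: the identity owner converts
// each entry to the new structure and re-signs it, so the signatures
// on the migrated entries are valid under the new format.
function migrateLog (entries, convert, identity) {
  return entries.map(entry => {
    const converted = convert(entry)                  // old shape -> new shape
    converted.sig = identity.sign(converted.payload)  // owner re-signs
    return converted
  })
}
```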

satazor commented 5 years ago

This is very true. I don't think we can "migrate" the logs in a way that the actual entries will be converted to the new structure due to the signatures in each entry. Which, I believe, leaves us with the second option of supporting multiple versions.

I've been thinking about the same thing, but in https://github.com/peer-base/peer-base land, and this is the way to go. There's always the latest canonical version of the data-structure, and we must convert old versions to the canonical version when reading. This means we must tag those data-structures with their versions and have code to migrate to the latest version incrementally.
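A sketch of that incremental migration, assuming each version carries an up() transform from the previous one (names are illustrative, not from peer-base or ipfs-log):

```javascript
// Each version knows how to upgrade data from the previous version.
// This toy table assumes version numbers equal their array index.
const versions = [
  { version: 0, up: (data) => data },                          // initial shape
  { version: 1, up: (data) => ({ ...data, identity: null }) }  // adds a field
]

// Convert any tagged data-structure to the latest canonical version by
// applying each up() transform in order.
function toCanonical (data) {
  let v = data.v === undefined ? 0 : data.v
  let out = data
  while (v < versions.length - 1) {
    v += 1
    out = { ...versions[v].up(out), v }
  }
  return out
}
```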


Also, I would think that embracing conventional commits would improve the visibility of changes to developers. Many projects in IPFS land already use them. You may check how to quickly set up the necessary tooling on some repos, for instance, this one. Basically:

aphelionz commented 5 years ago

I made a comment in @satazor 's PR that begins to address this: https://github.com/orbitdb/ipfs-log/pull/213/files#r244635479

Reading back through these comments, I believe we should increment the version number (the v field) from 0 to 1 as well.

satazor commented 5 years ago

I would like to make a more formal proposal based on the discussion we had on #213.

Data-structures

It's normal for the data-structures of ipfs-log to evolve over time. This happened once when we introduced IPLD link support, and it will eventually happen again in the future.

All the code that interacts with those data-structures should always assume that they are in the latest version. This makes the code easy to reason about because there's only one shape of the data-structures: the most recent one. Instead of doing this in an ad-hoc manner, we should come up with a scheme that allows us to transform those data-structures from older to newer versions and vice versa. These are the scenarios to take into consideration:

That said, I propose tagging all the data-structures with a v property that contains their version. We already have that set up for entries, but not for logs. Assuming that we now have a consistent way to identify the version of a data-structure, we may have a versioning pipeline based on the following scheme:

const schema = {
  versions: [
    {
      version: 0,
      up(data) {},
      down(data) {},
      codec: { name: 'dag-pb-v0' }
    },
    {
      version: 1,
      up(data) {},
      down(data) {},
      codec: { name: 'dag-cbor', ipldLinks: ['next'] }
    },
    // more in the future...
  ],
  codecs: {
    'dag-pb-v0': {
      matches(cid, dagNode) {},
      fromDagNode(dagNode) {},
      toDagNode(data, ipldLinks) {}
    },
    'dag-cbor': {
      matches(cid, dagNode) {},
      fromDagNode(dagNode) {},
      toDagNode(data, ipldLinks) {}
    },
    // more in the future...
  }
}

...where:

A versioning pipeline based on the schema would have the following API:

verPipeline.read(schema, dagNode): data

Reads the underlying data of the dagNode.

verPipeline.write(schema, data): dagNode

Creates a dagNode for the data, according to its version.
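A rough sketch of how read and write could walk such a schema; this assumes version numbers match their index in the versions array, and threads a cid through to matches(). It's illustrative only, not an existing implementation:

```javascript
// Sketch of the versioning pipeline: read decodes a dagNode with the
// codec that recognizes it, then migrates the data up to the latest
// version; write always encodes with the latest version's codec.
const verPipeline = {
  read (schema, dagNode, cid) {
    // Find which version's codec recognizes this node.
    const entry = schema.versions.find(v =>
      schema.codecs[v.codec.name].matches(cid, dagNode))
    let data = schema.codecs[entry.codec.name].fromDagNode(dagNode)
    // Apply up() transforms until the data is in the latest version.
    const latest = schema.versions[schema.versions.length - 1]
    for (let v = entry.version + 1; v <= latest.version; v++) {
      data = schema.versions[v].up(data)
    }
    return data
  },
  write (schema, data) {
    const latest = schema.versions[schema.versions.length - 1]
    const codec = schema.codecs[latest.codec.name]
    return codec.toDagNode(data, latest.codec.ipldLinks)
  }
}
```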

Public API

Changes to the public API are not as problematic as changes to the data-structures.

Backwards compatibility normally comes at the cost of code complexity. Having said that, choosing to maintain backwards compatibility is a per-situation decision.

Nevertheless, a breaking change should always translate to a new major version of the module. Moreover, all changes (fixes, features, breaking changes) should be easily visible to users of ipfs-log. This is usually made possible via a changelog, which can be automated with the right tools. I propose the following:

Let me know your thoughts!