vvp opened this issue 5 years ago
I'll kick off the discussion with a proposal that we use a monotonically increasing version field inside of the entries themselves, the absence of which is to be treated as the value 1. The field's explicit value will start at 2.
This has the benefit of freeing the entry versions from having to be in lock-step with the package version, and gives us the added benefit of being able to join logs of different versions. If possible, it's best to leave the old entries where they are instead of recreating/duplicating them in a migration.
Entries without identities: leave as public?
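For the version-field part of this proposal, a minimal sketch of how a reader could resolve an entry's version (the helper name is made up for illustration, not existing ipfs-log API):

// Hypothetical helper: resolve an entry's version under the proposal above.
// Entries written before the field existed carry no `v` and are treated as
// version 1; newer entries carry an explicit `v` starting at 2.
const entryVersion = (entry) => (entry.v === undefined ? 1 : entry.v)

// entryVersion({ payload: 'old entry' })        // -> 1
// entryVersion({ payload: 'new entry', v: 2 })  // -> 2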
Lots of great thoughts here, thank you @vvp and @aphelionz!
monotonically increasing version field inside of the entries themselves
Agreed. We have this as a v field now, as @vvp mentioned, which is set to 0 atm. For starters, we should increase the version number to 1 :)
Entries without identities: leave as public?
Old versions also have a signature, under the .key field in the data structure, which maps to identity.publicKey in the new version.
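To illustrate that mapping, a hedged sketch of how an old entry's signing key could be surfaced in the new shape when reading (normalizeEntry is a hypothetical helper, not part of ipfs-log):

// Hypothetical normalization: expose an old entry's signing key under the new
// identity.publicKey location, without touching the signed payload itself.
function normalizeEntry (entry) {
  if (entry.identity) return entry // already in the new shape
  return {
    ...entry,
    identity: { publicKey: entry.key } // old versions keep the key under `.key`
  }
}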
Should there be support for multiple versions on the code level, or should older log versions be required to be migrated to the single code-supported version first? Supporting multiple log/entry versions can make development quite troublesome and error-prone, whereas requiring migrations will make the upgrading process more involved (especially with larger logs).
This is very true. I don't think we can "migrate" the logs in a way that the actual entries will be converted to the new structure due to the signatures in each entry. Which, I believe, leaves us with the second option of supporting multiple versions. However, as you say @vvp, this can make the code quite complex and highly error-prone, so it seems to me that the question is:
Do we want to or need to support multiple versions? If not, what are the consequences for users? If yes, what are the consequences for development and for maintainers (e.g. do we commit to supporting all versions of logs from today all the way into the far future)?
Is there a way we could provide migration tools such that the end-user initiates and authorizes the migration (i.e. they re-sign all converted/transformed entries), instead of the developers building on/with orbitdb?
This is very true. I don't think we can "migrate" the logs in a way that the actual entries will be converted to the new structure due to the signatures in each entry. Which, I believe, leaves us with the second option of supporting multiple versions.
I've been thinking about the same thing but in https://github.com/peer-base/peer-base land and this is the way to go. There's always the latest canonical version of the data-structure and we must convert old versions to the canonical version when reading. This means we must tag those data structures with the versions and have code to migrate to the latest version incrementally.
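A rough sketch of that incremental approach, assuming every structure is tagged with a version and there is one migration function per version step (migrations and toLatest are illustrative names, not an existing API):

// migrations[n] converts a structure from version n to version n + 1.
const migrations = [
  (data) => ({ ...data, v: 1 /* plus whatever changed between 0 and 1 */ })
  // add a new function here whenever the canonical version is bumped...
]

// Convert any older structure to the latest (canonical) version when reading.
function toLatest (data) {
  let current = { v: 0, ...data } // untagged structures are treated as version 0
  while (current.v < migrations.length) {
    current = migrations[current.v](current)
  }
  return current
}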
Also, I would think that embracing conventional commits would improve the visibility of changes to developers. Many projects in IPFS land already use them. You may check how to quickly set up the necessary tooling on some repos, for instance, this one. Basically, run npm run release to use standard-release, so that it automatically bumps the version and updates the CHANGELOG.md based on the commits.
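For reference, a minimal sketch of the corresponding package.json scripts entry (shown here with standard-version, the tool proposed further down; adjust to whichever release tool ends up being chosen):

{
  "scripts": {
    "release": "standard-version"
  }
}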
I made a comment in @satazor's PR that begins to address this: https://github.com/orbitdb/ipfs-log/pull/213/files#r244635479
Reading back through these comments, I believe we should increment the version number in the v field from 0 to 1 as well.
I would like to make a more formal proposal based on the discussion we had on #213.
It's normal for the data-structures of ipfs-log to evolve over time. This happened once when we introduced IPLD links support and it will eventually happen again in the future.
All the code that interacts with those data-structures should always assume that they are in the latest version. This makes it easy to reason about the code because there's only one shape of the data-structures: the most recent one. Instead of doing this in an ad-hoc manner, we should come up with a scheme that allows us to transform those data-structures from older to newer versions and vice-versa, covering both reading structures created by older versions and writing structures back in their original version.
Having that said, I propose to tag all the data-structures with a v property that contains its version. We already have that set up for entries but not for logs.
Assuming that we now have a consistent way to identify the version of a data-structure, we may have a versioning pipeline based on the following scheme:
const schema = {
  versions: [
    {
      version: 0,
      up (data) {},   // transforms `data` from version 0 to version 1
      down (data) {}, // transforms `data` back to the previous version
      codec: { name: 'dag-pb-v0' }
    },
    {
      version: 1,
      up (data) {},
      down (data) {},
      codec: { name: 'dag-cbor', ipldLinks: ['next'] }
    }
    // more in the future...
  ],
  codecs: {
    'dag-pb-v0': {
      matches (cid, dagNode) {},
      fromDagNode (dagNode) {},
      toDagNode (data, ipldLinks) {}
    },
    'dag-cbor': {
      matches (cid, dagNode) {},
      fromDagNode (dagNode) {},
      toDagNode (data, ipldLinks) {}
    }
    // more in the future...
  }
}
...where:

- scheme.versions[].version: The version number of the version entry
- scheme.versions[].up: A function that receives data and transforms it to the next version
- scheme.versions[].down: A function that receives data and transforms it to the previous version
- scheme.codecs[].matches: Returns true if dagNode is of the given codec entry
- scheme.codecs[].fromDagNode: Retrieves the underlying data of the dagNode
- scheme.codecs[].toDagNode: Creates a dagNode for the data to be stored, converting any ipldLinks to IPLD links
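To make the up/down stubs more concrete, here is a hedged sketch of how the two version entries could express the .key → identity.publicKey change discussed earlier (exampleVersions is an illustrative name, and the real migration would need to cover every field that changed):

// Illustrative version entries: version 0's up() moves to version 1,
// version 1's down() moves back.
const exampleVersions = [
  {
    version: 0,
    up (data) {
      // 0 -> 1: the old top-level `key` becomes `identity.publicKey`
      const { key, ...rest } = data
      return { ...rest, v: 1, identity: { publicKey: key } }
    },
    down (data) { return data }, // nothing older than version 0
    codec: { name: 'dag-pb-v0' }
  },
  {
    version: 1,
    up (data) { return data }, // latest version so far
    down (data) {
      // 1 -> 0: fold `identity.publicKey` back into the top-level `key`
      const { identity, ...rest } = data
      return { ...rest, v: 0, key: identity.publicKey }
    },
    codec: { name: 'dag-cbor', ipldLinks: ['next'] }
  }
]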
A versioning pipeline based on the schema would have the following API:
verPipeline.read(scheme, dagNode): data

Reads the underlying data of the dagNode.

1. Finds the codec entry by iterating over scheme.codecs[].matches until one returns true.
2. Retrieves the data stored in the dagNode by calling fromDagNode on the codec entry that matched.
3. Grabs the version entry correspondent to data.v from scheme.versions[].
4. Converts any IPLD links back into plain values, based on codec.ipldLinks of the version entry.
5. Runs the up function, starting from data.v up to the most recent one.
6. Tags data with its original version, in case it needs to be written back later, by defining data.ov as non-enumerable (ov stands for original version).
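A rough sketch of what that read path could look like in code, assuming the schema shape above (verPipeline isn't an existing module; names, the extra cid parameter, IPLD link handling, and error handling are simplifications for illustration):

// Illustrative read path: dagNode -> data in the most recent version.
function read (scheme, cid, dagNode) {
  // 1.-2. Find the matching codec entry and extract the raw data.
  const codec = Object.values(scheme.codecs).find((c) => c.matches(cid, dagNode))
  let data = codec.fromDagNode(dagNode)
  const originalVersion = data.v
  // 5. Walk the up() functions from the stored version to the latest one
  //    (IPLD link handling per codec.ipldLinks is omitted here).
  for (let i = originalVersion; i < scheme.versions.length - 1; i++) {
    data = scheme.versions[i].up(data)
  }
  // 6. Remember where the data came from, without it being serialised later.
  Object.defineProperty(data, 'ov', { value: originalVersion, enumerable: false })
  return data
}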
verPipeline.write(scheme, data): dagNode
Creates a dagNode for the data, according to its version.

1. Grabs the version entry correspondent to data.v from scheme.versions[].
2. Grabs the version entry correspondent to data.ov from scheme.versions[].
3. Runs the down function, starting from the version entry correspondent to data.v down to data.ov.
4. Finds the codec entry in scheme.codecs[] that matches the codec.name property of the version entry correspondent to data.ov.
5. Calls toDagNode from the codec entry with the correct ipldLinks, based on codec.ipldLinks of the version entry correspondent to data.ov.
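And the corresponding rough sketch of the write path (again purely illustrative, under the same assumptions as the read sketch above):

// Illustrative write path: data -> dagNode encoded in its original version.
function write (scheme, data) {
  const targetVersion = data.ov !== undefined ? data.ov : data.v
  // 1.-3. Walk the down() functions from the current version back to the original one.
  let out = data
  for (let i = data.v; i > targetVersion; i--) {
    out = scheme.versions[i].down(out)
  }
  // 4. Pick the codec declared by the version entry we ended on.
  const versionEntry = scheme.versions[targetVersion]
  const codec = scheme.codecs[versionEntry.codec.name]
  // 5. Build the dagNode, turning the listed fields into IPLD links.
  return codec.toDagNode(out, versionEntry.codec.ipldLinks)
}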
Changes on the public API are not as problematic as changes to the data-structures.
Having backwards compatibility normally comes at the cost of code complexity. That said, choosing to have backwards compatibility is a per-situation decision.
Nevertheless, a breaking change should always translate to a new major version of the module. Moreover, all changes (fixes, features, breaking changes) should be easily visible to users of ipfs-log. This is usually made possible via a changelog, which can be automated using the right tools. I propose the following:
- Use the Squash button instead of the regular Merge button when merging a PR.
- Use standard-version to create new releases. This tool will automatically bump the version of ipfs-log based on the commits made since the last release (breaking changes: major, feat: minor, fix: patch) and generate the CHANGELOG.md file for us automatically.

Let me know your thoughts!
As OrbitDB version 0.20.0 (orbitdb/orbit-db#524) is getting nearer and work on IPLD support (#200) has started, it would be a good time to discuss the backward compatibility of the log. Currently there is not much: entries have a v field, but it cannot be used to differentiate between incompatible versions because it's currently always 0.

For example, in the next release there's a new identity field in the entry structures. The current version expects it to be there when entries are loaded from IPFS, and the access controller will actually fail if there's no identity information in the entries to append. All the log entries created with previous versions will not have this information. Fortunately, this check is done only on new items appended/joined into the log, so appending new entries to old logs will still work after the version upgrade.

Some design aspects that I see:
Versioning could be done on the log level only, with the version checked in a single place (e.g. LogIO.fromMultihash()), whereas versioning the entries too would allow joining logs with different versions together and be more flexible in backward compatibility. A single-version log would probably need to have an internal, version-specific logId, which would then have consequences on the entries' log references too.

Any thoughts, opinions? 🙂