Open Gozala opened 4 years ago
I’m not sure what this accomplishes.
Is this meant to convert between the new and old CID
implementations?
If so, this isn’t going to work as stated because what is causing the breaking change is the fact that the new implementation doesn’t have the full codec table, so you won’t have the .codec
property on the new instances and you won’t be able to instantiate with a string codec name, you have to use the integer instead.
You should be able to convert from the new CID
to the old CID
instance though.
However, I don’t think it’s worth writing this particular compatibility layer because there’s other changes across the new stuff as well. This is all covered by the legacy()
method provided by js-multiformats
, it will take a new style codec and give you back an old style codec complete with conversions to the old CID
class and Uint8Array
conversions back to Buffer
which the old stuff also relies on.
Having code that sits in-between this migration is probably going to be more pain than it’s worth, there’s a lot of small differences covered by legacy
you’ll want to deal with, and you should also consider how dependent existing consumers are on Buffer
.
This isn’t just a breaking change across our code bases, it’s also a breaking change across all of our consumers. We have to consider what happens to third party consumers when we swap these type implementations out and ensure that we either maintain compatibility or cause an exception because there are many cases where you could not cause an exception but still cause incorrect behavior that consumers won’t even notice and instead be writing bad data. This is why we decided to make the new CID
instance pass isCID
but throw on property access of .codec
.
I’m not sure what this accomplishes.
Is this meant to convert between the new and old CID implementations?
If so, this isn’t going to work as stated because what is causing the breaking change is the fact that the new implementation doesn’t have the full codec table, so you won’t have the .codec property on the new instances and you won’t be able to instantiate with a string codec name, you have to use the integer instead.
It may not cover this specific change because as you said it may not be possible to convert without codec table. However it can cover other changes that may occur. Here is the contrived example:
class Bla {
static get typeSymbol() {
return Symbol.for('Bla')
}
static isBla(v) {
return v && v[Bla.typeSymbol]
}
static asBla(v) {
if (v instanceof v) {
return v
} else if (v && Buffer.isBuffer(v.buffer)) {
return new Bla(v.buffer)
} else {
return null
}
}
constructor(buffer) {
if (!Buffer.isBuffer(buffer)) {
throw new Error("Argument should be a buffer")
}
this.buffer = buffer
}
toHexString() {
return this.buffer.toString('hex')
}
}
class Bla {
static get typeSymbol() {
return Symbol.for('Bla')
}
static isBla(v) {
return v && v[Bla.typeSymbol]
}
static asBla(v) {
if (v instanceof v) {
return v
} else if (v && Buffer.isBuffer(v.buffer)) {
return new Bla(v.buffer)
} else {
return null
}
}
constructor(buffer) {
if (!Buffer.isBuffer(buffer)) {
throw new Error("Argument should be a buffer")
}
this.buffer = buffer
}
toHexString() {
return this.buffer.toString('hex')
}
toBase64String() {
return return this.buffer.toString('base64')
}
}
if (Bla.isBla(v)) {
v.toBase64String()
}
If v
is V1 it will fail on the other hand following will work as expected
const bla = Bla.asBla(v)
if (bla) {
bla.toBase64String()
}
Edit: Added constructors to the examples for the completeness
@mikeal I think you were reading this in context of new CIDs work happening in IPLD, which is likely due to me mentioning comment about toString
method changes.
I think reading with that context is misleading. It is unrelated. There is problem with isCID
pattern that asCID
pattern would handle. Proposed pattern would also work across realms which class-is aspired to do, but fails as I'm finding in my work.
I like this pattern and I think it elegantly solves most of the challenges here and the instanceof
shortcut is really nice for speeding this up in the common case. It seems to me that you just need to come up with a satisfactory solution to the faux-CID checkCIDComponents()
problem that's being discussed in #109 (a circular property would probably work but that's quite a hack!).
I think this method might be able to solve the old-CID new-CID conversion problem too, both ways: for new to old it just has to do a lookup of code
to codec
in the base table it already has, but for old to new, the CID
constructor happens (for some reason?) to have the base table attached as codecs
, so it's just a matter of doing code = cid.constructor.codecs[cid.codec]
(where cid
is an instantiated CID
).
We do have cases in IPLD that rely on an isCID()
to properly encode data and I can imagine those becoming a performance concern. e.g. in dag-cbor we have to do a deep walk of every object that gets encoded and replace CIDs with tagged CBOR types so that borc properly encodes them as TAG=42 elements. Having to use an asCID()
would mean we're creating a ton of null
assignments for most object properties and doing some amount of internal property inspection for a checkCIDComponents()
. For isCID()
we're only doing a quick boolean check on a Symbol attached to each property. These may be costs that a smart VM could deal with, but it might be useful to figure that out ahead of time.
https://github.com/ipld/js-ipld-dag-cbor/blob/master/src/util.js#L46-L48
Similar in DAG-JSON:
https://github.com/ipld/js-dag-json/blob/master/index.js#L8-L10
As I see it, the problem is you get a CID
back from another module and you're not sure if it supports the API you expect. Historically a test would pick this up and we would solve the problem by PRing the other module to upgrade its CID dependency to the latest version, and so the new version would propagate through the ecosystem.
If this is not an option then a code change may be appropriate.
As proposed it looks like asCID
is doing a bunch of the checks that the CID
class does in its constructor.
Why not just add any additional checks there instead of having to add conditional logic to the calling code every time we touch a CID from another module?
As proposed it looks like asCID is doing a bunch of the checks that the CID class does in its constructor.
It does same exact check as isCID
as it’s primary goal is to replace it. There’s some overlap with what CID
constructor does, but that’s just coincidence.
Why not just add any additional checks there instead of having to add conditional logic to the calling code every time we touch a CID from another module?
It really is poor mans pattern matching rather then constructor. Meaning it may return CID if value passed represents CID otherwise it returns null
. Which typically constructors can’t do & I’m not sure overloading constructor in this was would be a good idea.
Argument could be made that constructor could throw instead of returning null
, but that would have negative impact on usability and performance.
Having to use an asCID() would mean we're creating a ton of null assignments for most object properties and doing some amount of internal property inspection for a checkCIDComponents(). For isCID() we're only doing a quick boolean check on a Symbol attached to each property.
Not to object properties, to local variables that and if not null only then assign. This really meant to do exact same check except null check instead of boolean.
I do mention checkCIDComponents
, but it really should do what isCID
does (some property chech)
Argument could be made that constructor could throw instead of returning null
Throwing when it encounters invalid data is exactly what the constructor does right now.
It does same exact check as isCID as it’s primary goal is to replace it.
If that's so, then just replace it. Don't give the user multiple ways of doing the same thing.
However I'd rather not add another method that does 50% of isCID's job and 50% of the constructors job.
If anything, I'd like to see v1 of this module released so we can properly communicate breaking changes through semver (e.g. not lose instanceof
compatibility because a new method was added and we had to bump the 0.x.0 minor and we have a bunch of old modules in our deps that haven't had PRs submitted to them to upgrade the cids
dep), class-is
removed and use of instanceof
instead of isCID
.
Throwing when it encounters invalid data is exactly what the constructor does right now.
Ok but here is the prominent pattern I see today within PL code base and outside of it:
if (CID.isCID(value)) {
useCID(value)
}
// ...etc
Which following this proposal would turn into:
const cid = CID.asCID(value)
if (cid) {
useCID(cid)
}
// ...etc
Embracing constructor here would change that into (what I'd say an awkward) usage:
try {
const cid = new CID(value)
useCID(cid)
} catch(_) {}
// ...etc
Which also introduces a surface for bugs where exception thrown by useCID
can get swallowed unintentionally.
If that's so, then just replace it. Don't give the user multiple ways of doing the same thing.
However I'd rather not add another method that does 50% of isCID's job and 50% of the constructors job.
I agree I think long term goal is to replace isCID
with asCID
pattern, however I think there is a benefit to doing a gradual transition instead of a sweep requiring tight coordination. If replacing seems like a better option I'm down with that as well.
Had a conversation with @mikeal today about this & I would like to provide a sketch as what I think we agreed was a worthy avenue to explore.
class CID {
/**
* @param {any} value
* @returns {CID|null}
*/
static asCID(value) {
// If it is already a CID instance no allocation is necessary
// just return it back
if (value instanceof CID) {
return value
}
// If it is not instance of CID it might be instance from different version of CID
// library or it could be CID instance that was structure cloned. Instead of using
// symbol we recognize `value.asCID` circular that renders `value` invalid for
// JSON & DAG representation. In other words it's equivalent of current symbol
// check.
else if (value != null && value.asCID === value) {
const {version, codec, multihash, multibaseName} = value
return new CID(version, codec, multihash, multibaseName)
} else {
return null
}
}
constructor(version, codec, multihash, multibaseName) {
this.version = version
this.codec = codec
this.multihash = multihash
this.multibaseName = multibaseName
this.asCID = this
}
toJSON() {
const { version, codec, multihash, multibaseName } = this
return { version, codec, multihash, multibaseName }
}
}
What this gives us is following:
const main = (multihash) => {
var cid = new CID(0, 'dag-pb', multihash, 'base32')
var {port1:sender, port2:receiver} = new MessageChannel()
receiver.onmessage = ({data}) => {
data instanceof CID // false
const cid2 = CID.asCID(data)
cid2 instanceof CID // true
}
sender.postMessage(cid)
}
Given our conversation I've gathered this was nice to have.
var cid = new CID(0, 'dag-pb', multihash, 'base32')
JSON.stringify(cid) // => {"version":0,"codec":"dag-pb", "multibaseName":"base32","multihash":{"0":190,"1":203,....}}
CID.isCID
can be kept for gradual transitionThis assumes that CID.isCID
is unchanged (uses the symbol check as it does today)
const main = (multihash) => {
var cid = new CID(0, 'dag-pb', multihash, 'base32')
var {port1:sender, port2:receiver} = new MessageChannel()
receiver.onmessage = ({data}) => {
CID.isCID(data) // => false
const cid2 = CID.asCID(data)
cid2 instanceof CID // => true
}
sender.postMessage(cid)
}
Meaning that CID.isCID
will continue to work just as it does today with-in the single thread boundary. However CID instance passed across thread boundaries will fail CID.isCID(v)
check (just as today). On the other hand if CID.asCID(v)
is used on the receiver thread v
will be recognized / casted to CID
.
This would enable:
CID.asCID(v)
pattern without breaking any code or disrupting users.CID.asCID(v)
without having to coordinate all changes at once.I have not performed any benchmarks, but I'm pretty confident that performance of CID.asCID(v)
should be on par with CID.isCID(v)
.
This code was for illustration purposes so constructor was simplified from what is in the tree.
Me and @mikeal also have talked about the idea of representing CID
as data view over the underlying buffer similar to how Uint8Array
. If we were to go about it it would probably look something like code below, with all the properties just being lazy getters:
class CID {
/**
*
* @param {ArrayBuffer} buffer
* @param {number} byteOffset
* @param {number} byteLength
*/
constructor(buffer, byteOffset, byteLength) {
this.buffer = buffer
this.byteOffset = byteOffset
this.byteLength = byteLength
}
get version() {
const { version } = materialize(this)
return version
}
get codec() {
const { codec } = materialize(this)
return codec
}
get multihash() {
const { multihash } = materialize(this)
return multihash
}
}
const materialize = (cid) => {
const bytes = new Uint8Array(cid.buffer, cid.byteOffset, cid.byteLength)
const firstByte = bytes[0]
if (firstByte === 1) {
const bytes = new Uint8Array(cid.buffer, cid.byteOffset + 1, cid.byteLength - 1)
Object.defineProperties(cid, {
version: { value: 1 },
codec: { value: multicodec.getCodec(bytes) },
multihash: { value: multicodec.rmPrefix(bytes) }
})
} else {
Object.defineProperties(cid, {
version: { value: 0 },
codec: { value: 'dag-pb' },
multihash: { value: bytes }
})
}
return cid
}
I do think that might be a good idea, however it is orthogonal to the overall CID.asCID
proposal as they are not mutually exclusive nor they depend on each other.
Turns CID into non JSON.serializable
Given our conversation I've gathered this was nice to have.
Why is it nice to have to break JSON.stringify()
if your data contains CIDs?
Why is it nice to have to break JSON.stringify() if your data contains CIDs?
Please correct me @mikeal if I got it wrong, but I think it was something to the effect of preventing error due to naive serialization.
If I did get that wrong, and if want to get JSON.stringify(cid)
to work, we can add toJSON()
method that omits asCID
.
I see now that JSON.stringify
was also discussed at https://github.com/multiformats/js-multiformats/pull/18#issuecomment-645068878 and was decided in favor of not breaking JSON.stringify
, I've updated sections to reflect that decision.
Me @achingbrain and @hugomrdias had a discussion about this on call other day, and I will try to summarize it here.
CID.asCID
or CID.isCID
, but not both.
CID.isCID
for specific time frame after which CID.isCID
could be removed.CID.isCID
from adopting CID.asCID
although I also understand this may be impractical.CID.asCID
was solving was not very clear, so I'll attempt to break it down in the list:
CID.isCID(cid)
can return true
even if cid
is older implementation that lacks some method that CID
has, which could lead to errors with if (CID.isCID(cid)) { cid.nonExstingMethod() }
cid = CID.asCID(v)
is more robust because returned CID
will be instanceof CID
and there for have all the methods that caller expects.CID.asCID(v)
will work with multiple JS realms, because even though prototype chain is lost returned cid
will contain it.CID.asCID(v)
can't actually solve incompatibly problem, because binary representation can also change.
cid_v2 = CID.asCID(cid_v1)
is too costly (performance vise or maintenance vice) we can return null
.cid_v2 = CID.asCID(cid_v1)
work if that would make sense when we do it.cid_v1 = CID.asCID(cid_v2)
does the right thing which again could be to downgrade or return null
.CID
be a glorified ArrayBuffer
view (as in interface ArrayBufferView { byteOffset: number, byteLength: number, buffer: ArrayBuffer }
).
Uint8Array
Uint8Array
on CID
doesn't seems rightcid
instead of cid.buffer
ArrayBufferView
was a better approach.CID.asCID
from CID
being glorified ArrayBufferView
.
CID.asCID
can be backwards compatible change, while ArrayBufferView
can not (because we already have buffer property)Now that you’re considering a breaking change (deprecating isCID) I should note that there’s a future upgrade to multiformats
which is also a breaking change to CID, among other things. If you do go down that path you should probably
combine it with a migration to the new interface so that you don’t suffer 2 big refactors.
Now that you’re considering a breaking change (deprecating isCID) I should note that there’s a future upgrade to
multiformats
which is also a breaking change to CID, among other things. If you do go down that path you should probably combine it with a migration to the new interface so that you don’t suffer 2 big refactors.
@mikeal I have couple of questions:
multiformats
, but does not require switching everything to multiformats
?multiformats
(or some pieces of it) ready to be integrated into js-ipfs ?ArrayBufferView
approach described in the notes above fit into the multiformats
? @Gozala these answers are a bit complicated, I’m going to join the Core Implementations meeting on Monday and can discuss it in more detail there.
We had a higher bandwidth exchange today and have settled on the following course of action:
Phase 1: Implement
CID.asCID
method.CID.asCID
in place of CID.isCID
.Phase 2: Evaluate
CID.isCID
by deprecating it with warnings and removing it all together in the future.
This idea first was proposed in #109 and later came up in #110. I thought it would be best to factor it out into separate thread and discuss it here.
Problem
At the moment
CID.isCID(v)
is used all over the place as a glorifiedv instanceof CID
check. However it does not check ifv
is instanceofCID
instead it checks presence of special symbol. That addresses problem with having multiple different versions ofCID
implementations, however it does overlook the fact that two different version may in fact be incompatible e.g. new version may change behavior of specific method e.g.toString
I think that is already happening (per https://github.com/multiformats/js-cid/issues/109#issuecomment-641640712)Even if above is backwards compatible change, incompatible change may come sooner or later.
This is a symptom of more general issue with validation pattern. If you're inclined I would invite to read excellent parse don't validate primer that explains hazzards of that problem (in the context of typed language, but it applies to this just as well)
Proposal
Changing behavior of
CID.isCID(v)
to do component validation (if I interperent @hugomrdias suggestion https://github.com/moxystudio/js-class-is/issues/25#issuecomment-642803835) is going to be too disruptive to be worth it. Furthermore it would not address the problem stated above. Instead I would like to propose an alternative in form ofasCID
static method to which existing code base could be gradually migrated and future code base should default.Implementation could be something along these lines:
With such method in place existing pattern of
Could gradually be transitioned to following pattern instead:
It is a bit more verbose, but it would avoid issues that may arise from version incompatibilities. It will also preserve CID-ness when things are structured cloned.