Open azf20 opened 2 years ago
Considering the future requirement of time-travel queries over availability blocks, I now recall that we can't do most-recent-wins at indexing time: representing it in the DB requires quadratic space, as has been demonstrated in past discussions.
To do most-recent-wins at query time, we need to allow conflicting entity versions to coexist in the DB. But I fear that it would not be possible to write efficient SQL to handle conflict resolution for collection queries.
So I'm becoming skeptical of generalized 'most-recent wins' conflict resolution. That brings us back to a solution I previously proposed that looks like (strawman syntax):
metadata: Metadata @derivedFrom(field: "project") @mostRecent
Where there can be multiple `Metadata` entities referring to the project, each with their own ID, but the field declares that it wants the most recent one.
OK. I was thinking about this, and while it may be a bit confusing from a user perspective, there is a robust way to generate the distinct IDs, based on the CID, i.e.
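import { Bytes, dataSource, json } from "@graphprotocol/graph-ts"
// Assumed path: the default graph-cli codegen output location.
import { Metadata } from "../generated/schema"

export function handleProjectMetadata(content: Bytes): void {
  // The CID of the file is read from the data source "address"
  const cid = String.UTF8.decode(dataSource.address().buffer)
  const data = json.fromBytes(content)
  const _projects = data.toArray();
  for (let i = 0; i < _projects.length; i++) {
    // construct projectId
    const _project = _projects[i].toObject();
    const _id = _project.get("id")
    if (!_id) continue;
    const projectId = _id.toString().toLowerCase();
    // CID + projectId gives each file its own Metadata entity per project
    const metadata = new Metadata(cid + "-" + projectId)
    metadata.project = projectId
    metadata.save()
  }
}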
To discuss syntax:
metadata: Metadata @derivedFrom(field: "project") @mostRecent
metadata: Metadata @derivedFrom(field: "project") @lastUpdate
metadata: Metadata @derivedFrom(field: "project", selectBy: "mostRecent")
Other questions: will this decorator only be available for file data source entities?
I would be keen to unpack the trade-offs here though, as we don't currently have the availability chain. How simple will this change be for the query layer, with all its permutations (interfaces, derived fields, unions etc.)? I think (?) the introduction of an availability chain would mean removal of either workaround (indexing time or query time), so I have a preference for whichever is simpler (for users, and to implement and later update).
This directive would essentially apply a sort order and take the first. In principle it could apply to any derived single-entity field, but I haven't analyzed the implications if the field type is an interface.
What seems complicated to me about the SQL queries for collection fields is the interaction with `first` and `skip`. But this is beyond my SQL-fu, we'd need @lutter's opinion to determine what is feasible.
A directive that is only supported by derived single-entity fields punts on the question of collection fields, which is convenient since those are not relevant to the use case at hand. The directive being at the field granularity avoids incurring any performance costs to unrelated queries.
This query-time solution would not change with the introduction of the availability chain, it would only change as much as any other query. An indexing-time solution would probably need to assume a total order between the availability chain and the data chain, but we're trying to avoid answering that question at this point.
A directive that is only supported by derived single-entity fields punts on the question of collection fields
I think this is the right approach. Given that approach, I don't think we need to worry about `first` and `skip`?
Exactly, that is one of the goals. For derived single-entity fields, it seems "obviously possible" to implement because we can implement it as a collection query with `first: 1` and the chosen sort order.
On the sort order, I'm thinking it would be `order by lower(block_range) desc, causality_region desc, id asc`. The `id` is there to guarantee uniqueness, since multiple entities can be created in the same file handler or in the same on-chain block. And we don't currently have a way to know which entity versions were created first within the same block.
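To make the proposed ordering concrete, here is a minimal TypeScript sketch of "sort, then take the first" over in-memory rows. It is only an illustration, not how graph-node would implement it: `EntityVersion`, `compareVersions` and `mostRecent` are hypothetical names, `blockStart` stands in for `lower(block_range)`, and the string comparison on `id` may differ from the database collation.

```typescript
// Hypothetical in-memory stand-in for one stored entity version.
interface EntityVersion {
  id: string;
  blockStart: number; // plays the role of lower(block_range)
  causalityRegion: number;
}

// Mirrors: order by lower(block_range) desc, causality_region desc, id asc
function compareVersions(a: EntityVersion, b: EntityVersion): number {
  if (a.blockStart !== b.blockStart) return b.blockStart - a.blockStart;
  if (a.causalityRegion !== b.causalityRegion) {
    return b.causalityRegion - a.causalityRegion;
  }
  // Tie-breaker only: id order says nothing about insertion order.
  if (a.id < b.id) return -1;
  if (a.id > b.id) return 1;
  return 0;
}

// Equivalent of a collection query with `first: 1` under that order.
function mostRecent(versions: EntityVersion[]): EntityVersion | null {
  const sorted = versions.slice().sort(compareVersions);
  return sorted.length > 0 ? sorted[0] : null;
}

const candidates: EntityVersion[] = [
  { id: "cidA-0xabc", blockStart: 10, causalityRegion: 1 },
  { id: "cidC-0xabc", blockStart: 12, causalityRegion: 2 },
  { id: "cidB-0xabc", blockStart: 12, causalityRegion: 2 },
];
const winner = mostRecent(candidates);
```

With these made-up rows, the two versions from block 12 tie on block and causality region, so the one with the smaller `id` is returned. The result is deterministic, but it is not necessarily the version that was written last.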
And we don't currently have a way to know which entity versions were created first within the same block.
Is this still the case with `order by lower(block_range) desc, causality_region desc, id asc`?
Yes, that would order entities within the same block and causality region by id, which might not match the insertion order.
Hello everyone, I am not sure if this is the right issue. I am trying to use file data sources to store token metadata, but I am unable to load an entity from the store. I would like to do the following in a data source handler after a token is minted and the corresponding entity has been created:
export function handleMetadata(content: Bytes): void {
  const value = content.toString();
  let context = dataSource.context();
  let address = context.getString("address");
  let token = Token.load(address);
  if (token) {
    token.metadata = value;
    token.save();
  }
}
Unfortunately this doesn't work. The `address` variable is set correctly, the `value` variable correctly contains the IPFS content, but the loaded `token` is `null`.
I really need this feature for my master's thesis, otherwise I have to look for a workaround. So is this the right issue to follow for the current status of the implementation?
Currently, different File Data Sources can create multiple entities with the same ID.
File Data Sources should instead be able to update an entity with the same ID. Note that this will not be an upsert, which is the current pattern that is used for chain-based data sources. Instead this should completely overwrite the prior entity.
Situation:
Entity updates should apply a "most recent wins" approach, where the time is determined by block time, not handler execution time. This is a new pattern (where entities might be created with closed block ranges), and order will need to be resolved within blocks (as well as between blocks)
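As a rough sketch of the behaviour described above (full overwrite rather than upsert, with the winner chosen by block rather than by handler execution order), consider the following. Everything here is hypothetical and in-memory: `OverwriteStore`, `Write` and `Fields` are illustration names, not graph-node's store API. The handling of two writes in the same block (the later call wins) is an arbitrary assumption, since ordering within a block is still an open question.

```typescript
// Hypothetical types for illustration only; not graph-node's store API.
type Fields = Record<string, string>;

interface Write {
  id: string;
  block: number; // block that determines the "time" of this write
  fields: Fields;
}

class OverwriteStore {
  private rows = new Map<string, { block: number; fields: Fields }>();

  apply(write: Write): void {
    const current = this.rows.get(write.id);
    // A write from an older block loses, even if its handler ran later.
    // Assumption: for equal blocks, the later call simply wins.
    if (current !== undefined && current.block > write.block) return;
    // Full overwrite: fields of the previous version are not merged in.
    this.rows.set(write.id, {
      block: write.block,
      fields: Object.assign({}, write.fields),
    });
  }

  get(id: string): Fields | undefined {
    const row = this.rows.get(id);
    return row === undefined ? undefined : row.fields;
  }
}

const store = new OverwriteStore();
// The handler for block 20 happens to run before the one for block 15.
store.apply({ id: "project-1", block: 20, fields: { title: "new" } });
store.apply({
  id: "project-1",
  block: 15,
  fields: { title: "old", description: "stale" },
});
const result = store.get("project-1");
```

Here `result` keeps only the fields written at block 20: the block 15 write is ignored despite being applied later, and nothing from it is merged in as an upsert would do.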