rchain / rchip-proposals

Where RChain improvement proposals can be submitted
Apache License 2.0

provide for data retention and storage payment #40

Open jimscarver opened 3 years ago

jimscarver commented 3 years ago

Introduction/Motivation/Abstract

It has been stated that RChain will eventually delete data that has not paid for continued storage. We cannot allow tuple space to grow without bound with reads and writes that are never likely to occur. Currently there is no distinction between items in tuple space. No record is kept of when they were created or last accessed.

There is no immediate need to solve all the retention issues at this time, but there is a need to ensure we are creating a tuple space that will allow for a retention policy in the future.

Design

A minimum implementation is a modification of tuple space to keep a least recently used list of tuples along with the block number. Additional future fields can be provided for.

At some block height, least recently used tuples could be dropped. Then, after each propose, data can be purged for the lower block height plus one. The number of blocks of age always kept may be constant or inflationary, TBD.
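As a rough illustration of the idea above, here is a minimal Python sketch of a tuple space that records the block number of last access and can purge the least recently used entries below a cutoff block. The class name `LruTupleSpace` and its methods are hypothetical and are not part of rspace; the sketch also assumes block numbers never decrease between calls, so that access order and block order agree.

```python
from collections import OrderedDict


class LruTupleSpace:
    """Toy tuple space that remembers the block of last access per channel.

    Hypothetical illustration only; not the rspace API.
    """

    def __init__(self):
        # channel -> (data, last_access_block), ordered oldest access first
        self._entries = OrderedDict()

    def produce(self, channel, data, block):
        """Store data on a channel and mark it as accessed at `block`."""
        self._entries[channel] = (data, block)
        self._entries.move_to_end(channel)

    def read(self, channel, block):
        """Return the data on a channel and refresh its last-access block."""
        data, _ = self._entries[channel]
        self._entries[channel] = (data, block)
        self._entries.move_to_end(channel)
        return data

    def purge_older_than(self, cutoff_block):
        """Drop entries last accessed before `cutoff_block`.

        Because entries are kept in access order, the scan stops at the
        first entry that is recent enough. Returns the dropped channels.
        """
        dropped = []
        while self._entries:
            channel, (_, last_block) = next(iter(self._entries.items()))
            if last_block >= cutoff_block:
                break
            self._entries.popitem(last=False)
            dropped.append(channel)
        return dropped


space = LruTupleSpace()
space.produce("a", 1, block=1)
space.produce("b", 2, block=2)
space.read("a", block=5)           # "a" is now more recent than "b"
dropped = space.purge_older_than(3)
```

In this sketch, advancing the cutoff by one after each propose would correspond to calling `purge_older_than` with a cutoff that grows with the chain; how many blocks of age to keep is left open, as in the proposal.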

In the future keeping the deployid for tuples could enable refreshing deploys.

A comprehensive solution might include a Rholang extension allowing names to listen for the deletion event.

Alternatives

?

dckc commented 3 years ago

I just mentioned space rental to @leithaus earlier this week; he said there's an architecture / design from 2018 that just hasn't been built yet.

Meanwhile...

We cannot allow tuple space to grow without bound with reads and writes that are never likely to occur.

The last finalized state has exactly the channels that could ever be accessed again. So a straightforward way to garbage collect is to stop a node and resume from LFS.

This doesn't address the time-cost of storage. But storage does get cheaper over time... at a conference of librarians I saw a presentation on "endowed publication" where they charged, say, $1000 to store 1 GB "forever" using interest on an endowment.
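To make the endowment idea concrete, here is a small Python sketch of the arithmetic. It assumes a fixed annual interest rate, a storage cost that falls by a fixed fraction each year, and a payment at the start of every year; the function name and all figures are hypothetical, not numbers from the presentation. Under those assumptions the endowment is a geometric series, cost / (1 - (1 - decline) / (1 + interest)).

```python
def endowment_needed(annual_cost, interest_rate, cost_decline_rate=0.0):
    """Lump sum that funds storage forever under the stated assumptions.

    annual_cost:       cost of storing the data in the first year
    interest_rate:     yearly return on the endowment (0.04 means 4%)
    cost_decline_rate: yearly fractional drop in storage cost
    """
    if interest_rate + cost_decline_rate <= 0:
        raise ValueError("series diverges: need interest_rate + cost_decline_rate > 0")
    # Each year's payment shrinks by this factor in present-value terms.
    ratio = (1 - cost_decline_rate) / (1 + interest_rate)
    return annual_cost / (1 - ratio)


# Hypothetical figures: storage costs 40 per year today and the endowment earns 4%.
flat_cost = endowment_needed(40, 0.04)
falling_cost = endowment_needed(40, 0.04, cost_decline_rate=0.10)
```

The faster storage gets cheaper, the smaller the endowment has to be, which is the point of the "storage does get cheaper over time" remark.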

jimscarver commented 3 years ago

Garbage collection is not the issue here, though it is certainly important.

The issue is reachable data that is ancient. URIs and deployerIds may be tips of the iceberg for names in contracts and other data structures that are forever reachable.

he said there's an architecture / design from 2018 that just hasn't been built yet.

I cannot think of a possible solution that does not require a hard fork, which is why I think we need to address this before the planned hard forks.

dckc commented 3 years ago

Before the planned hard forks seems quite ambitious. This seems like a Venus thing.

jimscarver commented 3 years ago

The last finalized state

We lose all the history in the LFS such that we cannot distinguish the old from the new.

Keeping the block number of last access allows mimicking natural memory with old memories, not reinforced, being lost.

This seems like a Venus thing.

Greg has a different solution in mind, based on an emergent notion of time. Keeping the block number seems an easy fix for now, with some other notion of time added later, but it seems that will be delayed. Greg does not seem to be considering last access. I am not convinced the scheme for charging for storage that is being considered is viable.

jimscarver commented 3 years ago

Greg wants a natural notion of relative time ordering to emerge. I agree.

I suggest that we use last accessed block number (or block time) to represent that ordering temporarily. I expect it to be a long time before we need to improve it and it can enable the entropy required by natural systems. The common clock referenced is the blockchain itself.

Validators all need to agree on which tuples are expired, based on the maximum age, in blocks since last access, for which a tuple remains valid. A tuple encountered with an earlier last access is removed rather than executed, so that the validator stays in sync with the others.

A periodic sweep can remove expired tuples, shrinking the size of tuple space and backing them up if desired. There is the possibility of resurrecting expired tuples from backups.
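The expiry rule, lazy removal on access, periodic sweep, and resurrection from backup described above could look roughly like the following Python sketch. Everything here (`is_expired`, `ExpiringSpace`, `max_age_blocks`) is a hypothetical illustration, not rspace code. The key assumption is that expiry depends only on the last-access block, the current block, and a shared maximum age, so every validator reaches the same answer.

```python
def is_expired(last_access_block, current_block, max_age_blocks):
    """Deterministic rule: a pure function of block numbers only."""
    return current_block - last_access_block > max_age_blocks


class ExpiringSpace:
    """Toy store with lazy expiry, periodic sweep, and backup (hypothetical)."""

    def __init__(self, max_age_blocks):
        self.max_age_blocks = max_age_blocks
        self.live = {}    # channel -> (data, last_access_block)
        self.backup = {}  # expired entries kept for possible resurrection

    def put(self, channel, data, block):
        self.live[channel] = (data, block)

    def get(self, channel, block):
        """Return data, or None if the channel is absent or has expired."""
        entry = self.live.get(channel)
        if entry is None:
            return None
        data, last = entry
        if is_expired(last, block, self.max_age_blocks):
            # Encountered an expired tuple: remove it instead of using it.
            self.backup[channel] = self.live.pop(channel)
            return None
        self.live[channel] = (data, block)  # refresh last access
        return data

    def sweep(self, block):
        """Move every expired entry to the backup; return their channels."""
        expired = [ch for ch, (_, last) in self.live.items()
                   if is_expired(last, block, self.max_age_blocks)]
        for ch in expired:
            self.backup[ch] = self.live.pop(ch)
        return sorted(expired)

    def resurrect(self, channel, block):
        """Restore a backed-up entry, marking it as accessed at `block`."""
        if channel not in self.backup:
            return False
        data, _ = self.backup.pop(channel)
        self.live[channel] = (data, block)
        return True
```

Removing on access and removing in a sweep give the same visible state in this sketch, so the sweep only affects storage size, not which reads succeed.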

A particular block number on a shard is a type of time ordering on blockchains, and we can prepare to add other time-ordering types later. Behavioural types can ultimately enable Greg's dream. On blockchains, block height is a natural ordering that is available immediately, without the overhead of its emergence.

@tgrospic suggested we experiment and see the overhead cost of keeping the last accessed in the tuple space.

jimscarver commented 3 years ago

In my career I have often documented a problem that was inevitable, but often the warning wasn't heeded until the crisis occurred. Most often I had already developed a plan for recovery. However, if the size of tuple space becomes unmanageable and we do not have last access for tuples, I see no path to resolution.

I think it is worth at least determining the overhead of keeping the last access for tuples in rspace.

The argument that we can always add last accessed later, assuming that is true, is okay as long as we anticipate needing it well in advance of a crisis.