mafintosh / hyperdb

Distributed scalable database
MIT License

add timestamp to db nodes #97

Closed mafintosh closed 6 years ago

mafintosh commented 6 years ago

useful for resolving conflicts etc

mafintosh commented 6 years ago

https://github.com/bnewbold/dat-deps/blob/dep-hyperdb-timestamps/proposals/0000-hyperdb-timestamps.md

hackergrrl commented 6 years ago

This feels like a heavy thing to add at the hyperdb layer. Not all applications will benefit, and it's a non-zero cost to incur per node. I like the idea of hyperdb not pushing opinions like whether it thinks that higher layers might want timestamps or not -- maybe this makes more sense at the hyperdrive/etc layer?

mafintosh commented 6 years ago

@noffle would be opt-in. i'm all for keeping this as minimal as possible (as always). the main thing i wanna solve is being able to get better metadata for resolving conflicts that involve deletes, as currently you just have a null value.

andrewosh commented 6 years ago

Since timestamps could be added at the user level for puts (and as @bnewbold mentioned in the DEP they can't be used for security/trust in the protocol), an alternative would be to support a custom tombstone object in delete.

With that approach, you could add a boolean deleted field to Entry instead of using a null value to signify deletion. If deleted is set, optionally return the tombstone. You'd get some space savings (bool vs. timestamp), but the cost wouldn't be opt-in.
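A minimal sketch of that entry shape, in plain JavaScript with an in-memory Map rather than hyperdb itself. The names `put`, `del`, `get`, and the `tombstone` field are hypothetical and only illustrate the idea:

```javascript
// Hypothetical in-memory store; this is NOT the hyperdb API.
const store = new Map()

function put (key, value) {
  store.set(key, { key, value, deleted: false })
}

// A delete is recorded as an entry with deleted: true and an
// optional, user-defined tombstone (e.g. a timestamp or a reason).
function del (key, tombstone) {
  const entry = { key, value: null, deleted: true }
  if (tombstone !== undefined) entry.tombstone = tombstone
  store.set(key, entry)
}

// Returns the stored entry (including deleted ones), or null if unknown.
function get (key) {
  return store.get(key) || null
}

put('/a', 'hello')
del('/a', { deletedAt: 1530000000000 })
console.log(get('/a'))
```

The boolean is paid on every entry, while the tombstone is only stored when the caller supplies one.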

mafintosh commented 6 years ago

digging @andrewosh's idea. i'll impl this at some point unless someone wants to do a PR before me :)

brettneese commented 6 years ago

I'm curious, @andrewosh @mafintosh: how would you go about implementing userland timestamps? For instance, what if I want to browse the history of a file in a dat archive by time instead of by version?

This proposal would be very helpful to me personally but it may be possible to write some kind of layer on top of hyperdb that handles that kind of thing without touching the protocol.

Currently I'm just indexing timestamp --> archive.version.toString() in a hyperdb after a dat write but that sounds less than ideal (and I'm not sure how to put the index itself on dat so other clients can browse it).
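A rough sketch of that kind of timestamp-to-version index, kept in memory here for illustration. The helper names `record` and `versionAt` are hypothetical, and persisting or sharing the index is left out:

```javascript
// Hypothetical index: { timestamp, version } pairs kept sorted by timestamp.
const index = []

// Record the archive version observed after a write at the given time.
function record (timestamp, version) {
  index.push({ timestamp, version })
  index.sort((a, b) => a.timestamp - b.timestamp)
}

// Binary search for the newest version written at or before `timestamp`.
// Returns null when nothing had been written yet.
function versionAt (timestamp) {
  let lo = 0
  let hi = index.length - 1
  let found = null
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    if (index[mid].timestamp <= timestamp) {
      found = index[mid].version
      lo = mid + 1
    } else {
      hi = mid - 1
    }
  }
  return found
}

record(1000, 1)
record(2000, 2)
record(3000, 3)
console.log(versionAt(2500))
```

Browsing by time then reduces to looking up a version and checking out the archive at that version.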

xloem commented 6 years ago

I wonder if it might be nice to use a separate stream of metadata, for those use-cases when it is changing more frequently than the data (e.g. if you wanted access times recorded).

With regard to timestamps, I have a use that could benefit from a more custom format: I'd like to pair my streams with a blockchain, and include blockchain hashes to prove that the data was created after the given block time. I'd also like to post signatures to a blockchain and reference them, to prove that the data was created prior to the given block time.

bnewbold commented 6 years ago

@xloem: the metadata and content feel pretty separate to me already, do you mean some additional abstraction layer? It sounds like your blockchain timestamps/signatures would have different semantics from what is proposed here; maybe you could open a new thread of discussion?

xloem commented 6 years ago

Sorry, I had seen the linked DEP, and I think I missed a few things when I made that comment; I wanted the use-cases to be included in design work.

I see now that I can wrap my values in a data structure that has metadata, or make metadata keys that pair with value keys. It would make implementations easier if there were an optional field for both puts and deletes that could hold arbitrary data, but I think the solution of the user creating paired keys is much more robust and would even make the tombstone unnecessary.

@brettneese I think the way to go is to create paired keys, such that /my/file.txt has a matching /my/file.txt;metadata or /;metadata/my/file.txt
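For illustration, the suffix variant of the paired-key scheme could look like the sketch below. The helper names are hypothetical, and the resulting write list would be handed to whatever put or batch call the application uses:

```javascript
// Suffix used to derive a metadata key from a value key.
const SUFFIX = ';metadata'

function metadataKey (key) {
  return key + SUFFIX
}

function isMetadataKey (key) {
  return key.endsWith(SUFFIX)
}

// Recover the value key from a metadata key.
function valueKey (metaKey) {
  return metaKey.slice(0, -SUFFIX.length)
}

// Build the two writes that make up one logical put: the value itself
// and its paired metadata (e.g. a timestamp).
function pairedPut (key, value, metadata) {
  return [
    { key, value },
    { key: metadataKey(key), value: metadata }
  ]
}

console.log(pairedPut('/my/file.txt', 'contents', { mtime: Date.now() }))
```

A delete would leave the value key empty while still writing the metadata key, which is why a separate tombstone becomes unnecessary.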

mafintosh commented 6 years ago

We support userland timestamps now because 3.0.0 uses "semantic deletes" instead of implicit ones, which means deletes are just puts with a "deleted" flag set to true. You can therefore attach any value to a delete, and it will be available in a conflict resolution scenario.
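As a sketch of what that enables, the snippet below resolves a conflict between a put and a delete using a timestamp stored in each value. The node shape and the last-writer-wins policy are assumptions made for illustration, not hyperdb's own behavior, and (as noted earlier in the thread) such timestamps can't be relied on for security or trust:

```javascript
// Hypothetical node shape: { key, deleted, value }, where value carries
// a user-supplied timestamp for puts and deletes alike.
function resolve (nodes) {
  // Last writer wins: keep the node with the largest userland timestamp.
  return nodes.reduce((winner, node) =>
    node.value.timestamp > winner.value.timestamp ? node : winner
  )
}

const conflict = [
  { key: '/a', deleted: false, value: { timestamp: 100, data: 'hello' } },
  { key: '/a', deleted: true, value: { timestamp: 200 } }
]

const winner = resolve(conflict)
console.log(winner.deleted ? 'deleted' : winner.value.data)
```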