orbitdb / field-manual

The Official User's Guide to OrbitDB

Data persistence on IPFS #86

Open cbruguera opened 6 years ago

cbruguera commented 6 years ago

Hello, I'm just curious to know how data persistence is handled on IPFS, given that (if my understanding is correct) files must be "pinned" by an IPFS node in order to achieve persistence, yet the data is prone to loss (or unavailability) if it's hosted on a single node that goes offline.

Where can I check for details on how persistence (and availability) is ensured by OrbitDB? Also, on a related note, does OrbitDB work as a "private" cluster of IPFS nodes, or does it connect to the IPFS network as a whole via some public gateway?

Thanks in advance for any feedback on the matter.

fazo96 commented 6 years ago

Whether OrbitDB uses a private cluster of IPFS nodes depends on how you configure IPFS/libp2p. By default they connect to the public network, but I think there's some early experimental support for private networks.

Persistence in IPFS works like this: you only replicate the data you read, so you will never end up caching, uploading or replicating something that you didn't explicitly request from IPFS.

In OrbitDB, when your local copy updates, it fetches all the new entries from IPFS so they get copied to your local node, and your local node will help serve them to the rest of the network.

Pinning means that when the IPFS garbage collector runs (to free some space), it won't ever delete your pinned data. OrbitDB, as far as I know, never pins anything.

balupton commented 6 years ago

Is there an option then to do a full clone of the data, and to keep it up to date with new data, to ensure persistence across multiple nodes that are programmed to do the same?

fazo96 commented 6 years ago

By default, when you open a database with orbit-db, it syncs up so that all of the nodes replicate all the data. They also cache it locally, so if you restart them, they can load what they synced from local storage instead of having to replicate it all over again from the network.

For this to work there has to be at least one reachable online node to sync from, otherwise new nodes won't be able to get the data. If nobody is online or you delete the local storage of all nodes, then you will have lost the database.

So @balupton, just by opening the same database on multiple machines/nodes, they will keep themselves up to date.
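
As a rough illustration of the point above (a sketch only, not code from this thread): assuming the orbit-db 0.x JavaScript API, where an instance exposes open(address) and a database exposes events.on('replicated', ...) and load(), a second node could follow an existing database roughly like this. The helper name followDatabase and the onUpdate callback are made up for the example.

```javascript
// Hypothetical helper: open an existing database on another node and react
// whenever new entries have been replicated from peers.
// Assumes `orbitdb` follows the orbit-db 0.x API (open, events, load).
async function followDatabase(orbitdb, address, onUpdate) {
  const db = await orbitdb.open(address)

  // 'replicated' fires after entries fetched from a peer were merged locally.
  db.events.on('replicated', () => onUpdate(db))

  // Load whatever this node already cached on disk from earlier runs.
  await db.load()
  return db
}
```

Every node that runs this against the same address ends up holding its own copy, which is what keeps the data available when one of them goes offline.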

Of course, if you write multihashes into the database (for example, you want to keep a feed of videos, so you write the multihashes of the videos to an orbit-db feed like this: { videoMultihash: 'Qm...' }), orbit-db will not open them (they are just strings) and you will have to ensure those are replicated yourself. orbit-db only keeps the objects you put into it synced and won't follow links or multihashes.
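
One way to handle the referenced content yourself is to pin each multihash on a node you control. The following is a minimal sketch under assumptions: `ipfs` is any client exposing pin.add (js-ipfs and ipfs-http-client both provide it), `entries` are the plain objects stored in the feed, and the helper name pinReferencedVideos is invented for illustration.

```javascript
// Hypothetical helper: pin content that feed entries only reference by
// multihash, since orbit-db itself will not fetch or follow those strings.
async function pinReferencedVideos(ipfs, entries) {
  const pinned = []
  for (const entry of entries) {
    const hash = entry && entry.videoMultihash
    // Skip entries that carry no usable link.
    if (typeof hash !== 'string' || hash.length === 0) continue
    // Skip hashes already pinned during this run.
    if (pinned.includes(hash)) continue
    await ipfs.pin.add(hash)
    pinned.push(hash)
  }
  return pinned
}
```

With an orbit-db feed, the entries could come from something like db.iterator({ limit: -1 }).collect().map(e => e.payload.value), again assuming the 0.x API.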

haadcode commented 6 years ago

@fazo96 has already done a great job of explaining persistence, but I wanted to add a note that js-ipfs currently doesn't have GC, so nothing gets removed, meaning everything is effectively pinned by default.

However, this will change in the future as js-ipfs gets GC, and we want to make sure that OrbitDB is actually persisting everything (by default), so some work on pinning needs to happen. If you're using OrbitDB with go-ipfs (through js-ipfs-api), then GC happens and data may no longer be persisted after a while. This is a known issue and we're planning to implement actual pinning (from IPFS's perspective) soon.

balupton commented 6 years ago

> So @balupton just by opening the same database on multiple machines/nodes they will keep up to date by themselves.

Sweet. And what about the option of having it so new nodes can add new data without replicating past data?

aphelionz commented 5 years ago

Moving to the Field Manual for more details / discussion

revolunet commented 4 years ago

Hi, does anyone have an example of setting up a "backup" IPFS server for OrbitDB persistence?

aphelionz commented 4 years ago

There are a few different efforts going. The one I've been using and working on is https://github.com/Jon-Biz/orbitdb-pinner

Belz-tech commented 3 years ago

Hi. I am really new to IPFS and OrbitDB. I have read quite a lot of articles and have done extensive research, but I am still confused. Could you use it for inventory management, and if so, how do you keep your data for all transactions, stock movements, available stock, etc.? Surely everything can't sit in nodes and be dependent on someone reading or pinning it. What happens if all nodes go down?

bitcard commented 3 years ago

mark