textileio / go-threads

Server-less p2p database built on libp2p
MIT License
451 stars 65 forks

General Question on ThreadDB functionality #588

Closed. lonnietc closed this issue 7 months ago.

lonnietc commented 8 months ago

Hello,

I am experimenting with P2P databases in Golang, of which there really are not many, and came across ThreadDB, which looks very interesting.

Do I understand correctly that if I start up, say, 10 instances of the threads daemon (e.g. on Windows and Linux) on a few different machines, then my data will be sharded and replicated across the nodes?

By this I mean sharding the data across all 10 machines (i.e. splitting the data), and also being able to ask threads to replicate data across some of the machines (i.e. keep multiple copies of the sharded data) in case of node failure.

Also, is it possible to have some data stored on, or pinned to, a specific machine and other data sharded across the other nodes?

I'm just trying to get a clearer picture of how ThreadDB works, as I have some ideas on how it might be very useful for a project that I need to build and scale to an extremely large number of nodes.

Thanks and have a great day

sanderpick commented 8 months ago

Hey lonnietc, thanks for reaching out.

> By this I mean sharding the data across all 10 machines (i.e. splitting the data), and also being able to ask threads to replicate data across some of the machines (i.e. keep multiple copies of the sharded data) in case of node failure.

Yep, correct!

> Also, is it possible to have some data stored, or pinned, to a specific machine and other data sharded across the other nodes?

A thread doesn't need to be hosted by all the nodes, correct.

Disclaimer: While we are happy to answer questions about threads, as you can see from the commit history, we aren't actively working on it anymore.

lonnietc commented 8 months ago

Thanks for your reply, and for taking the time to answer some of my questions.

One of my projects needs a highly scalable (sharded and replicated) SQL database, and I was thinking that I might be able to integrate ThreadDB with another project I came across called "go-mysql-server" (https://github.com/dolthub/go-mysql-server/tree/main), which is a (mostly) MySQL-compatible server written in Go.

The idea is that it might be possible (I'm not sure yet) to integrate the two into a P2P system: a single binary that can be started locally and connects to other nodes to shard and replicate data. Basically, it would accept MySQL data and queries, with ThreadDB as the underlying P2P storage across the network.

Additionally, I would need to figure out a way to mark data as either pinned to the locally running node or available for sharding. This would be akin to IPFS, where you can pin data locally or store it on the network.
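To make the pin-or-shard idea concrete, here is a minimal sketch in Go of one possible placement rule. It is not ThreadDB's or go-mysql-server's API, and nothing in this thread says either project works this way; `place`, `score`, the node names, and the keys are all hypothetical. The sketch assumes rendezvous (highest-random-weight) hashing to pick which nodes hold a key, with a pinned key staying on a single node.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// score gives a deterministic pseudo-random weight for a (key, node) pair.
// Hypothetical helper: any stable hash would do.
func score(key, node string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	h.Write([]byte{0}) // separator so ("ab", "c") differs from ("a", "bc")
	h.Write([]byte(node))
	return h.Sum64()
}

// place returns the nodes that should hold key.
// A pinned key lives only on its pinned node. Any other key goes to the
// `replicas` nodes with the highest score (rendezvous hashing), so each
// key is split off to a subset of nodes and copied `replicas` times.
func place(key string, nodes []string, replicas int, pinned map[string]string) []string {
	if node, ok := pinned[key]; ok {
		return []string{node}
	}
	// Rank a copy so the caller's slice is left untouched.
	ranked := append([]string(nil), nodes...)
	sort.Slice(ranked, func(i, j int) bool {
		si, sj := score(key, ranked[i]), score(key, ranked[j])
		if si != sj {
			return si > sj
		}
		return ranked[i] < ranked[j] // stable tie-break on node name
	})
	if replicas < 0 {
		replicas = 0
	}
	if replicas > len(ranked) {
		replicas = len(ranked)
	}
	return ranked[:replicas]
}

func main() {
	// Assumed example data, for illustration only.
	nodes := []string{"node-1", "node-2", "node-3", "node-4", "node-5"}
	pinned := map[string]string{"users/42": "node-1"}
	for _, key := range []string{"users/42", "orders/7", "orders/8"} {
		fmt.Println(key, "->", place(key, nodes, 3, pinned))
	}
}
```

Because every node can compute the same ranking from the key and the node list alone, no central coordinator is needed to decide placement, which is the property a P2P design like this would likely want. A real system would still have to handle nodes joining and leaving.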

Ultimately, this could make a kind of P2P SQL server for commodity user systems that is reasonably compatible with MySQL clients, but peer-to-peer rather than the distributed cluster that is more common now. In this approach, the data is sharded and replicated across user nodes that may also be accessing the shared data.

That is the idea at least, although it may be too challenging to complete.

I will also need to read more of the ThreadDB documentation to see how to shard and replicate data effectively.

Thanks again

sanderpick commented 7 months ago

Sorry for the delay here! What you're describing has some overlap with the next version of our Tableland.xyz project. If you feel like following along, we're over on Discord here: https://tableland.xyz/discord.