holepunchto / hyperdrive

Hyperdrive is a secure, real-time distributed file system
Apache License 2.0

Technical questions (current state of features as of now and in the future) #305

Closed: dumblob closed this issue 1 year ago

dumblob commented 3 years ago

I couldn't find a good, comprehensive technical overview of the current state or the future goals, so I'm posting some important questions here.

Feel free to add them to a FAQ or to compile the missing technical overview for other newcomers.

  1. What's the min & max latency (some numbers and/or benchmarks would be appreciated)?

  2. How is the latency determined/defined?

  3. Is the latency tunable (i.e. can the programmer influence it)?

  4. How is storage space managed? Equally, i.e. each node is equally responsible for storing & serving data, or are there differences to accommodate constrained devices like smartphones & tablets?

  5. What's the answer for the most widespread platforms (i.e. smartphones, tablets), which rely on the "push" mechanism to significantly decrease battery usage?

  6. What's the answer for the most widespread platforms, which try to significantly reduce bandwidth usage at all costs (due to constrained data plans)?

  7. Is there any federated DNS-like system to make those long hashes memorable/nice-looking?

    It seems "standard" DNS will be used for the time being, but I haven't seen any plan or developer discussion about implementing anything.

  8. Does it work in a pure web browser (e.g. Chrome, Safari, ...) without any intermediate ("broker/gateway") server/node and without any special browser? I mean, imagine a web extension providing such functionality, or a pre-downloaded app (a so-called Progressive Web App).

  9. What are the storage limits (max file size, max number of stored files, max cumulative amount of stored data overall, ...)?

  10. What do the guarantees look like? E.g. is mkdir atomic, is directory rename atomic, and what is the probability of data loss after sync() has returned, and under which conditions (e.g. if 95% of all nodes become indefinitely unavailable due to a denial-of-service attack, catastrophe, war, bad BGP routing or whatever...)?

jerrygreen commented 3 years ago

I'm also curious about question #8: "Does it work in pure web browser (e.g. Chrome, Safari, ...)"

I'd like to use P2P storage in a pure client application, without any need for a central server. Yet this application should have some data that is synced and accumulated in a single shared storage (shared via P2P ofc, since no central server is allowed).

I'd like to use a PWA and the File System Access API [Text editor demo], much like that "Text editor" example but with folder access rather than a single file, so I can read from and write to various files while asking for access only once.

But I can't say whether Hyperdrive can work under such conditions. I bet it can't, because in order to share files, it looks like an application using Hyperdrive has to start some kind of lightweight server, a daemon (i.e. Hyperspace). And that's pretty much beyond the capabilities of a PWA, as far as I know, because it cannot expose anything on a port (i.e. web apps cannot run servers).

Beaker, unfortunately, doesn't support PWAs... And it seems we need not just a PWA here, but some extension of PWA that would also allow us to expose something on a port, like in a Node.js app, in order to share it with others in a P2P way, without a central server.

aral commented 3 years ago

Regarding #8, if you want apps in the browser to be a thing, the first hurdle is to get Apple to reverse its decision to delete all data after seven days. Until then, sadly, it’s like building apps on a platform that runs a weekly ‘rm -rf /’ cron job.

(I wanted to do this too and even started prototyping stuff, but there's no way I can create a usable experience that relies on replicating to the browser while this is the case.)

jerrygreen commented 3 years ago

@aral I currently have a little personal project I'm developing, which is also a PWA, and I use it on iOS too. It seems I use it more often than once every 7 days, because it's always available offline. But I've read the same thing elsewhere and I agree: it's an unfortunate limitation, and for a true production app it would be cool if they didn't delete these applications even when they're unused for long periods (i.e. if they had roughly the same rights as native apps). Btw, that's not exactly the same as an ‘rm -rf /’ cron job, because the timer can be reset, while cron would run regardless of any events.

They're a bit slow with PWAs though: the caches API is unavailable, navigator.storage is unavailable, and many other fancy features are unavailable too. But long ago, Steve Jobs said in an iPhone presentation that there would be no need to write native applications for iOS, and that all applications would be web apps. I believe (and hope) that they will come back to this idea.

Btw, not that long ago, they released a new web version of Apple Music:

I personally don't care if they catch up or not, though I believe they will, but I'm using PWA features for my personal projects anyway.

And it would be cool if Beaker allowed installing PWAs too, so that an app installed from Beaker could use P2P storage (i.e. Hyperdrive). To bring some interest to this, please add a thumbs-up reaction to my issue about it: https://github.com/beakerbrowser/beaker/issues/1885

RangerMauve commented 3 years ago

You can use Hyperdrive in the browser using hyper-sdk https://www.npmjs.com/package/hyper-sdk

jerrygreen commented 3 years ago

@RangerMauve so, two questions:

  1. Will it make it possible to GET info from Hyperdrive in, say, Chrome? Without a server?
  2. Is it possible to PUT some data back into Hyperdrive? That requires a running daemon (so others can request the data), doesn't it? And it's only possible to run a daemon in a Node.js environment or a browser-like environment such as an Electron app (which also includes a Node.js environment), but not in an actual browser like Chrome. Right?

RangerMauve commented 3 years ago

It's impossible to do it without some sort of server if you want to share the data with somebody because browsers don't give us the TCP and UDP APIs we need for p2p connections.

The web version of hyperdrive lets you create data offline by running a full hyperdrive instance in the browser (no need for any daemons), then if you want to sync with a peer it uses a combination of WebRTC (which requires a signaling server), and a proxy server which talks to node.js peers so that you can reach the rest of the network.

With this you can use hypercore and hyperdrive as you normally would, and you can replicate with peers whenever you have some sort of internet connection, without a central server keeping your data and without needing to install anything alongside the browser.

jerrygreen commented 3 years ago

@RangerMauve

impossible to do it without some sort of server

it uses... proxy server

without having a central server

Ok, without a central server, that sounds good. But a proxy server... It sounds pretty centralized, which makes it "almost P2P, but not completely P2P". But if that's the price of being able to run this thing in Chrome, then... I may give it a try: I'll definitely look into this!

I'm a bit confused though, because it seems to be related to the Dat project, which I thought was deprecated in order to launch Hypercore. But this SDK was renamed from dat-sdk to hyper-sdk, so I suppose it's still in use, even though it's part of the datproject org and not the hypercore-protocol org. Weird.

RangerMauve commented 3 years ago

The thing about the proxy server is that you can self host a proxy server and any proxy server will be able to connect to the same p2p peers. You could even have users configure their own in your app.

The Dat project is where the hypercore stuff was initially developed. dat-sdk existed before hypercore-protocol split off into its own thing and created new branding.

There's still a community of folks under the dat project and there's a bunch of things that were building on dat-sdk even though it was using the latest hypercore-protocol code.

hyper-sdk is something I'm maintaining and isn't managed by the hypercore-protocol team.

RangerMauve commented 3 years ago

BTW we're having a community call if you wanna come in and ask questions. 😁

https://github.com/datproject/comm-comm/issues/184

dumblob commented 1 year ago

@mafintosh any pointers on how and where all 10 points were tackled?

Thanks a lot!

mafintosh commented 1 year ago

@dumblob hi! Check out the latest version and the general hyper docs at docs.holepunch.to. That answers most of this. Feel free to ask any question about the modern stack in a new issue.

dumblob commented 1 year ago

Well, the documentation still does not seem to answer my 10 questions (except for (8), which got sort of answered above).

Could someone knowledgeable perhaps spend 5 minutes answering all the questions off the top of their head?

I do not need references (I can find them if I need more info), so you really do not need to spend hours finding references for your answers.

Thanks a lot!

LuKks commented 1 year ago

@dumblob

1-3: Latency is mostly defined by the network and the distance between connected peers. There is some overhead from writing new values, replication running asynchronously in the background, the DHT, etc., but overall it's defined by the network as usual. Download speed here is all about Hypercore, and we're making progress on improving replication speeds; there is still a lot to unlock! See the benchmarks here https://github.com/holepunchto/hypercore/tree/main/test/bench and especially bench/networking.js
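
To make "how is the latency determined" concrete, here is a minimal timing sketch in Python. It is not taken from the Hypercore benchmarks; `fake_remote_read` is a hypothetical stand-in for fetching one block from a remote peer, and its sleep plays the role of the network round trip. With a real drive you would time the actual read call instead.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(operation: Callable[[], object], runs: int = 5) -> List[float]:
    """Run `operation` `runs` times and return each duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Reduce raw samples to the min / median / max asked about in question 1."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_remote_read() -> bytes:
    # Hypothetical stand-in for fetching one block from a remote peer;
    # the sleep plays the role of the network round trip.
    time.sleep(0.002)
    return b"block"


samples = measure_latency(fake_remote_read, runs=5)
stats = summarize(samples)
print(stats)
```

Measured this way, the numbers mostly reflect the link between the two peers rather than the library, which is the point made above.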

4: You want the writer to have all of the data; the rest of the peers are sparse by default. They can query and download only what they need, but those small peers still help to replicate even when sparse. If low on storage, peers can always clear their local data, as they can download it again from other peers. Technically, you can lose the writer, and if you still have the key pair you can restore it by downloading the data again from another peer, but if done wrong you risk corruption, so don't play with the writer.
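
The sparse behaviour described in answer 4 can be modelled in a few lines of Python. This is a toy simulation under stated assumptions, not Hypercore's API: `Writer`, `SparsePeer`, and `BlockSource` are hypothetical names, and blocks are plain byte strings with no verification.

```python
from typing import Dict, List, Protocol


class BlockSource(Protocol):
    def get(self, index: int) -> bytes: ...


class Writer:
    """Holds every block; stands in for the peer that owns the key pair."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = list(blocks)

    def get(self, index: int) -> bytes:
        return self.blocks[index]


class SparsePeer:
    """Stores only the blocks it has actually read."""

    def __init__(self, source: BlockSource) -> None:
        self.source = source
        self.local: Dict[int, bytes] = {}
        self.downloads = 0

    def get(self, index: int) -> bytes:
        if index not in self.local:
            # Fetch on demand and keep a local copy.
            self.local[index] = self.source.get(index)
            self.downloads += 1
        return self.local[index]

    def clear(self) -> None:
        # Free local storage; blocks can be fetched again later.
        self.local.clear()


writer = Writer([b"a", b"b", b"c", b"d"])
phone = SparsePeer(writer)
phone.get(2)                 # downloads only block 2

laptop = SparsePeer(phone)   # a sparse peer can serve what it holds
laptop.get(2)                # answered from the phone's local copy

phone.clear()                # low on storage: drop everything
phone.get(2)                 # fetched again from the writer
```

The phone never stores more than the one block it asked for, yet it can still serve that block to the laptop, which is the sense in which small peers help replication.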

5: We solved replicating Hypercores on mobile by creating push peer nodes that forward encrypted replication messages to Apple/Google so that devices receive them; it's already working in Keet. Soon we will add a more detailed explanation in the keet.io section of the main docs.

6: Hypercore, and by extension Hyperbee and Hyperdrive, are sparse by default, and the internal protocols are extremely optimized, so bandwidth usage is already low (depending on usage, of course) and can potentially get a bit lower in the future.

7: Not yet. When you add unique IDs to a P2P system things get complicated, but I don't know much about this. Personally, I used Hyperbee for a DNS server once and it works great, but it's centralized, although someone smarter could add Autobase to that. I think to make it truly decentralized you need your own TLD and your own authoritative DNS servers owned by multiple people, so the solution looks like a multi-sig scenario; again, I don't know much about this.
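
As a rough illustration of the centralized variant mentioned in answer 7, the sketch below maps memorable names to drive keys with a single owner deciding who gets which name. It is plain Python, not Hyperbee; `NameRegistry` is a hypothetical name, and the key format (64 lowercase hex characters) is an assumption made for the example.

```python
import re
from typing import Dict, Optional

# Assumption for this sketch: a key is written as 64 hex characters.
KEY_PATTERN = re.compile(r"[0-9a-f]{64}")


class NameRegistry:
    """A single-owner name -> key table (centralized by design)."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def register(self, name: str, key: str) -> None:
        name = name.strip().lower()
        key = key.lower()
        if not KEY_PATTERN.fullmatch(key):
            raise ValueError("key must be 64 hex characters")
        if name in self._records and self._records[name] != key:
            # First come, first served: names are unique.
            raise KeyError(f"name already taken: {name}")
        self._records[name] = key

    def resolve(self, name: str) -> Optional[str]:
        return self._records.get(name.strip().lower())


registry = NameRegistry()
registry.register("my-drive", "ab" * 32)
print(registry.resolve("My-Drive"))
```

The uniqueness check in `register` is where the difficulty lives: someone has to decide who owns a name, which is why making this decentralized needs shared ownership of the table.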

8: It's possible, but we're not focused on that path. All the libs work in the browser except for HyperDHT, because we use UDX (UDP), which browsers don't support; we like Electron for true P2P apps. There is dht-relay for the browser, which is not the answer you want because it is just a server that relays data. WebRTC lacks discovery mechanisms, so I think there is a good-enough scenario where you use dht-relay for discovery only (so yes, a server, but it's kind of like having DHT bootstrap nodes) and then make the actual P2P connection using WebRTC, so at least the heavy data is P2P and it's practically very scalable. Personally, I see nothing wrong with dht-relay for small web apps with low traffic, because you still use all the Hyper tech, but do note that it's centralized and that this is a compromise.
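
The "relay for discovery only, direct connection for data" split from answer 8 can be sketched as a small simulation. Nothing here uses dht-relay or WebRTC; `DiscoveryRelay`, `Peer`, and `send_direct` are hypothetical names, and the direct call stands in for a peer-to-peer data channel.

```python
from typing import Dict, List


class Peer:
    def __init__(self, name: str, files: Dict[str, bytes]) -> None:
        self.name = name
        self.files = files
        self.bytes_sent_direct = 0

    def send_direct(self, path: str) -> bytes:
        # Stand-in for a transfer over a direct channel such as WebRTC.
        data = self.files[path]
        self.bytes_sent_direct += len(data)
        return data


class DiscoveryRelay:
    """Only answers "who has this topic?"; it never carries file data."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[Peer]] = {}
        self.lookups = 0

    def announce(self, topic: str, peer: Peer) -> None:
        self.topics.setdefault(topic, []).append(peer)

    def lookup(self, topic: str) -> List[Peer]:
        self.lookups += 1
        return list(self.topics.get(topic, []))


relay = DiscoveryRelay()
seeder = Peer("seeder", {"/video.bin": b"x" * 1024})
relay.announce("drive-topic", seeder)

found = relay.lookup("drive-topic")          # one small request to the server
data = found[0].send_direct("/video.bin")    # the heavy data goes peer to peer
```

The server handles one lookup no matter how large the file is, which is why this arrangement scales better than relaying all traffic.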

9: No immediate limits other than the file system I think, this was kind of answered here: https://github.com/holepunchto/hyperdrive-next/issues/2

10: Atomicity and unavailability are different things. Everything that should be atomic is atomic; see Hyperbee and Hyperblobs. The put method in the current version of Hyperdrive combines the two, so it is not truly atomic because two cores are involved. Not being atomic doesn't affect anything in this particular case, though: the new blob is added first, and if the drive's database is then not updated with the new blob ID, nothing bad happens (the reverse order would be bad). If you have a drive that no one else is replicating, that's another issue, although we have tools that make it easier to keep backups; see https://github.com/holepunchto/simple-seeder
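
The ordering argument in answer 10 (blob first, database entry second) can be checked with a toy model. This is a sketch under stated assumptions rather than Hyperdrive's implementation: `TinyDrive` is a hypothetical class, a list stands in for the blob core, a dict stands in for the database core, and the `crash_before_index` flag simulates an interruption between the two writes.

```python
from typing import Dict, List, Optional


class InterruptedWrite(Exception):
    """Simulates a crash between the two writes of a put."""


class TinyDrive:
    def __init__(self) -> None:
        self.blobs: List[bytes] = []       # stand-in for the blob core
        self.index: Dict[str, int] = {}    # stand-in for the database core

    def put(self, path: str, data: bytes, crash_before_index: bool = False) -> None:
        # Step 1: append the blob.
        self.blobs.append(data)
        blob_id = len(self.blobs) - 1
        if crash_before_index:
            raise InterruptedWrite(path)
        # Step 2: only now point the path at the new blob.
        self.index[path] = blob_id

    def get(self, path: str) -> Optional[bytes]:
        blob_id = self.index.get(path)
        return None if blob_id is None else self.blobs[blob_id]

    def dangling_paths(self) -> List[str]:
        # Paths whose blob ID points past the end of the blob store.
        return [p for p, i in self.index.items() if i >= len(self.blobs)]


drive = TinyDrive()
drive.put("/notes.txt", b"v1")
try:
    drive.put("/notes.txt", b"v2", crash_before_index=True)
except InterruptedWrite:
    pass

print(drive.get("/notes.txt"))    # still the old, complete version
print(drive.dangling_paths())     # no entry points at a missing blob
```

The interrupted put leaves one unreferenced blob behind, which wastes a little space but is invisible to readers. Doing the steps in the opposite order could leave a path pointing at a blob that was never written.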

dumblob commented 9 months ago

Hi @LuKks, this is an awesome answer! It is exactly what I was looking for. I truly appreciate that you devoted your precious time to clarifying these in-depth topics.

I wish I could support your work somehow. What would be your preferred way of receiving support?

Thanks once more!