hypercore-protocol / hyperdrive-daemon

Hyperdrive, batteries included.
MIT License

Feat: Sync with folder #41

Open pfrazee opened 4 years ago

pfrazee commented 4 years ago

We've had a couple of requests for the ability to sync a drive with a folder using a mechanism other than FUSE (similar to what the dat CLI did previously). This has a couple of potential upsides.

Figured we should start with an issue to discuss the possibility.
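For concreteness, the mechanism the old dat CLI used was mirror-folder pointed at a drive. A minimal sketch of that general shape (paths are illustrative, and this is not a proposed implementation for the daemon):

const Hyperdrive = require('hyperdrive')
const mirror = require('mirror-folder')

const drive = new Hyperdrive('./drive-storage') // illustrative on-disk storage path

drive.ready(() => {
  // mirror-folder accepts { name, fs } targets, which is how dat-node
  // mirrored a local folder into a drive without going through FUSE.
  const progress = mirror('/path/to/local-folder', { name: '/', fs: drive }, (err) => {
    if (err) throw err
    console.log('mirrored into', drive.key.toString('hex'))
  })
  progress.on('put', (src) => console.log('added', src.name))
  progress.on('del', (dst) => console.log('removed', dst.name))
})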

dpaez commented 4 years ago

A feature like this is quite interesting for us in the p2pcommons SDK.

m4gpi commented 4 years ago

It may well be that CoBox ends up using something like this as a monkey-patch until fuse-native supports Windows (we're also looking at rolling back to fuse-bindings for use with Dokany). We're planning Mac and Windows support and installers in the near future. If this were a bridge that could operate between hyperdrive and the file system, we would likely be able to use it for kappa-drive too.

okdistribute commented 4 years ago

The dat CLI might be a good place to implement this. It's pretty tied to the dat-node library, but there are good tests down the stack. Not sure at this point whether it would be best implemented within dat-node or as a replacement for it.

andrewosh commented 4 years ago

Hey all, we added upload and download commands to the CLI in 1.12.3 (the latest beta version). Both commands watch their source directories for changes, and should mirror much of the clone/sync functionality of the old CLI, with the same UX, i.e.:

❯ hyperdrive download a1be575c2f50027305cea930b49ff0275bd7ea67623f8b254aed9c4fb121d5c8

and to upload the current directory:

❯ hyperdrive upload
// or if you know the target drive key
❯ hyperdrive upload <my-key>

Lemme know what you think or if you have any issues with it!

okdistribute commented 4 years ago

Hey @andrewosh cool work!

I tried this out today on my mac:

karissa ~
$ hyperdrive upload ~/Documents/andino-ecuador/
Uploading /Users/karissa/Documents/andino-ecuador into f97309903157b3699f804fd118f5243b5899dc662c81118f2b84a98ae7952ab2 (ctrl+c to exit)...

Uploading | ======================================== | 100% | 32/32 Files^C
Exit signal received. Stopping upload...

Seemed to work alright, but I couldn't find my files afterward. I first tried it verbatim, simply hyperdrive download <key>, but no folder was created (edit: seems like the Network directory was simply copied into my local directory). I then tried with a second argument, the target directory, but it gave an error -- it seems not to auto-create the directory for you. So then,

$ mkdir andino 
$ hyperdrive download 6ea63522c9bcbb56e2a767d3a21f338020a13a64a84e506512fbefcfea6bcc43 andino/
Downloading 6ea63522c9bcbb56e2a767d3a21f338020a13a64a84e506512fbefcfea6bcc43 into /Users/karissa/andino (ctrl+c to exit)...

Downloaded | ======================================== | 100% | 48/48 Metadata Blocks | 0 Peers^CExit signal received. Stopping download...

Aha finally worked, I thought! But sadly no, my files did not exist.

karissa ~/andindo
$ ls
Network

karissa ~/andindo
$ ls Network/
Stats

karissa ~/andindo
$ ls Network/Stats/

The stats seem to be there, though, when I look at Network/Stats/<key>/storage.json:

{
  "/07 Yura paloma.m4a": {
    "blocks": 103,
    "size": 6729243,
    "downloadedBlocks": 103
  },
  "/05 Caina Manda Cunangaman.m4a": {
    "blocks": 82,
    "size": 5322416,
    "downloadedBlocks": 82
  },
  "/.hyperdrive-key": {
    "blocks": 1,
    "size": 32,
    "downloadedBlocks": 1
  },
etc...

andrewosh commented 4 years ago

Hey @okdistribute that's a strange one! On the upload side of things, it will only create the drive for you (with the key that the command outputs), but it won't create any directories within FUSE. Is that what you mean by auto-creating the directory for you?

The download part of your example is interesting -- I don't have a clue how it would be copying the Network directory, since that's entirely virtual within the FUSE mount (so it looks like mirror-folder must have been copying directly from your FUSE-mounted filesystem, not from a remote hyperdrive). Can you stat ~/andindo and make sure it's not somehow symlinked to ~/Hyperdrive? Another possibility is that upload defaulted to uploading ~/Hyperdrive -- did you ever run it without arguments in ~/Hyperdrive?

Also I'm a bit confused by the keys in your example. It looks like the first directory was uploaded into f97309903157b3699f804fd118f5243b5899dc662c81118f2b84a98ae7952ab2, but in your subsequent download you're using 6ea63522c9bcbb56e2a767d3a21f338020a13a64a84e506512fbefcfea6bcc43. Did you get that second key with the hyperdrive info command perhaps?

Sorry it didn't work out of the box for ya. If you have any extra info re: the stuff above + the output of hyperdrive status I'll trace this down.

okdistribute commented 4 years ago

6ea63522c9bcbb56e2a767d3a21f338020a13a64a84e506512fbefcfea6bcc43 is another key in the drive that should also have content. Today I was able to get it to work on a second try with f97...! Still a bit confused, though, about why the 6ea... key didn't work, since it actually is just another key in my drive.

It isn't clear to me what is supposed to be visible in the ~/Hyperdrive folder, or how I'm supposed to be confident that the files have been successfully uploaded there. What is the intended behavior after using upload? Are you testing on Mac or Linux (should I go test on my Linux machine)?

[screenshot from 2020-04-28]

I used the command

cd ~/Documents
hyperdrive upload andino-ecuador/

And here's the stat output you asked for:

$ stat ~/andindo
16777218 3410208 drwxr-xr-x 4 karissa staff 0 136 "Apr 27 23:18:10 2020" "Apr 27 23:15:21 2020" "Apr 27 23:15:21 2020" "Apr 27 23:15:16 2020" 4096 0 0 /Users/karissa/andindo

karissa ~/Documents
$ ls ~/andindo/
Network

andrewosh commented 4 years ago

Ah gotcha, there are a few separate issues here I think. Trying to split them apart:

First issue: hyperdrive upload <some-folder> doesn't interact with FUSE at all (it won't have any effect on the stuff in ~/Hyperdrive). It just creates a new drive and spits out the key, which can then be mounted manually into your FUSE drive as a separate step.

Since the main use-case for the upload/download commands is to handle importing/exporting (which might be better names, come to think of it) for Windows users, or other users who don't want to deal with FUSE, I've kept them completely separate. So in your case, where you're using both FUSE and these commands, the flow would be: upload first, then mount the resulting key into your FUSE drive yourself, roughly as sketched below.
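A rough sketch of that two-step flow (the folder name and mount path are illustrative, and the second step assumes the daemon CLI's hyperdrive mount command):

❯ hyperdrive upload ./my-folder
❯ hyperdrive mount ~/Hyperdrive/home/my-folder <key-printed-by-upload>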

Second issue: On the download side of things, I'm trying to understand why the Network folder was downloaded. Seems to me that the most likely scenario is that either:

  1. You called the upload command on the ~/Hyperdrive directory, since that would mirror everything in ~/Hyperdrive (including the virtual directories like Network) into a new drive, which you then subsequently downloaded.
  2. upload has a bug where it's defaulting to uploading your root drive (the one at ~/Hyperdrive), but I can't seem to repro that yet.

Third issue: If 6ea... is a valid drive in the daemon, it should be downloadable, so if it can't be, that looks like a clear bug. Do you own this drive, or is it a read-only drive that isn't fully synced locally? A quick way to tell would be to cat ~/Hyperdrive/Network/Stats/6ea63522c9bcbb56e2a767d3a21f338020a13a64a84e506512fbefcfea6bcc43/storage.json and see if all the blocks are locally available.
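For example, assuming jq is available, this prints true only if every file's downloadedBlocks matches its blocks:

❯ jq 'to_entries | all(.value.downloadedBlocks == .value.blocks)' \
    ~/Hyperdrive/Network/Stats/6ea63522c9bcbb56e2a767d3a21f338020a13a64a84e506512fbefcfea6bcc43/storage.json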

Do you know what the contents of 6ea... are supposed to be? As in, is download downloading the incorrect stuff? I'd been assuming that it was generated by upload.

lachenmayer commented 4 years ago

Heya, upload & download are great, super useful to be able to just download hyperdrives without having the FUSE stuff running, like the Dat CLI. Just wanted to add my experience with it so far, in case it's helpful :)

My setup is as follows: I have a folder with a single file (test).

λ hyperdrive upload
Uploading /Users/harry/Downloads/test into 3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575 (ctrl+c to exit)...

Uploading | ======================================== | 100% | 4/4 Files

Minor thing, but I would have expected the file count to be 1/1 or 2/2 (counting the .hyperdrive-key file that gets created in the folder). The files seem to be added as expected:

λ cat ~/Hyperdrive/Network/Stats/3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575/storage.json 
{
  "/.hyperdrive-key": {
    "blocks": 1,
    "size": 32,
    "downloadedBlocks": 1
  },
  "/test": {
    "blocks": 1,
    "size": 12,
    "downloadedBlocks": 1
  }
}

Running hyperdrive download 3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575 on the receiving end created a .hyperdrive-key file in my current directory, which happened to be my home directory when I tried it. Not ideal IMO -- I'd prefer it defaulted to creating a directory named after the key, so that I don't overwrite my home directory by accident! No files appeared, though, unfortunately.

I then tried hyperdrive download 3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575 hyperdrive-test/ - this results in a file not found error for hyperdrive-test/.hyperdrive-key, because the directory doesn't exist. Would be cool to create that directory if it doesn't exist, as noted above by @okdistribute!

Anyway, after running this, I just get the following:

λ hyperdrive download 3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575 hyperdrive-test/
Downloading 3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575 into /home/harry/hyperdrive-test (ctrl+c to exit)...

Downloaded | ======================================== | 100% | 0/0 Metadata Blocks | 0 Peers

This is on my local network btw.

Stats output:

λ cat ~/Hyperdrive/Network/Stats/3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575/networking.json
[
  {
    "path": "/",
    "metadata": {
      "key": "3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575",
      "discoveryKey": "68bfa6767231e57e3565ea568b416dff00d5a393e96440178d01983b318b4d67",
      "peerCount": 0,
      "peers": [],
      "uploadedBytes": 0,
      "uploadedBlocks": 0,
      "downloadedBytes": 0,
      "downloadedBlocks": 0,
      "totalBlocks": 0
    },
    "content": {}
  }
]
λ cat ~/Hyperdrive/Network/Stats/3d70fea669f86ddee99c8caf623d8ab9137086a0c896d92a18753b56b6dbf575/storage.json 
{}

Status output on the sending end (macOS 10.14.6, Node v12.16.0):

λ hyperdrive status
The Hyperdrive daemon is running:

  API Version:             0
  Daemon Version:          1.10.12
  Client Version:          1.12.3
  Schema Version:          1.8.2
  Hyperdrive Version:      10.9.1
  Fuse Native Version:     2.2.1
  Hyperdrive Fuse Version: 1.2.15

  Holepunchable:           false
  Remote Address:          

  Fuse Available:          true
  Fuse Configured:         true

  Uptime:                  0 Days 0 Hours 16 Minutes 24 Seconds

On the receiving end:

λ hyperdrive status
The Hyperdrive daemon is running:

  API Version:             0
  Daemon Version:          1.10.12
  Client Version:          1.12.3
  Schema Version:          1.8.2
  Hyperdrive Version:      10.9.1
  Fuse Native Version:     2.2.1
  Hyperdrive Fuse Version: 1.2.15

  Holepunchable:           true
  Remote Address:          <my home IP address (correct)>:49737

  Fuse Available:          true
  Fuse Configured:         true

  Uptime:                  0 Days 0 Hours 20 Minutes 35 Seconds

Hope this helps :) Super excited to see this coming together!

andrewosh commented 4 years ago

Thanks @lachenmayer, great stuff. I'll go through a few of those and brainstorm changes:

  1. The 4/4 thing is just counting the total number of mirror-folder events, but it looks like some must be emitted twice. I'll take a look at that (rough illustration after this list).
  2. I think you're right that if a target directory isn't specified, it'd make more sense to make a folder named after the key. When mirroring from the drive, existing contents should currently be preserved, minus the .hyperdrive-key, but nuking that key is reason enough for the subdir.
  3. The fact that storage.json becomes empty after the download command isn't right -- especially since this is all local, it should look the same. This means the watch logic is probably closing the drive unexpectedly. Something fishy going on for sure. (And I now see what you were saying about your 6ea... drive being a valid local one, @okdistribute.)
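To illustrate item 1: the progress total is just a tally of mirror-folder put events, so directories, the generated .hyperdrive-key file, and any doubly-emitted event all inflate it. A rough sketch (not the daemon's actual code; the storage path and folder are illustrative):

const Hyperdrive = require('hyperdrive')
const mirror = require('mirror-folder')

const drive = new Hyperdrive('./storage') // illustrative on-disk storage path

drive.ready(() => {
  let count = 0
  // { name: '/', fs: drive } tells mirror-folder to write into the
  // hyperdrive instance rather than a destination folder.
  const progress = mirror('/some/folder', { name: '/', fs: drive }, (err) => {
    if (err) throw err
    console.log(count + ' entries mirrored') // can exceed the visible file count
  })
  progress.on('put', () => count++) // fires for directories and generated files too
})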

I'll update you both when these are fixed. Thanks for mentioning em!

andrewosh commented 4 years ago

@okdistribute @lachenmayer Published 1.13.0 of hyperdrive-daemon-client which should fix some of the issues you had, but I might need more info to track down @lachenmayer's download bug:

  1. I've renamed download/upload to export/import because I think those are more descriptive. Might as well do any name flip-flopping now!
  2. The stats in import should reflect the actual file count now. There will still be issues with the progress bar on the export side, because calculating percentages is trickier there, but it should be accurate enough for starters.
  3. export now creates a subdirectory for the drive contents (with the key as its name) by default, unless an output directory is passed as an argument (quick example after this list).
  4. Added usage docs to the readme. Did this step a bit too late to save you any pain :P
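For reference, the renamed round trip looks roughly like this (the folder name is illustrative):

❯ hyperdrive import ./my-folder    # creates a drive and prints its key
❯ hyperdrive export <key>          # downloads into a ./<key>/ subdirectory by default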

As for the issue @lachenmayer had, which appears to be caused by the drive being garbage-collected (in memory, not in storage) immediately after the import, I was able to replicate something similar but only in --memory-only mode. Are you using that by any chance? If so, the drive's storage was being deleted completely when the drive was closed, which explains a lot.

I just added short-circuiting for GC when in memory-only mode to avoid this. If you're not using that mode, then I'll have to keep digging -- I haven't been able to replicate it with disk storage.
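Conceptually the guard is just this (names are hypothetical, not the daemon's actual internals):

// Hypothetical sketch: with --memory-only, closing a drive destroys its
// RAM-backed storage, so the GC pass has to leave drives open instead.
function maybeGcDrive (drive, opts) {
  if (opts.memoryOnly) return // short-circuit: closing would delete the data
  drive.close()
}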

lachenmayer commented 4 years ago

I wasn't using --memory-only, just the bare commands. And sorry, forgot to mention that the receiving end was a different machine on my local network (Ubuntu, also Node v12.16)! I believe the issue was more due to my second machine not finding any peers.

Thanks for making those changes, I'll try this again later :)