mafintosh / torrent-stream

The low level streaming torrent engine that peerflix uses
MIT License
1.95k stars 226 forks source link

Accurate engine.remove() #35

Closed BastienClement closed 10 years ago

BastienClement commented 10 years ago

Commit 4769bbf6e96e105ec78fedff53ae1609a7a06003 intended to allow sharing the buffer directory between multiple torrent-stream instances.

On the name conflict side, for both torrent cache and data, this should be OK. However the cleanup logic in engine.remove() is very simple and doesn't bother to check if the removed files are related to the torrent or not.

By moving this to the storage module, we can now be a bit smarter and delete only files related to our torrent. Note: torrent cache file itself is still unliked from index.js since it is created there.

Also, the function does no longer attempt to delete the temporary folder itself. The idea behind this is that you can now use any directory as the buffer folder, like cwd, if you want to download torrent data in it (and then keep it). Maybe an option for this?

Finally, I'm not sure if torrents are guaranteed to have at most one top-level directory. If this is the case, handling multiple roots is not required and the code can be simplified.

In other words, you can have:

single.file

or

a
|_ b
|  |_ ...
|_ c
   |_ ...

but not

b
|_ ...
c
|_ ...

without a common ancestor, in one torrent.

asapach commented 10 years ago

@galedric, do you think it would make sense to write some tests to make sure it works as intended?

BastienClement commented 10 years ago

I'm not familiar with tap, but can take a look at it.

asapach commented 10 years ago

Come to think of it, something tells me that sharing the files between different instances of torrent-stream is not a good idea in the first place; it's a disaster waiting to happen. How can you guarantee that they will not corrupt each other?

BastienClement commented 10 years ago

By sharing, I mean the temporary folder (by default /tmp/torrent-stream), between multiple instances of torrent-stream downloading/streaming distinct torrents.

Previously the torrent cache file was invariantly named cache.torrent and buffer files 00...001 and so on, making this outright impossible. Now the torrent cache file uses the infoHash in its name and buffer data mirrors the torrent structure. So it should be possible to download two different torrents in the same folder without trouble.

Of course two concurrent instances of the same torrent is probably a very very bad idea, but nothing in torrent-stream handle this case at the moment.

asapach commented 10 years ago

I have no problems with the current implementation, because it puts each torrent into a separate folder by default, e.g. /tmp/torrent-stream/{infoHash}, so there are no name collisions. I think your problem is that you're using options.path parameter to stash all the torrents together. Try using options.tmp instead. Unfortunately it's currently undocumented.

mafintosh commented 10 years ago

Whilst not being particular useful torrent-stream should be able to work with two instances of the same torrent in the same folder. Everything that is written have been verified through a checksum first so corruption shouldn't happen.

@galedric This looks good. Would it make sense to store the torrent files in a different folder as the content? Maybe something like opts.tmp + '/.torrents/{infoHash}.torrent'?

In general I'm starting to think that it might make sense to put all of the storage logic into a seperate well tested module.

mafintosh commented 10 years ago

@asapach Send me a PR with opts.tmp documented and I'll happily merge it :)

asapach commented 10 years ago

@mafintosh, coming right up.

mafintosh commented 10 years ago

If we put it into a module we could even start supporting different storage solutions like a mem-storage etc.

BastienClement commented 10 years ago

Doing the opts.tmp + '/.torrents/{infoHash}.torrent' right now. Good idea.

@asapach The rewrite of the remove() function is about preventing files not related to the torrent to get deleted unexpectedly. Sharing the folder is a side-effect of that.

asapach commented 10 years ago

@mafintosh, mem-storage sounds wrong. People who ask for it don't realize that modern operating systems are pretty good at caching disk access. Moreover, JavaScript is not very good at memory management.

BastienClement commented 10 years ago

As it is, storage implementation can be modularized without trouble. All things related to reading, writing, and deleting data on disk is in the same place.

Maybe opts.path should be a more generic opts.storeOptions if multiple storage back-ends should be supported. Along with a opts.store option.

BastienClement commented 10 years ago

What about my question about folder structure in torrents? Should I rework store.remove() with the assumption that at most one top-level directory can exists?

This would mean we could rimram directly by just looking at the first file path.

mafintosh commented 10 years ago

@galedric its either a single file or a single top level folder

BastienClement commented 10 years ago

All right.

One more thing: current default path for torrent file is /tmp/.torrents/{infoHash}.torrent while default path for data if /tmp/torrent-stream/{infoHash}/.

opts.tmp affect the /tmp part of both path, so it become /whatever/.torrents/{...}.torrent and /whatever/torrent-stream/{...}/

opts.path would override the whole /tmp/torrent-stream/{infoHash}/ path, but leave torrent cache files in the same place.

Is this the intended behaviour?

asapach commented 10 years ago

Maybe change /whatever/.torrents/ to /whatever/torrent-stream/.torrents/ for consistency?

BastienClement commented 10 years ago

In fact, simply returning is for sure a bad thing.

In the original code, no error checking is done but ontorrent is always called. After all, it's not a big deal if the torrent file doesn't get written, it would only slow down a future restart from magnet link since data should be fetched again.

BastienClement commented 10 years ago

This last commit should fix the last issue about this PR for me.

When engine.remove() is called, it first removes the {infoHash}.torrent file.

Then if keepStorage is falsy, we call store.remove() that will remove every files created from torrent data. If the opts.path folder is empty, it is also removed.

Then back to engine where it finally attempts to remove the .torrents folder if not empty (no other torrent-stream instance active) and finally the /tmp/torrent-stream/ folder that should also be empty if everything before was removed properly.

In any case, if a folder at one point is not empty it is left behind. (Something like opts.path set to a common download directory, or another torrent-stream using the same opts.tmp folder)

mafintosh commented 10 years ago

Looks good. Thanks.

mafintosh commented 10 years ago

I've disabled the cleaning of empty folders since there was a risk of deleting something like your Downloads folder if you choose that as your path (which seemed like a bug). This result of this is that we leak a little bit of data in /tmp which is probably fine for now. Other than that this is published as 0.9.0

asapach commented 10 years ago

Now it leaves the empty /tmp/torrent-stream/{infoHash} folder and the torrent file in /tmp/torrent-stream/.torrents/. Overall, I liked 0.8.0 better :disappointed:

BastienClement commented 10 years ago

You probably have removed too much code.

You're right about opts.path that can be Downloads folder. I assumed it would not be empty thus not deleted, when the default one would be. But if leaking folders in /tmp is OK for you it's probably better like that.

However, you are no longer deleting the cache torrent file: https://github.com/mafintosh/torrent-stream/blob/a29ce5058af6fa93988b85795a7a04e7b191fd19/index.js#L637

Also the {tmp}/torrent-stream and {tmp}/torrent-stream/.torrents are always created as subfolder inside opts.tmp so it's should be safe to remove them.

This code only removes {tmp}/torrent-stream/.torrent (if empty) and then {tmp}/torrent-stream but dont touch the opts.tmp folder: https://github.com/mafintosh/torrent-stream/blob/a29ce5058af6fa93988b85795a7a04e7b191fd19/index.js#L622

The only one dangerous is this one: https://github.com/mafintosh/torrent-stream/blob/a29ce5058af6fa93988b85795a7a04e7b191fd19/storage.js#L117

BastienClement commented 10 years ago

Maybe something to check if opts.path was the default one, that we can delete, or a user-supplied one.

mafintosh commented 10 years ago

Oops yeah - Fixing that now.

asapach commented 10 years ago

@mafintosh, you are bumping versions too aggressively :smirk:. That should have been 0.9.1, not 0.10.0

mafintosh commented 10 years ago

@asapach 0.9.0 didn't have your [keep-pieces] flag :) Btw, your changes have been released in peerflix 0.9.0 as well. Thanks for your contributions.