storj-archived / sips

Storj Improvement Proposals.
GNU General Public License v3.0

Specification for file system operations on the Bridge #19

Open kaloyan-raev opened 7 years ago

kaloyan-raev commented 7 years ago

The current Bridge model defines a two-level flat structure. At the first level we have a list of buckets. Each bucket can contain a list of files - the second level.

The recently introduced Filezilla integration attempts to emulate a full file system with a tree hierarchy. Buckets are displayed as directories, and each bucket can contain subdirectories nested to an arbitrary depth.

How the Filezilla integration actually implements this does not matter right now. What matters is that the implementation, whatever it is, must be described in a specification, so that all other client integrations can implement it the same way and behave consistently.

It is also worth discussing whether this implementation is the responsibility of the clients at all, or whether it should be implemented in the Bridge itself.

When thinking about how to introduce file system operations, there are several possible approaches:

  1. Change the internal model of the Bridge from buckets-files to a tree hierarchy and expose the file system operations as an API.
  2. Keep the current buckets-files model in the Bridge, but emulate file system operations in the Bridge itself. Expose the operations as an API.
  3. Don't touch the Bridge. Emulate file system operations in libstorj and expose them as an API.
  4. Don't touch the Bridge and libstorj. Provide a specification for clients on how to emulate file system operations on top of the libstorj API. Every client should follow this spec.
  5. Don't do anything. Leave it to each client to decide whether to offer file system operations and whether to do it in a consistent way.

We are currently at point 5. My hope is to move up on the above list as much as possible.

Implementing file system operations entirely on the client side (points 3-5) has the problem that some operations cannot be done atomically. For example, if directories are implemented by prefixing each filename with the directory name, then deleting a directory requires deleting each of its files separately. This may take hundreds or even thousands of API requests from the client to the Bridge. This is not only inefficient: the full sequence of API calls may not complete for various reasons, leaving the Bridge in an inconsistent state. Renaming and moving directories would have the same problem.
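
To make the failure mode concrete, here is a minimal sketch of the prefix approach. `FlatBucket` is a hypothetical in-memory stand-in for a single bucket, not the Bridge or libstorj API, and `fail_after` is an invented knob that simulates a dropped connection after a given number of requests.

```python
class FlatBucket:
    """Hypothetical in-memory stand-in for one bucket: a flat set of file names."""

    def __init__(self, names, fail_after=None):
        self.files = set(names)
        self.requests = 0             # number of simulated API requests so far
        self.fail_after = fail_after  # simulate a network failure after N requests

    def list_files(self):
        self.requests += 1
        return sorted(self.files)

    def delete_file(self, name):
        if self.fail_after is not None and self.requests >= self.fail_after:
            raise ConnectionError("simulated network failure")
        self.requests += 1
        self.files.discard(name)


def delete_directory(bucket, directory):
    """Client-side 'rmdir': one delete request per file under the prefix."""
    prefix = directory.rstrip("/") + "/"
    for name in bucket.list_files():
        if name.startswith(prefix):
            bucket.delete_file(name)


bucket = FlatBucket(
    ["docs/a.txt", "docs/b.txt", "docs/sub/c.txt", "readme.txt"],
    fail_after=2,
)
try:
    delete_directory(bucket, "docs")
except ConnectionError:
    pass  # the client gave up halfway through

# The "directory" is now half deleted: some of its files are gone, others remain.
remaining = sorted(bucket.files)
```

In this sketch the request count grows linearly with the number of files under the prefix, and an interruption leaves `docs/` partially deleted with nothing on the server side to roll it back.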

So it would be best if we have point 1 or 2 implemented. It would be much more efficient and consistent if the Bridge itself is responsible for the file system operations.

I am also curious why the buckets-files model was chosen for the Bridge in the first place. Perhaps this is where the discussion should start.

braydonf commented 7 years ago

Yeah, I agree. The bucket and bucket entry model that is currently in place deserves more consideration. A flat structure could actually be better, so it would just be files. Creating structures out of those files would then be handled by manifest files, or something similar, that point to other files.
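
One possible reading of the manifest idea, as a rough sketch only: the store is a flat map of opaque file IDs, and a manifest (itself just another file) maps paths to those IDs. The JSON layout, the `store` dictionary, and `rename_directory` are all assumptions made for illustration; no manifest format has been specified here.

```python
import json

# Hypothetical flat store: opaque file ID -> content. No buckets, no paths.
store = {"f1": b"hello", "f2": b"world"}

# Hypothetical manifest: maps human-readable paths to file IDs.
manifest = {"docs/a.txt": "f1", "docs/sub/b.txt": "f2"}


def rename_directory(manifest, old, new):
    """Return a new manifest with every path under `old/` moved under `new/`."""
    old_prefix = old.rstrip("/") + "/"
    new_prefix = new.rstrip("/") + "/"
    result = {}
    for path, file_id in manifest.items():
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        result[path] = file_id
    return result


renamed = rename_directory(manifest, "docs", "archive")

# The whole rename is a single write of one new manifest file;
# the stored files themselves are never touched.
store["manifest"] = json.dumps(renamed, sort_keys=True).encode()
```

Under these assumptions, renaming or deleting a directory becomes one manifest update instead of one request per file, which sidesteps the partial-completion problem raised above.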