overleaf / filestore

An API for CRUD operations on binary files stored in S3
GNU Affero General Public License v3.0

Allow local hosting of files (rather than S3) #3

Closed jpallen closed 10 years ago

jpallen commented 10 years ago

All S3 logic is contained in app/coffee/s3wrapper.coffee, so it should be possible to swap this module out for another module with the same API that reads and writes from disk instead.
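
For illustration only, a disk-backed replacement might look roughly like the sketch below; the function names, signatures, and storage root here are assumptions for the example, not the actual s3wrapper.coffee interface.

fs   = require "fs"
Path = require "path"

ROOT = "/var/lib/filestore"  # hypothetical root directory for stored files

module.exports =
  # copy a local file into <ROOT>/<bucket>/<key>
  sendFile: (bucketName, key, sourcePath, callback) ->
    dest = Path.join(ROOT, bucketName, key)
    readStream  = fs.createReadStream(sourcePath)
    writeStream = fs.createWriteStream(dest)
    writeStream.on "error", callback
    writeStream.on "finish", -> callback()
    readStream.pipe(writeStream)

  # return a readable stream for the stored file
  getFileStream: (bucketName, key, callback) ->
    callback null, fs.createReadStream(Path.join(ROOT, bucketName, key))

  # remove the stored file
  deleteFile: (bucketName, key, callback) ->
    fs.unlink Path.join(ROOT, bucketName, key), callback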

adarshaj commented 10 years ago

:+1:

henryoswald commented 10 years ago

There is this offering from Riak, though I have never used it: http://basho.com/riak-cloud-storage/

Another option I know even less about is MongoDB's GridFS.

thebookworm101 commented 10 years ago

Another alternative storage solution would be OpenStack Swift, though I only have minimal, untested knowledge from implementing similar stuff.

coreyp1 commented 10 years ago

I have had good success with DreamHost DreamObjects, and wrote the Drupal integration for it based on the Drupal Amazon S3 integration (no, I'm not affiliated with DreamHost at all). Since DreamHost is so much cheaper than Amazon, and it supports the S3 protocols (it is almost as simple as changing the URL), it is much preferable for me to use DreamHost than Amazon. http://www.dreamhost.com/cloud/dreamobjects/

This should be easy to implement if only buildDefaultOptions allowed the URI to be customized.
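
For what it's worth, a hedged sketch of that change could look something like the following; the settings.s3.endpoint key is an assumption for illustration, not an existing filestore option, and I'm assuming the usual settings-sharelatex module.

settings = require("settings-sharelatex")

buildDefaultOptions = (bucketName, method, key) ->
  # fall back to Amazon's host when no custom endpoint is configured
  endpoint = settings.s3?.endpoint or "s3.amazonaws.com"
  return {
    method:  method
    timeout: 30 * 1000
    uri:     "https://#{bucketName}.#{endpoint}/#{key}"
  }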

hexylena commented 10 years ago

Instead of writing a custom storage backend that writes to disk, why not make the URL configurable and suggest users provide their own S3-compatible backend? After some digging, I've come up with a few:

jpallen commented 10 years ago

I think the simplicity of local file storage makes it a good idea - no config needed, and no other services to add in.

However, it's also an easy change to make the amazon URL configurable, so I think we should do both.

tiagoboldt commented 10 years ago

+1 GridFS from Mongo would be able to solve this issue, and the project already depends on Mongo. Integrating it using https://github.com/aheckmann/gridfs-stream should be trivial.
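
For reference, a rough sketch of how gridfs-stream is typically wired up (the connection URL and filenames are purely illustrative):

fs      = require "fs"
mongodb = require "mongodb"
Grid    = require "gridfs-stream"

mongodb.MongoClient.connect "mongodb://localhost/sharelatex", (err, db) ->
  throw err if err?
  gfs = Grid(db, mongodb)

  # write a local file into GridFS under a project_id/file_id style key
  writeStream = gfs.createWriteStream filename: "some_project_id/some_file_id"
  fs.createReadStream("/tmp/example.tex").pipe(writeStream)

  # stream it back out once the write has finished
  writeStream.on "close", ->
    gfs.createReadStream(filename: "some_project_id/some_file_id").pipe(process.stdout)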

cwoac commented 10 years ago

Started having a look at implementing direct filesystem access (using the bucketName as a root path). Currently the S3 unit tests are what are tripping me up the most, since they check that the knox calls are correct.

alfredosegundo commented 10 years ago

Same here, @cwoac. The most difficult task for me is implementing the tests.

henryoswald commented 10 years ago

@cwoac @alfredosegundo are you adding filesystem writing into the s3 wrapper? Ideally this will be a totally different file which just has functions with the same interface. We could then add another level of abstraction like a "filePersister" (for want of a better name). Then we can do something like below:

# pick the persistence backend based on config
if settings.filePersisterType == "s3"
    persister = require("./s3Wrapper")
else
    persister = require("./localFileWrapper")

module.exports = persister

opinions welcome!

@cwoac don't be afraid to add another value to settings, localFileStoragePath:"/tmp/was/a/bad/idea/"

cwoac commented 10 years ago

@henryoswald I coded up a version that used that approach (switch rather than conditional, but hey), but I figured that was a bit nasty and have been trying to do it via a mixin. So you require fsWrapper and that sets its functions to those of the correct wrapper.

I've got that working, although the unit tests bypass it and wrap it directly with sinon. I attempted to create a testBackend, but I can't see a decent way to override the real config setting from within the module. Actually, that gives me an idea...
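
For concreteness, a minimal sketch of that mixin idea could be something like the following; the setting name and backend module names here are guesses, not necessarily what the branch actually uses.

settings = require("settings-sharelatex")

# pick the backend once, based on config
backend = if settings.filestoreBackend is "s3"
  require("./S3PersistorManager")
else
  require("./FSPersistorManager")

# copy the chosen backend's functions onto this module so callers
# can require the wrapper without caring which backend is active
module.exports[name] = fn for name, fn of backend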

henryoswald commented 10 years ago

@cwoac if you want a hand send me a reference to the code and I can comment away.

jpallen commented 10 years ago

Also, as a side note, we try to design our code structures so that they are easy to unit test. It's one of the main factors in how ShareLaTeX is coded (all flat files, no classes, simple short functions, etc).

cwoac commented 10 years ago

I believe I have the backend switching support working (the one thing I lack is an actual S3 key setup to test with, but the mixin certainly works for test calls). I'll raise a pull request as it is probably the easiest way to discuss the actual code.

Once we've got this sorted, I can start looking in earnest at the filesystem backend.

cwoac commented 10 years ago

Right, so I've created a branch and built basic unit tests for FSPersistorManager here: 3efb4c8de45d3b05dd09d860089ffd08ac122311

Beyond the fact that the unit tests pass, I have done NO testing on this whatsoever and I have no doubt that it will not work.

In particular:

  1. I assume that keys can always be used as filenames (i.e. no subdirectories in the key names).
  2. I've yet to look at who/how keys are generated.
  3. It uses the s3 settings for bucket names to give the directories where it will store data (I could move that into a setting and ignore the first parameter passed to all the functions, but this seemed nasty, plus it would remove the option to store user and template files in separate locations).
  4. No checking that the target directory exists and is writable, etc. (a possible check is sketched at the end of this comment).

Having said all that, it probably isn't going to require that much work to get it working. I'll have a chance later (today, hopefully) to actually try integrating it into a docker setup I've built for it.
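
A minimal sketch of what the missing check in point 4 could look like (the helper name is hypothetical, not code from the branch):

fs = require "fs"

ensureDirectoryWritable = (directory, callback) ->
  fs.stat directory, (err, stats) ->
    return callback(err) if err?
    return callback(new Error("#{directory} is not a directory")) unless stats.isDirectory()
    # write and delete a probe file to confirm the directory is writable
    probe = "#{directory}/.filestore-write-check"
    fs.writeFile probe, "", (err) ->
      return callback(err) if err?
      fs.unlink probe, callback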

jpallen commented 10 years ago

Great work @cwoac! I think it's safe to assume keys can be used as filenames, since they are just the 24-character ObjectId used by MongoDB. Can @henryoswald confirm? This is mostly his domain atm.

I think using the s3 bucket name settings is a good idea. If we set these up with defaults then users running the FS backend won't even need to know they exist, and we can just have separate directories called something like 'user_files' and 'templates'.

What sort of docker set up have you got? Sounds interesting if you're able to share.

pablomendes commented 10 years ago

Just came here to +1 this. :+1: Will instructions on how to run be posted here? I'm eager to try it out.

henryoswald commented 10 years ago

Yeah, the Mongo ids are safe to use; it's also nice to use them for debugging.

cwoac commented 10 years ago

I'm running redis and mongodb both in their own (rather simple) containers, mapping the db directories to a dir on the host and sharing the port with the container that will run sharelatex. I haven't got that up and running yet; I dived straight into this instead.

Having taken a bit of a look at the master configuration setup now, I reckon I can split each module off into separately running docker instances. I'm not 100% sure it would gain me much on deployment, but it would certainly help spin-up/down for development.

If we make the settings universal, it's probably an idea to rename them so they're no longer S3-specific. I'll shuffle them around to make them more sensible at some point.

cwoac commented 10 years ago

So, it seems the keys can't be used directly as filenames - I'm not sure if it is bucketing them or if they just have a slash in the name.

[2014-02-27T18:27:43.302Z]  INFO: filestore/23013 on albion: reciving request to get file (key=530f2407e7ef165704000007/530f838b46d9a9e859000008, bucket=/tmp/storelatex)
[2014-02-27T18:27:43.303Z]  INFO: filestore/23013 on albion: getting file (bucket=/tmp/storelatex, key=530f2407e7ef165704000007/530f838b46d9a9e859000008, opts={})

henryoswald commented 10 years ago

Yeah, keys will have a slash in them as they are project_id/file_id. You can do:

  key = key.replace(/\//g,"-")

cwoac commented 10 years ago

Yeah, I've already pulled it out into a helper function. Shockingly, apart from that, it appears to work, at least on initial inspection. More testing is required before a pull request, though.
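
For the record, such a helper might be as small as this (the name filterName is a guess, not necessarily what ended up in the branch):

# map an S3-style key like "project_id/file_id" onto a flat filename
filterName = (key) ->
  key.replace(/\//g, "-")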

jpallen commented 10 years ago

This is now possible thanks to the great work by @cwoac. Local file storage is now the default, so ShareLaTeX should just run out of the box, with no external config settings needed.

MaximeCheramy commented 10 years ago

I've just tested it; the config is easy and it works. Thanks!