Closed jpallen closed 10 years ago
:+1:
There is this offering from Riak; I have never used it: http://basho.com/riak-cloud-storage/
Another option, which I know even less about, is MongoDB's GridFS.
Another alternative storage solution would be Swift, though I have only minimal, untested knowledge from implementing similar things.
I have had good success with DreamHost DreamObjects, and wrote the Drupal integration for it based on the Drupal Amazon S3 integration (no, I'm not affiliated with DreamHost at all). Since DreamHost is so much cheaper than Amazon, and it supports the S3 protocols (switching is almost as simple as changing the URL), it is much preferable for me to use DreamHost over Amazon. http://www.dreamhost.com/cloud/dreamobjects/
This should be easy to implement if buildDefaultOptions allowed the URI to be customized.
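As a rough sketch of that idea (the function body and the settings.s3.endpoint name are my assumptions, not the real ShareLaTeX code), the options builder only needs the host factored out:

```javascript
// Sketch: build request options with a configurable endpoint, so any
// S3-compatible service (DreamObjects, Riak CS, etc.) can be targeted.
// The `settings` shape here is an assumption for illustration.
function buildDefaultOptions(settings, bucketName, method, key) {
  const endpoint = (settings.s3 && settings.s3.endpoint) || "s3.amazonaws.com";
  return {
    method: method,
    timeout: 30 * 1000,
    uri: "https://" + bucketName + "." + endpoint + "/" + key
  };
}

// Example: pointing at DreamObjects instead of Amazon
const opts = buildDefaultOptions(
  { s3: { endpoint: "objects.dreamhost.com" } },
  "my-bucket", "GET", "530f2407e7ef165704000007"
);
console.log(opts.uri);
// → https://my-bucket.objects.dreamhost.com/530f2407e7ef165704000007
```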
Instead of writing a custom storage backend that writes to disk, why not make the URL configurable and suggest users provide their own S3-compatible backend? After some digging, I've come up with a few:
I think the simplicity of local file storage makes it a good idea - no config needed, and no other services to add in.
However, it's also an easy change to make the amazon URL configurable, so I think we should do both.
+1 gridfs from mongo would be able to solve this issue and the project depends on mongo already. Integrating it using https://github.com/aheckmann/gridfs-stream should be trivial.
Started having a look at implementing direct filesystem access (using the bucketName as a root path). Currently the S3 unit tests are tripping me up the most, since they check that the knox calls are correct.
Same here @cwoac. The most difficult task for me is implementing the tests.
@cwoac @alfredosegundo are you adding filesystem writing into the s3 wrapper? Ideally this would be a totally different file which just has functions with the same interface. We could then add another level of abstraction like a "filePersistor" (for want of a better name). Then we can do something like below:
if settings.filePersistorType == "s3"
  persistor = require("./s3Wrapper")
else
  persistor = require("./localFileWrapper")
module.exports = persistor
opinions welcome!
@cwoac don't be afraid to add another value to settings, localFileStoragePath:"/tmp/was/a/bad/idea/"
@henryoswald I coded up a version that used that approach (a switch rather than a conditional, but hey), but I figured that was a bit nasty and have been trying to do it via a mixin. So you require fsWrapper and that sets its functions to those of the correct wrapper.
I've got that working, although the unit tests bypass it and sinon-wrap it directly. I attempted to create a testBackend, but can't see a decent way to override the real config setting from within the module. Actually, that gives me an idea...
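For anyone following along, the mixin idea above could be sketched like this (module names and function shapes are my assumptions; the real backends would be required from their own files rather than stubbed inline):

```javascript
// Sketch of the mixin approach: fsWrapper copies the functions of the
// configured backend onto its own exports, so callers never know which
// backend they are talking to. All names here are illustrative.
const settings = { filePersistorType: "fs" }; // stand-in for the real settings module

function loadPersistor(type) {
  // In the real code these would be require("./s3Wrapper") and
  // require("./localFileWrapper"); stubbed inline to stay self-contained.
  const backends = {
    s3: { getFile: (bucket, key) => "s3:" + bucket + "/" + key },
    fs: { getFile: (bucket, key) => "fs:" + bucket + "/" + key }
  };
  return backends[type];
}

// Mix the chosen backend's functions into this module's exports.
const persistor = loadPersistor(settings.filePersistorType);
const fsWrapper = {};
Object.keys(persistor).forEach(function (name) {
  fsWrapper[name] = persistor[name];
});

console.log(fsWrapper.getFile("/tmp/storelatex", "abc"));
// → fs:/tmp/storelatex/abc
```

One nice property of this shape is that tests can swap in a stub backend by overriding the settings value before the wrapper is required.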
@cwoac if you want a hand send me a reference to the code and I can comment away.
Also, as a side note, we try to design our code structures so that they are easy to unit test. It's one of the main factors in how ShareLaTeX is coded (all flat files, no classes, simple short functions, etc).
I believe I have the backend switching support working (the one thing I lack is an actual S3 key setup to test with, but the mixin certainly works for test calls). I'll raise a pull request as it is probably the easiest way to discuss the actual code.
Once we've got this sorted, I can start looking in earnest at the filesystem backend
Right, so I've created a branch and built basic unit tests for FSPersistorManager here: 3efb4c8de45d3b05dd09d860089ffd08ac122311
Beyond the fact that the unit tests pass I have done NO testing on this whatsoever and I have no doubt that it will not work.
In particular:
Having said all that, it probably isn't going to require that much work to get it working. I'll have a chance later (today, hopefully) to actually try integrating it into a docker setup I've built for it.
Great work @cwoac! I think it's safe to assume keys can be used as filenames, since they are just the 24-character ObjectIds used by MongoDB. Can @henryoswald confirm? This is mostly his domain atm.
I think using the s3 bucket name settings is a good idea. If we set these up with defaults then users running the FS backend won't even need to know they exist, and we can just have separate directories called something like 'user_files' and 'templates'.
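To make that concrete, a minimal sketch of reusing the bucket settings with defaults (all names and paths here are assumptions, not the real config):

```javascript
// Sketch: one settings key per store, with defaults so FS-backend users
// never need to set them. Store names and default paths are illustrative.
const defaultStores = {
  user_files: "/var/lib/sharelatex/user_files",
  template_files: "/var/lib/sharelatex/template_files"
};

function getStorePath(settings, store) {
  return (settings.stores && settings.stores[store]) || defaultStores[store];
}

// With no config at all, the defaults kick in:
console.log(getStorePath({}, "user_files"));
// → /var/lib/sharelatex/user_files

// An explicit setting overrides the default:
console.log(getStorePath({ stores: { user_files: "/data/u" } }, "user_files"));
// → /data/u
```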
What sort of docker set up have you got? Sounds interesting if you're able to share.
Just came here to +1 this. :+1: Will instructions on how to run be posted here? I'm eager to try it out.
Yeah, the Mongo IDs are safe to use; it's also nice to have them for debugging.
I'm running redis and mongodb both in their own (rather simple) containers, mapping the db directories to a dir on the host and sharing the port with the container that will run sharelatex - not got that up and running yet, I dived straight into this instead.
Having taken a bit of a look at the master configuration setup now, I reckon I can split each module off into separately running docker instances. I'm not 100% sure it would gain me much on deployment, but would certainly help spin-up/down for development.
If we make the settings universal, it's probably worth renaming them so they're not s3-specific. I'll shuffle them around to make them more sensible at some point.
So, it seems the keys can't be used directly as filenames - I'm not sure if it is bucketing them or just that they have a slash in the name.
[2014-02-27T18:27:43.302Z] INFO: filestore/23013 on albion: reciving request to get file (key=530f2407e7ef165704000007/530f838b46d9a9e859000008, bucket=/tmp/storelatex)
[2014-02-27T18:27:43.303Z] INFO: filestore/23013 on albion: getting file (bucket=/tmp/storelatex, key=530f2407e7ef165704000007/530f838b46d9a9e859000008, opts={})
Yeah, keys will have a slash in them as they are project_id/file_id, so you can do:
key = key.replace(/\//g,"-")
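Pulled out as a helper, that one-liner might look like this (the function name is mine, not from the codebase):

```javascript
// Convert an S3 key of the form project_id/file_id into a flat filename
// that is safe on the local filesystem, by replacing path separators.
function keyToFilename(key) {
  return key.replace(/\//g, "-");
}

console.log(keyToFilename("530f2407e7ef165704000007/530f838b46d9a9e859000008"));
// → 530f2407e7ef165704000007-530f838b46d9a9e859000008
```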
Yeah, I've already pulled it out into a helper function. Shockingly, apart from that, it appears to work, at least on initial inspection. More testing required before pull though.
This is now possible thanks to the great work by @cwoac. Local file storage is the default now so that ShareLaTeX should just run out of the box, with no external config settings needed.
I've just tested, the config is easy and it works. Thanks!
All s3 logic is contained in app/coffee/s3wrapper.coffee, so it should be possible to swap out this module with another module with the same API that instead reads and writes from disk.