vsivsi / meteor-file-collection

Extends Meteor Collections to handle file data using MongoDB gridFS.
http://atmospherejs.com/vsivsi/file-collection
Other
159 stars 37 forks source link

Working on multiple instances? #67

Open dnish opened 8 years ago

dnish commented 8 years ago

Hey, we are using "CollectionFS" at the moment and having big issues with server crashes because we are running multiple Meteor instances to scale our application. Is File-Collection ready for multiple instances or will it have the same problems like CollectionFS?

vsivsi commented 8 years ago

File-collection is much simpler than CollectionFS, as it is built directly on top of gridFS. You don't mention what kinds of crashes you are encountering, or even if you are using CollectionFS with gridFS as the backing store. If you are running into concurrency issues (gridFS is not concurrency friendly out of the box) then you may be in luck, because file-collection uses a gridFS locking package to ensure robust concurrent access. Not knowing anything about your app, I don't know if that may be part of the issues you are seeing or not. Basically it's impossible for me to answer your question because I don't know what your application does, how it works or how you've engineered the scalability/multiple instance support internally (or how you have your mongo server(s) configured...). What I can say is that in its simplicity (assuming it meets your other needs) file-collection reduces the number of variables you need to consider, and explicitly anticipates concurrent access issues that CollectionFS doesn't (yet).

But the only way for you to definitively answer this question is to try it with your app in your environment and beat on it to see what happens.

To that end, you may find this discussion helpful: https://github.com/vsivsi/meteor-file-collection/issues/62

dnish commented 8 years ago

Hey, thank you for your answer. I've just asked because I thought that maybe someone already has a productional app and is running multiple instances with file-collection. We use meteor-cluster for our productional environment and actually running 3 servers (instances) with a MongoDB replica set.

The CollectionFS problem is this one https://github.com/CollectionFS/Meteor-CollectionFS/issues/731. We upload profile images to our app and store them in our GridFS collection. I've just seen that some guy did a "file-queue-worker" plugin for CollectionFS to make it work with multiple instances, but it isn't officially supported.

If I understand it right, CollectionFS works with observers to maintain the uploads, and that's the problem with multiple instances. So I just want to know if File-Collection uses another architecture (just saw your package job-collection) to maintain the uploading process - but as you already said, the best way will be to test it in our environment.

vsivsi commented 8 years ago

I've never seen those types of errors using file-collection. It uses a different upload architecture from collectionFS. If I recall correctly, collectionFS streams uploaded data (chunks) into the local file system, and then when the upload is complete, it copies (streams) the chunks into a single gridFS file in the correct order.

file-collection works in one of two ways. If you use the built-in resumable.js support, then chunks are uploaded and stored as temporary files within gridFS. A constraint is that the resumable.js chunk size and the gridFS chunk size must be equal. Then, once all chunks have been received, file-collection simply creates a zero length file (under write lock) and re-numbers all of the resumable.js chunks to become the chunks of the whole file. So no copying of any kind happens (although the entire file is read by mongo to calculate the correct new md5 sum).

The other way is if you don't use resumable.js, then the uploaded file streams directly from the http connection socket into the gridFS file (under write lock). The file shows up as zero length until the transfer is complete and the lock is released.

So yeah, file-collection and collectionFS do upload completely differently. I wrote file-collection because collectionFS was too slow and sometimes unstable when dealing with large (multi-GB) files that I was using. I tried working on their gridFS backend, but the lack of locking for concurrency and overall complexity of the package (and its dozens of subpackages) convinced me that it would be easier to start from scratch and build a simpler, gridFS focused package. I hope it works for you!

vsivsi commented 8 years ago

BTW, file-collection has quite a few users, and I've never had a single multiple instance issue reported. So either there aren't any obvious problems, or nobody has tried it (which seems unlikely). I'll leave this issue open. Please report back here with your findings so that I can have a place to send people with similar questions in the future! Thanks.

dnish commented 8 years ago

Sorry to ask you again, but is there an "optimal/integrated" way to upload data uris? Our users select a picture which is converted into a base64 data uri. So we don't use any drag&drop or form elements in our application (Cordova). After some research I would have to do it like this way:

Convert data uri into blob & adding it to a hidden form element: http://stackoverflow.com/questions/4998908/convert-data-uri-to-file-then-append-to-formdata

Then I've to add it via myFiles.resumable.addFile(file);

vsivsi commented 8 years ago

Look in here: https://github.com/vsivsi/meteor-file-collection/issues/29

hluz commented 8 years ago

@dnish, I also had instability issues, a few server crashes due to CollectionFS and the problem of not being able to run multiple server instances. Replaced CFS in production (internal corporate app) and it is now running without a glitch. Faster uploads. Lower server memory usage. If the only thing you used CFS for is to upload files into gridFS, them my opinion it to not look back. And if you give the collection name the same name as the current CFS name (i.e., "cfs_gridfs.name") then you don't even have to worry about your current data. After replacing you can just delete the cfs gridfs collections not required by file-collection.

Many thanks to Vaughn.

dnish commented 8 years ago

Thank you for your answers. Was Cordova support merged to the main package (@riaan53)? My files are all empty when I try to upload with a Cordova app. This is the url he calls when starting upload:

http://meteor.local/gridfs/sentfiles2/_resumable?resumableChunkNumber=1&resumableChunkSize=2096128&resumableCurrentChunkSize=33080&resumableTotalSize=33080&resumableType=image%2Fjpeg&resumableIdentifier=dcd1e1279320899e1801f307&resumableFilename=sentImage.jpg&resumableRelativePath=sentImage.jpg&resumableTotalChunks=1

It responses with the whole Meteor head/body HTML document.

vsivsi commented 8 years ago

This may be related... https://github.com/23/resumable.js/issues/252

dnish commented 8 years ago

I'll tomorrow take a look into the source code any try to implement some Cordova settings. Will send you a PR if it works without problems.

dnish commented 8 years ago

At the moment I'm facing the following problem, if I change my base url to this:

resumable_client.coffee

url = "#{@baseURL}/_resumable"
url = Meteor.absoluteUrl(url) if Meteor.isCordova
r = new Resumable
 target: url
 ...

the uploaded file get's added to my "sentfiles2" collection. But I get a 404 at this link:

Request URL: http://192.168.178.34:3000/gridfs/sentfiles2/_resumable?resumableChunkNumber=1&resumableChunkSize=2096128&resumableCurrentChunkSize=42448&resumableTotalSize=42448&resumableType=image%2Fjpeg&resumableIdentifier=5a0a5fc5b8acb9987c45dda4&resumableFilename=sentImage.jpg&resumableRelativePath=sentImage.jpg&resumableTotalChunks=1

He throws the 404 in http_access_server.coffee at 289:

# Perform the collection query
                  req.gridFS = @findOne lookup
                  unless req.gridFS
                     res.writeHead(404, {'Content-Type':'text/plain','Access-Control-Allow-Origin': 'http://meteor.local'})

                     res.end()
                     return

At this time, lookup has the value {md5:'_resumable'}. Shouldn't it have the _id value of the uploaded document?

This is my client side code:

  SentFiles2.resumable.on('fileAdded', function (file) {

            SentFiles2.insert({
                    _id: file.uniqueIdentifier,  // This is the ID resumable will use
                    filename: file.fileName,
                    contentType: file.file.type
                },
                function (err, _id) {  // Callback to .insert
                    if (err) { return console.error("File creation failed!", err); }

                    window.setTimeout(function() {
                        SentFiles2.resumable.upload(); //Files is created, timeout was for testing sync problems
                    },10000);
                }
            );
        });
vsivsi commented 8 years ago

I think you almost have it. You need to add the suffix /_resumable to your target setting. Like is done here: https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_client.coffee#L29

Without that, the resumable chunk requests are being routed to the wrong request handlers.

dnish commented 8 years ago

@vsivsi Uff...sorry, I forgot to post the value of "url" (now edited). It's already added, but gives me that 404 error.

vsivsi commented 8 years ago

In fact, if you are configuring your own Resumable object, it needs to match all of the options as the one being declared at the link above...

dnish commented 8 years ago

It's the same object like the original one, except that I've added the "target" parameter to it. The only changes I did so far are adding the access headers and setting the target url.

vsivsi commented 8 years ago

And you're setting resumable: true on the server-side file collection?

dnish commented 8 years ago

Yep, this are my collection settings:

SentFiles2 = new FileCollection('sentfiles2',
    { resumable: true,   // Enable built-in resumable.js upload support
        baseURL:"/gridfs/sentfiles2",
        http: [
            { method: 'get',
                path: '/:md5',  // this will be at route "/gridfs/myFiles/:md5"
                lookup: function (params, query) {  // uses express style url params
                    return { md5: params.md5 };       // a query mapping url to myFiles
                }
            }
        ]
    }
);
vsivsi commented 8 years ago

What is the value of urlthat you are using to set the target in your Resumable object?

dnish commented 8 years ago

It's the following:

url = "#{@baseURL}/_resumable"
url = Meteor.absoluteUrl(url) if Meteor.isCordova

So I get http://192.168.178.34:3000/gridfs/sentfiles2/_resumable as target url on my Cordova device (it's linked to my dev version on the Mac).

vsivsi commented 8 years ago

It should be http://192.168.178.34:3000/gridfs/sentfiles2/_resumable

The problem you are having is that the resumable requests are being routed to the md5 handler you are declaring rather than matching the resumable route defined here:

https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L242

And added to the route list (before any user defined routes) here: https://github.com/vsivsi/meteor-file-collection/blob/master/src/http_access_server.coffee#L369

So either the client side route that is defined for resumable is still wrong somehow. Or the server side route is getting screwed up somehow, but it doesn't sound like you've made any server changes...

dnish commented 8 years ago

Argh sorry, but yeah I missed the _resumable part here, so it is in the target url. I've just seen that he adds this to my sentfiles2 collection, when I start uploading a file:

{
    "_id": ObjectId('c36c874a4afa0f8e40952d59'),
    "length": 0,
    "md5": "d41d8cd98f00b204e9800998ecf8427e",
    "uploadDate": ISODate('2015-11-03T17:44:27.866Z'),
    "chunkSize": 2096128,
    "filename": "sentImage.jpg",
    "metadata": {},
    "aliases": [],
    "contentType": "image/jpeg"
}

The interesting here is, that all of my files have length 0 and all the same md5 value (but they are different images). Every file has a different ResumableFile object like this (every file has another size, so I'm wondering why md5 and length all the same in my collection)

{
  size: 25689,
  file: Blob,
  fileName: 'sentImage.jpeg',
  isUploading: function(),
  [...]
}
vsivsi commented 8 years ago

When you insert the file into the collection on the client side, it is created as a zero length file on the server, pending the completion of the resumable chunk uploads. All zero length files have the same md5 sum. When the resumable chunks are uploaded, they are placed in temporary gridFS files, and once they all arrive, the chunks are renumbered and assigned to the real file and the md5 sum is updated.

The issue is that the chunk uploads are failing because they aren't ending up on the proper route server side.

vsivsi commented 8 years ago

Just to be clear, the file document you are seeing is coming from the SentFiles2.insert(...) not from the resumable upload. This is correct behavior.

vsivsi commented 8 years ago

When your resumable chunks are being successfully uploaded, you will see files named _Resumable_<a bunch of numbers> in your collection until the upload of all chunks completes. You aren't getting that far yet, it seems.

dnish commented 8 years ago

I've just logged the file object of SentFiles2.resumable.on('fileAdded', function (file) { ... });

After some correction, my 404 status code is now changed to 204

Remote Address:192.168.178.34:3000
Request URL:http://192.168.178.34:3000/gridfs/sentfiles2/_resumable?resumableChunkNumber=1&resumableChunkSize=2096128&resumableCurrentChunkSize=42448&resumableTotalSize=42448&resumableType=image%2Fjpeg&resumableIdentifier=c41cca8fdd527858d74a26b8&resumableFilename=sentImage.jpg&resumableRelativePath=sentImage.jpg&resumableTotalChunks=1
Request Method:HEAD
Status Code:204 No Content

After this request, I get 10 requests with a 404 error on

Remote Address:192.168.178.34:3000
Request URL:http://192.168.178.34:3000/gridfs/sentfiles2/_resumable
Request Method:OPTIONS
Status Code:404 Not Found

It's in http_access_server.coffee (line 320):

     @router.route('/*')
         .all (req, res, next) ->  # Make sure a file has been selected by some rule
            unless req.gridFS
               res.writeHead(404, {'Content-Type':'text/plain','Access-Control-Allow-Origin': 'http://meteor.local'})
               res.end()
               return
            next()

I'll check the req object tomorrow.

vsivsi commented 8 years ago

The 204 responses are correct! That is the resumable client asking the server if it has that chunk yet, and the server responding: No.

vsivsi commented 8 years ago

https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L237

vsivsi commented 8 years ago

I don't know anything about that OPTIONS request. Never seen it before from resumable. Is that related to CORS support or something? There is no handler for that in my server-side resumable support, so that may be an issue.

dnish commented 8 years ago

Yeah correct, it is related to CORS. Think I've to response with "200" to make it work. The browser checks if the server allows CORS.

In my example, I got the 204 response as I've changed the lookup function

  lookup: function (params, query) {  
                   //Just for a test
                    return { md5: "d41d8cd98f00b204e9800998ecf8427e" };  //this MD5 exists in the collection.
                }
riaan53 commented 8 years ago

Hi there,

Sorry been busy with lots of other things and havnt got to the pr for cordova support. Will try to find it, its not ready for a merge and still on a older code base.

Just remember that doing this in js on cordova is very cpu intensive for the mobile device. But you can write a plugin or modify the Cordova file transfer one to do it natively that works with this package if you want later.

Regards, Riaan

dnish commented 8 years ago

Hi @riaan53, thank you for the information. Now my upload is working via Cordova, but I've to fix the lookup function (it uses still the default md5 as parameter), "params.md5" is wrong during the upload, but correct if I do a GET request to the file (f.e. http://localhost:3000/gridfs/sentfiles2/e9f98315845c0d955942d2a7a8e5cdac). Will check that tomorrow.

//Edit: Okay, upload is working also with default lookup function, but the HEAD method then throws a 404.

dnish commented 8 years ago

@vsivsi Well, I don't really get it from the examples. So, if I define a new collection like this:

SentFiles2 = new FileCollection('sentfiles2',
    { resumable: true,   // Enable built-in resumable.js upload support

        http: [
            { method: 'get',
                path: '/:_id',  // this will be at route "/gridfs/myFiles/:md5"
                lookup: function (params, query) {  // uses express style url params

                    return { _id: params._id};       // a query mapping url to myFiles
                }
            },

        ]
    }
);

I'll have a route like http://myapp.com/gridfs/sentfiles2/wewe4567ererer45 to get an uploaded image. Okay, that's the part I'm understanding, because the lookup function then "creates" the query with the value of my url parameter.

But now resumable is confusing me, because without any further modification, a typical resumable url looks like this:

http://myapp.com/gridfs/sentfiles2/_resumable?resumableChunkNumber=1&resumableChunkSize=2096128&resumableCurrentChunkSize=42448&resumableTotalSize=42448&resumableType=image%2Fjpeg&resumableIdentifier=04fc0f96127496d6ac78ab54&resumableFilename=sentImage.jpg&resumableRelativePath=sentImage.jpg&resumableTotalChunks=1

In this case, the lookup function will recognize "_resumable" as "_id", throwing a 404 during the upload. If I don't add my server address to the baseUrl url = "#{@baseURL}/_resumable") I don't get a 404, but when I do url = Meteor.absoluteUrl(url) if Meteor.isCordova he throws 404. I guess there is any route registration conflict, as you've already said some posts before.

dnish commented 8 years ago

Just commited my changes in my repo, it works without problems, but I'm limited to document's "_id", because I can't figure out how to get the chosen identifier when I upload via Cordova.

This is the problem line: https://github.com/dnish/meteor-file-collection/commit/b928a996520e10eb9323bbdc4c9703dd025dc28c?diff=unified#diff-3d599543bb10143b21010670d82fc114R288

vsivsi commented 8 years ago

Your lookup function should never be seeing that _resumable URL because it should have already been handled by the built in resumable.js support.

So when you say resumable: true on your new FileCollection() one of the things that is automatically happening is that routes are being added for http://yourserver/gridfs/yourcollection/_resumable/...

And those routes are added before any routes that you specify in your http option.

The code is:

# Set up support for resumable.js if requested
if options.resumable
   options.http = share.resumable_paths.concat options.http

here: https://github.com/vsivsi/meteor-file-collection/blob/master/src/http_access_server.coffee#L369-L370

And the routes defined for resumable are:

   # Setup the GET and POST HTTP REST paths for Resumable.js in express
   share.resumable_paths = [
      {
         method: 'post'
         path: '/_resumable'
         lookup: resumable_post_lookup
         handler: resumable_post_handler
      }
      {
         method: 'get'
         path: '/_resumable'
         lookup: resumable_get_lookup
         handler: resumable_get_handler
      }
   ]

here: https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L242-L255

So, as best I can tell, something is still wrong on your server that is causing the resumable GET and POST routes above to either not be used, or to be used after any routes you are defining.

Keeping in mind that path: '/:_id' is going to match everything, and so needs to be the last route in the list. If you instead define it as: path: '/_id/:_id' then it will only match paths like: http://yourserver/gridfs/yourcollection/_id/.... But again, none of this should matter because the resumable routes should be checked first.

Your line 288 here: https://github.com/dnish/meteor-file-collection/commit/b928a996520e10eb9323bbdc4c9703dd025dc28c?diff=unified#diff-3d599543bb10143b21010670d82fc114R288 is almost certainly wrong because it is pulling resumable dependent code into the general http handling. All server side resumable specific code is in https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee (except for the 2 lines above that add the resumable routes when options.resumable == true

dnish commented 8 years ago

Yeah, but my Cordova app and the server app are using the same code base, and when I link resumable to the server, the server can't find the current uploading file and throws the 404 (because the lookup query is lookup: {_id:null}. This only happens if resumables target is set to the remote server. If I don't change the target url, resumable HEAD responses successfully with 200, but then the whole upload fails.

My changed line is searching for the file id in the request query (in this case it is saved in the resumable part as identifier) and completes the lookup query, resumables HEAD then stops throwing a 404 and responses successfully with 200. I know that this isn't perfect, but at the moment this is the only workaround that works for me. I'm relatively new to Javascript server programming, so I don't understand all parts of your code.

It seems for me a little bit like a state issue, maybe some resumable information is stored locally on the app side and the linked remote server doesn't have access to it?

vsivsi commented 8 years ago

Everything should be in the requests, either in the URL queries for HEAD/GET or in the mime/multipart encoding for the POST. I wish I could just run this, but I have very limited time to spare right now and I have never done any Cordova development, so I don't think there's a quick way for me to actually see what is happening.

vsivsi commented 8 years ago

@dnish I've committed some fixes to the dev branch in preparation for releasing v1.2.1 to fix these resumable.js HEAD/GET request issues. You should try out the changes in your fork and let me know how it goes. I'll probably try to push this to Atmosphere tomorrow.

vsivsi commented 8 years ago

Everything is passing all of my testing, so I merged the fixes into the master branch.