Migrating from CFS/GridFS

vsivsi / meteor-file-collection

Extends Meteor Collections to handle file data using MongoDB gridFS.

http://atmospherejs.com/vsivsi/file-collection

Migrating from CFS/GridFS #62

Open hluz opened 8 years ago

hluz commented 8 years ago

How hard would migrating from CollectionFS be? Do the existing GridFS collections get reused, or are we in for some heavy-duty data transformation?

vsivsi commented 8 years ago

It's been a while since I've used CollectionFS. The main difference is that CFS stores all file metadata in a separate Meteor collection, whereas file-collection stores it directly in the gridFS file documents themselves. Migration shouldn't be too bad: you should be able to run CFS and FC at the same time on the same gridFS store and just copy over what's missing. It probably goes without saying, but back up your database before attempting this!

hluz commented 8 years ago

Thanks for the reply. I finally got some time to try file-collection with the same gridFS store, and the good news is that I don't even have to copy over the metadata, since CollectionFS also stores a duplicate of the metadata in the gridFS files collection.

However, the bad news is that, in my case, when using the same gridFS store, which has a name of 'cfs_gridfs.attachments', I get the following mongo error:

MongoError: namespace name generated from index name "meteor.cfs_gridfs.attachments.files.$metadata._Resumable.resumableIdentifier_1_metadata._Resumable.resumableChunkNumber_1_length_1" is too long (127 byte max)

Any chance of getting the name of the index that file-collection is defining to be shorter?
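
To see why the limit is hit, here is a minimal sketch that rebuilds the namespace string from the error message and measures it. The key specification is inferred from the index name in the error, not taken from the file-collection source, and the helper names and the short index name `fc_res` are hypothetical.

```javascript
// Limit quoted in the MongoError above.
const MAX_INDEX_NAMESPACE_BYTES = 127;

// MongoDB's default index name: each "field_direction" pair joined with "_".
function defaultIndexName(keys) {
  return Object.entries(keys)
    .map(([field, direction]) => `${field}_${direction}`)
    .join('_');
}

// The string the server checks against the limit: "<db>.<collection>.$<indexName>".
function indexNamespace(dbName, collectionName, indexName) {
  return `${dbName}.${collectionName}.$${indexName}`;
}

// Assumed key spec, reconstructed from the index name shown in the error.
const resumableKeys = {
  'metadata._Resumable.resumableIdentifier': 1,
  'metadata._Resumable.resumableChunkNumber': 1,
  length: 1,
};

const longNamespace = indexNamespace(
  'meteor',
  'cfs_gridfs.attachments.files',
  defaultIndexName(resumableKeys)
);
// Same collection, but with an explicit short index name (hypothetical).
const shortNamespace = indexNamespace('meteor', 'cfs_gridfs.attachments.files', 'fc_res');

const tooLong = Buffer.byteLength(longNamespace, 'utf8') > MAX_INDEX_NAMESPACE_BYTES;
const shortFits = Buffer.byteLength(shortNamespace, 'utf8') <= MAX_INDEX_NAMESPACE_BYTES;
```

With the default 'fs' bucket the same index name fits, because the collection part of the namespace is much shorter; the long 'cfs_gridfs.attachments' root is what pushes it over.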

vsivsi commented 8 years ago

Yup! Look at the resumableIndexName option here:

https://github.com/vsivsi/meteor-file-collection#create-a-new-filecollection-object---server-and-client
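
As a rough sketch of what that might look like for the bucket in this thread: the `resumableIndexName` option is the one named above, while the value `fc_res` and the variable names are hypothetical. Check the linked README for which options apply on the server versus the client.

```javascript
// Root name of the existing CollectionFS bucket, as reported earlier in this thread.
const bucketName = 'cfs_gridfs.attachments';

const fcOptions = {
  resumable: true,              // enable the resumable.js upload support
  resumableIndexName: 'fc_res', // hypothetical short name for the resumable index
};

// FileCollection is only defined inside a Meteor app that has the
// vsivsi:file-collection package installed, so guard the constructor call.
const attachments = (typeof FileCollection !== 'undefined')
  ? new FileCollection(bucketName, fcOptions)
  : null;
```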

vsivsi commented 8 years ago

Also see this issue for history, etc. https://github.com/vsivsi/meteor-file-collection/issues/55

hluz commented 8 years ago

Great! Many thanks. (yes, I know... rtfm... :-( )

hluz commented 8 years ago

Just some feedback: Replaced CollectionFS with file-collection in prod yesterday. Working great so far. Significant memory usage reduction on server. A lot faster to upload than CFS (using resumable.js). And no need for any data conversion ;-)

vsivsi commented 8 years ago

Nice! Glad it worked out.

Shogutora commented 8 years ago

Hello guys, I am trying to accomplish the same thing, migrating over from CollectionFS. However, I am running into a dead end on how to actually copy the data over. I've managed to create the file-collection db and copy the "fs.files" part from cfs_gridfs.files by looping over the collection, but I cannot copy over the actual files from the chunks. And actually I do not want to copy all of them, as it seems there are some faulty references, which is partly the reason for migrating.

Any help is appreciated greatly!

vsivsi commented 8 years ago

Hi, I don't think you need to copy any data if you don't want to. The file-collection can just point at the gridFS bucket used by CollectionFS. With gridFS you can't just copy over all of the documents in the .files and .chunks collections and call it good. You actually need to use a gridFS driver for that, because it invokes some Mongo server-side logic to validate the chunk structure and calculate the MD5 sum.
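
If you do want to copy only a subset of the files, one way to go through a driver is sketched below. It assumes both arguments are `GridFSBucket` instances from the current official Node.js `mongodb` driver, which this thread does not name; the function name and the bucket names in the usage comment are hypothetical. Files whose chunks cannot be read are skipped and reported rather than copied.

```javascript
const { pipeline } = require('stream/promises');

// Copy every file matching `filter` from one gridFS bucket to another,
// streaming through the driver instead of copying raw .files/.chunks documents.
async function copyGridFSFiles(srcBucket, dstBucket, filter = {}) {
  const copied = [];
  const skipped = [];
  for await (const file of srcBucket.find(filter)) {
    try {
      await pipeline(
        srcBucket.openDownloadStream(file._id),
        // Keep the original _id, filename and metadata on the new file.
        dstBucket.openUploadStreamWithId(file._id, file.filename, {
          metadata: file.metadata,
        })
      );
      copied.push(file._id);
    } catch (err) {
      // Typically a file document whose chunks are missing or damaged.
      // A failed copy may leave partial data in the destination bucket.
      skipped.push({ _id: file._id, reason: err.message });
    }
  }
  return { copied, skipped };
}

// Usage sketch (assumes a connected `db` and the `mongodb` package):
//   const { GridFSBucket } = require('mongodb');
//   const src = new GridFSBucket(db, { bucketName: 'cfs_gridfs.attachments' });
//   const dst = new GridFSBucket(db, { bucketName: 'fs' });
//   const report = await copyGridFSFiles(src, dst);
```

The returned `skipped` list gives you the faulty references to inspect afterwards, and back up the database first, as noted earlier in the thread.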

Shogutora commented 8 years ago

Hi, thank you for the quick reply! So it should be enough if I change the collection name for the chunks to the one used by file-collection and duplicate the reference objects in .files? What about the .locks then?

vsivsi commented 8 years ago

file-collection should be able to automatically use any valid existing gridFS "bucket", whether created by CollectionFS or any other source. The .locks collection should be automatically created and indexed if it is not already present.