Open apocolipse opened 9 years ago
Hey – I haven't looked at implementing GridFS support, since I don't use it anywhere.
I agree that adding support might be useful, and I'd consider a PR. It'd probably be easier to review a strawman PR than try to speculate about the code via a description.
Hex-encoding the binary data is probably the way forward to fix the encoding issue, but I'd first try to replicate it in a test and then add a bunch of debug prints or thereabouts to understand what's going on.
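As a rough illustration of the hex-encoding idea, here is a minimal Ruby sketch. The helper names `bytea_hex` and `bytea_unhex` are hypothetical and not part of MoSQL; the only thing assumed about the database side is PostgreSQL's hex format for BYTEA values (`\x` followed by two hex digits per byte). The point is that the encoded string is pure ASCII, so it should not trip any string-encoding conversion on the way to the database.

```ruby
# Hypothetical helpers, not part of MoSQL.
# Encode raw bytes in PostgreSQL's hex format for bytea values:
# a literal backslash-x followed by two hex digits per byte.
def bytea_hex(data)
  "\\x" + data.b.unpack1("H*")
end

# Reverse of bytea_hex; handy in a test to check the round trip.
def bytea_unhex(text)
  [text.delete_prefix("\\x")].pack("H*")
end

raw = "\xFF\x00PDF".b          # bytes that are not valid UTF-8
encoded = bytea_hex(raw)

puts encoded.ascii_only?       # the encoded form contains only ASCII
puts bytea_unhex(encoded) == raw
```

A test that feeds one of the failing files through a helper like this, with debug prints of `String#encoding` before and after, would show at which step the string stops being binary.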
I'm curious whether you've looked into GridFS support. Since GridFS is split across two consistently named collections (fs.files, fs.chunks), and there is a standalone adapter for fetching GridFS files (by filename or id), I think it merits its own functionality, rather than just mapping both collections to Postgres and trying to do the assembly on that side. I did some preliminary testing to see if it could work (using a special '$gridfs' source to trigger GridFS handling, and then using the original document to grab the GridFS file by id).
I'm currently running into some encoding issues, however: some imports succeed (large plaintext files, some PDFs), and then at some point others fail on
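For context on what "assembly" involves, here is a minimal Ruby sketch of rebuilding one file from its chunks. It uses plain hashes as stand-ins for fs.chunks documents (with the standard GridFS fields `files_id`, `n`, and `data`); the function name `assemble_gridfs_file` is hypothetical, and with a real driver the `data` field would be a BSON binary value rather than a String.

```ruby
# Hypothetical helper: rebuild a file's bytes from GridFS-style chunks.
# Each chunk is a hash with "files_id" (owning file), "n" (chunk index)
# and "data" (that chunk's bytes).
def assemble_gridfs_file(chunks, files_id)
  chunks
    .select { |c| c["files_id"] == files_id }  # only this file's chunks
    .sort_by { |c| c["n"] }                    # restore chunk order
    .map { |c| c["data"].b }                   # treat every chunk as binary
    .join
    .b
end

chunks = [
  { "files_id" => 1, "n" => 1, "data" => "world" },
  { "files_id" => 2, "n" => 0, "data" => "other" },
  { "files_id" => 1, "n" => 0, "data" => "hello " }
]

file_bytes = assemble_gridfs_file(chunks, 1)
puts file_bytes.encoding
puts file_bytes.bytesize
```

Keeping the result binary from start to finish is the main design point here: the bytes are never interpreted as text until they are encoded for the BYTEA column.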
my modification of fetch_special_source():
(I also tried various combinations of hex transforms and UTF-8 encoding, but it still eventually gave me that ASCII error. For reference, the column type it's inserting into is BYTEA.)
Also, I had to add db adapter arguments to every method from fetch_special_source() in schema.rb up to import_collections() in streamer.rb in order to create the GridFS object instance in fetch_special_source(). This seems bad; any recommendation for where to put it?