richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

Support for compressed files #153

Open davistan opened 10 years ago

davistan commented 10 years ago

Hi, I've searched and found nothing related to compressed-file support in this river.

Would it be a good idea to add support for compressed files? My idea is to use a metadata tag to identify whether the file stream is compressed.

Thanks.

richardwilly98 commented 10 years ago

@davistan can you please clarify?

Apache Tika already supports compressed files [1].

[1] - http://tika.apache.org/1.4/formats.html#Compression_and_packaging_formats

davistan commented 10 years ago

@richardwilly98 thanks for replying. My idea was to compress files before saving them to GridFS to save space. Archives and other non-compressible files would be left as-is; everything else would be gzipped without changing the file extension.
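A minimal sketch of the write-side idea described in this thread (skip archives, gzip the rest, keep the filename, flag the result in metadata), using only the Python standard library. The extension list, the `compressed`/`encoding`/`original_length` metadata keys, the `min_saving` threshold, and `prepare_for_gridfs` itself are all hypothetical. The "keep gzip only if it actually saves space" check is one possible way to catch non-compressible files that an extension list misses.

```python
import gzip
import os

# Hypothetical list of extensions treated as already compressed
# (archives, common media formats, ZIP-based Office formats).
ALREADY_COMPRESSED = {
    ".zip", ".gz", ".tgz", ".bz2", ".xz", ".7z", ".rar",
    ".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4",
    ".docx", ".xlsx", ".pptx",
}


def prepare_for_gridfs(filename: str, data: bytes, min_saving: float = 0.1):
    """Return (payload, metadata) for storing `data` in GridFS.

    The filename, and therefore its extension, is not changed; a hypothetical
    `compressed` flag in the returned metadata records whether the payload
    was gzipped.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext in ALREADY_COMPRESSED:
        # Archive or typically incompressible format: store unchanged.
        return data, {"compressed": False}

    packed = gzip.compress(data)
    # Keep the gzipped form only if it saves at least `min_saving` of the
    # original size; otherwise store the original bytes.
    if len(packed) <= len(data) * (1 - min_saving):
        return packed, {
            "compressed": True,
            "encoding": "gzip",
            "original_length": len(data),
        }
    return data, {"compressed": False}
```

With PyMongo, the result could then be stored with something like `fs.put(payload, filename=filename, metadata=meta)`, since `GridFS.put` stores extra keyword arguments as fields of the file document. Anything reading the bucket directly, the river included, would see the gzipped bytes unless it checks the flag.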

richardwilly98 commented 10 years ago

@davistan It is technically possible, but I do not understand the value of this feature. What is your use case?

richardwilly98 commented 10 years ago

@davistan any update?

davistan commented 10 years ago

Hi Richard, sorry for the delay. I'm looking at optimizing storage space in GridFS by compressing the files transparently to the user.

Of course, I'm also looking at alternatives such as dedup file systems.

richardwilly98 commented 10 years ago

Compression in Elasticsearch is enabled by default:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html


(Reply by email to Davis's comment of Nov 2, 2013, 9:58 PM.)

davistan commented 10 years ago

Thanks for the link. On the ES side that is perfect, but I'm exploring optimization on the GridFS side. Similar to what the link describes, we are looking at an option to selectively compress files, transparently to the user.
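For compression to stay transparent to the user, the read side has to undo it based on a per-file flag. A minimal Python sketch, assuming a hypothetical boolean `compressed` key was written into each file's metadata at upload time; `open_transparently` is likewise a hypothetical helper, not part of PyMongo or the river.

```python
import gzip
from typing import BinaryIO, Optional


def open_transparently(fileobj: BinaryIO, metadata: Optional[dict]) -> BinaryIO:
    """Return a binary stream that yields the original, uncompressed content.

    Files whose metadata lacks the hypothetical `compressed` flag are treated
    as stored uncompressed and returned unchanged.
    """
    if metadata and metadata.get("compressed"):
        # Wrap the stored stream; gzip decompresses as the caller reads,
        # so the whole decompressed file is never built up front.
        return gzip.GzipFile(fileobj=fileobj, mode="rb")
    return fileobj
```

With PyMongo, `grid_out = fs.get(file_id)` returns a file-like `GridOut` whose `metadata` attribute holds the stored metadata, so `open_transparently(grid_out, grid_out.metadata).read()` should return the original bytes; that pairing is an untested assumption.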

richardwilly98 commented 10 years ago

So this looks more like a question for the MongoDB user group [1].

[1] - https://groups.google.com/forum/#!forum/mongodb-user

(Reply by email to Davis's comment of Sun, Nov 3, 2013, 6:23 PM: https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/153#issuecomment-27657663)