mbdavid / LiteDB

LiteDB - A .NET NoSQL Document Store in a single data file
http://www.litedb.org
MIT License

Compression #298

Closed skdenmark closed 7 years ago

skdenmark commented 8 years ago

Hi

Does LiteDB support any form of compression?

mbdavid commented 8 years ago

Hi @skdenmark, not in the current version. Compressing the data is easy to implement, but it is not cheap to compress/decompress on every read/write in the database; performance would be degraded.

functionsoft commented 7 years ago

Could you add compression and decompression using GZipStream internally?

See http://www.dotnetcurry.com/ShowArticle.aspx?ID=105

and https://www.dotnetperls.com/compressionlevel

I'm looking for a database to store, read and update financial tick data. The raw uncompressed files are about 20 GB each, so having them compressed and stored in an optimal way is very important.

Since IO would decrease, it might actually increase overall read speed.
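The round trip those articles describe is roughly the following (a minimal sketch using System.IO.Compression, independent of LiteDB):

using System.IO;
using System.IO.Compression;

public static class Gzip
{
    // Compress a byte buffer with gzip.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream must be closed before reading the buffer,
            // otherwise the gzip trailer is not flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Decompress a gzip buffer back to the original bytes.
    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}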

Thanks.

mbdavid commented 7 years ago

Hi @functionsoft, the main problem is that it's not possible to compress the final FileStream, because all references are based on byte positions in the file. The only way is to compress the BSON serialization before storing it on disk. But I have some doubts about how efficient that would be, because each document must be compressed separately; small documents will show no difference at all. Can you show a sample of each document you need to store?
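To make the per-document idea concrete, a sketch (the wrapper layout is illustrative only, not a LiteDB feature; Gzip.Compress/Gzip.Decompress are the helpers sketched earlier in this thread):

using LiteDB;

// Serialize the document to BSON, compress the bytes, and store them
// as a binary payload inside a small wrapper document.
var doc = new BsonDocument
{
    { "_id", 1 },
    { "name", "some tick data" }
};

var raw = BsonSerializer.Serialize(doc);

var wrapper = new BsonDocument
{
    { "_id", doc["_id"] },
    { "payload", Gzip.Compress(raw) } // stored as BSON binary
};

using (var db = new LiteDatabase("ticks.db"))
{
    var col = db.GetCollection("compressed");
    col.Insert(wrapper);

    // Every read must do the reverse: load, decompress, deserialize.
    var stored = col.FindById(1);
    var original = BsonSerializer.Deserialize(Gzip.Decompress(stored["payload"].AsBinary));
}

Note the cost: the payload is opaque to the engine, so indexes and queries can only see fields kept outside the compressed blob.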

functionsoft commented 7 years ago

Hi Mauricio,

Thanks for the fast reply.

A very small sample of the data in CSV format is below. Note there is one unique primary key: the timestamp.

I was thinking the BSON document could be compressed.

Timestamp,Bid Price,Ask Price,Bid Volume,Ask Volume
2016.11.14 02:00:00.125,1.08226,1.0823,1,1
2016.11.14 02:00:00.531,1.08224,1.08229,4.11999988555908,1
2016.11.14 02:00:01.048,1.08224,1.0823,2.61999988555908,1.12000000476837
2016.11.14 02:00:01.558,1.08225,1.08231,1,5.84999990463257
2016.11.14 02:00:05.127,1.08226,1.08231,3,3.45000004768372
2016.11.14 02:00:05.655,1.08227,1.08232,2.25,3
2016.11.14 02:00:05.909,1.08228,1.08233,2.25,1
2016.11.14 02:00:06.011,1.08231,1.08234,2.25,1.12000000476837
2016.11.14 02:00:06.315,1.08233,1.08237,2.25,1.5
2016.11.14 02:00:06.821,1.08233,1.08238,4.11999988555908,1
2016.11.14 02:00:07.025,1.08233,1.08235,4.11999988555908,1
2016.11.14 02:00:07.228,1.0823,1.08233,3,1
2016.11.14 02:00:07.734,1.0823,1.08234,2.61999988555908,3
2016.11.14 02:00:09.438,1.08226,1.08227,6.75,1
2016.11.14 02:00:09.641,1.08225,1.08227,1.12000000476837,1
2016.11.14 02:00:09.743,1.08225,1.08229,3,1.5
2016.11.14 02:00:10.321,1.08226,1.08229,3.75,1
2016.11.14 02:00:10.827,1.08225,1.0823,3,2.25
2016.11.14 02:00:12.013,1.08226,1.0823,3.75,1.5
2016.11.14 02:00:12.115,1.08228,1.08232,2.25,3
2016.11.14 02:00:12.418,1.0823,1.08233,1,1
2016.11.14 02:00:13.221,1.08229,1.08233,2.61999988555908,1
2016.11.14 02:00:13.575,1.08231,1.08235,1,4.11999988555908
2016.11.14 02:00:14.769,1.0823,1.08235,3,5.25
2016.11.14 02:00:15.275,1.0823,1.08234,1.12000000476837,1
2016.11.14 02:00:18.343,1.08231,1.08235,1,3.75
2016.11.14 02:00:18.649,1.08233,1.08237,2.61999988555908,2.25
2016.11.14 02:00:19.625,1.08234,1.08237,1,1
2016.11.14 02:00:20.276,1.08234,1.08238,1,3.75
2016.11.14 02:00:21.971,1.08233,1.08238,3,3
2016.11.14 02:00:22.275,1.08236,1.0824,4.11999988555908,1
2016.11.14 02:00:22.550,1.08237,1.08242,2.61999988555908,4.11999988555908
2016.11.14 02:00:22.601,1.08237,1.0824,2.61999988555908,1
2016.11.14 02:00:23.125,1.08236,1.08239,1.12000000476837,1.5
2016.11.14 02:00:25.725,1.08235,1.08239,2.61999988555908,1.5
2016.11.14 02:00:26.182,1.08236,1.0824,3,2.25
2016.11.14 02:00:26.724,1.08236,1.08239,3,1.5
2016.11.14 02:00:31.806,1.08236,1.08238,2.25,1
2016.11.14 02:00:32.111,1.08235,1.08238,2.61999988555908,2.25
2016.11.14 02:00:34.364,1.08236,1.08238,1,3
2016.11.14 02:00:35.880,1.08235,1.08237,3,1
2016.11.14 02:00:36.083,1.08235,1.08238,1,1.5
2016.11.14 02:00:36.433,1.08234,1.08236,2.25,1.5
2016.11.14 02:00:37.053,1.08235,1.08236,1.5,2.25
2016.11.14 02:00:37.559,1.08234,1.08236,1.5,2.25
2016.11.14 02:00:38.122,1.08233,1.08235,3,3
2016.11.14 02:00:39.552,1.08234,1.08235,1,3
2016.11.14 02:00:39.806,1.08234,1.08237,3,2.25
2016.11.14 02:00:41.858,1.08234,1.08236,3,1.5
2016.11.14 02:00:42.314,1.08232,1.08234,1.5,3
2016.11.14 02:00:42.872,1.08231,1.08234,1.12000000476837,3
2016.11.14 02:00:43.374,1.08232,1.08234,1,3
2016.11.14 02:00:43.882,1.0823,1.08234,7.69000005722046,1
2016.11.14 02:00:44.677,1.0823,1.08233,5.25,1
2016.11.14 02:00:45.195,1.0823,1.08234,5.25,2.25
2016.11.14 02:00:48.629,1.0823,1.08233,6,1.5
2016.11.14 02:00:49.149,1.0823,1.08233,3.36999988555908,2.25
2016.11.14 02:00:50.219,1.0823,1.08232,2.61999988555908,1
2016.11.14 02:00:50.675,1.08228,1.0823,2.05999994277954,1
2016.11.14 02:00:51.181,1.08227,1.08229,1.87000000476837,1.5
2016.11.14 02:00:51.536,1.08226,1.08227,1.5,1
2016.11.14 02:00:52.083,1.08226,1.08227,1.5,1.5
2016.11.14 02:00:52.742,1.08225,1.08226,3,1
2016.11.14 02:00:53.147,1.08224,1.08226,1,1
2016.11.14 02:00:53.198,1.08222,1.08224,13.5,1
2016.11.14 02:00:53.330,1.0822,1.08222,1,2.25
2016.11.14 02:00:53.432,1.08218,1.08221,1.12000000476837,1.5
2016.11.14 02:00:53.786,1.08218,1.08219,1,1.5
2016.11.14 02:00:53.937,1.08215,1.08216,1.87000000476837,1.5
2016.11.14 02:00:54.241,1.08217,1.08218,2.25,3.75
2016.11.14 02:00:54.292,1.08215,1.08216,3,1
2016.11.14 02:00:54.798,1.08216,1.08217,1,1
2016.11.14 02:00:55.324,1.08215,1.08217,1.5,1
2016.11.14 02:00:56.185,1.08215,1.08216,1,1
2016.11.14 02:00:56.706,1.08214,1.08215,3.19000005722046,3
2016.11.14 02:00:56.959,1.08214,1.08218,3.94000005722046,2.25
2016.11.14 02:00:57.278,1.08216,1.08217,1,1
2016.11.14 02:00:57.822,1.08215,1.08218,1.75,3
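For reference, each row could map to a document along these lines (a sketch; the class and property names are illustrative):

using System;
using LiteDB;

// One CSV row per document; the unique timestamp acts as the primary key.
public class Tick
{
    [BsonId]
    public DateTime Timestamp { get; set; }

    public double BidPrice { get; set; }
    public double AskPrice { get; set; }
    public double BidVolume { get; set; }
    public double AskVolume { get; set; }
}

With the timestamp as the document id, LiteDB enforces its uniqueness and lookups by time range can use the primary index.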

Regards,

Mike

mbdavid commented 7 years ago

Take a look at these results:

using System;
using System.IO;
using System.IO.Compression;
using LiteDB;

// A document shaped like one tick row.
var doc = new BsonDocument
{
    { "a", DateTime.Now },
    { "b", 1.08224 },
    { "c", 1.0823 },
    { "d", 2.61999988555908 },
    { "f", 1.12000000476837 }
};

var bson = BsonSerializer.Serialize(doc);

var original = bson.Length; // 60 bytes

using (var memory = new MemoryStream())
{
    // The GZipStream must be closed before reading the buffer,
    // otherwise the gzip trailer is not flushed.
    using (var gzip = new GZipStream(memory, CompressionMode.Compress))
    {
        gzip.Write(bson, 0, bson.Length);
    }

    var compress = memory.ToArray().Length; // 81 bytes (larger than the original)
}

Compressing each document is not efficient enough; the fixed gzip header and trailer alone outweigh any savings on documents this small. To get real gains you would need to gzip the whole file.
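Compressing the whole file does work for cold storage, along these lines, but not for a live database, because page offsets no longer match stream positions (a minimal sketch):

using System.IO;
using System.IO.Compression;

// Archive the entire data file with gzip. Only valid while the database
// is closed: a gzipped stream is not seekable, so pages cannot be
// addressed by position any more.
using (var source = File.OpenRead("mydata.db"))
using (var target = File.Create("mydata.db.gz"))
using (var gzip = new GZipStream(target, CompressionMode.Compress))
{
    source.CopyTo(gzip);
}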

rsrini83 commented 6 years ago

Hi David,

Thanks for the library; it is very easy to use and supports rich features out of the box. However, I'm having a problem with disk space. In my application I create a single file per user, and each user stores around 500K documents on average. Each file consumes around 195 MB, but if I compress the entire file it takes just 21 MB. Is there any possibility of providing compression similar to MongoDB's?

--Srinu

ronnieoverby commented 2 years ago

How about using compression features of the file system?
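On Windows, for example, NTFS compression can be turned on for the database folder (the "Compress contents to save disk space" attribute, or the built-in compact /c tool); the file system then compresses the data file transparently, with no change in LiteDB itself.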
