strands-project / mongodb_store

MongoDB tools for storing and analysing runs of ROS systems.
BSD 3-Clause "New" or "Revised" License

Large size of mongodb_store files on disk #120

Closed RaresAmbrus closed 9 years ago

RaresAmbrus commented 9 years ago

I've been looking at the data logged during the marathon, in particular the metric maps we recorded at KTH. The size of the metric_maps database in mongodb is 33.9 GB; however, when I export the data and save it to disk, the total size is only 17 GB. Could it be because it's mostly point cloud data, which I'm saving as binary to the disk, whereas mongodb doesn't do that?

hawesie commented 9 years ago

MongoDB does store that data in binary, but I assume there’s some overhead as well. The easiest way to check you’re not missing anything is to try to reinsert the metric maps and see what size the resulting database is.

RaresAmbrus commented 9 years ago

Thanks @hawesie I'll try that out and get back with some figures.

RaresAmbrus commented 9 years ago

I ran a small experiment where I started with an empty mongodb database (size 20 KB) and I added some metric maps of total size 5.4 GB. After adding everything to mongodb the size of the database became 17 GB. I then exported the data out of mongodb and saved it to the disk and got a total size of 5.4 GB again. I compared the exported data with the original data and they seem to be the same.

marc-hanheide commented 9 years ago

mongodb allocates HUGE pages and preallocates them. Maybe that's the reason?

RaresAmbrus commented 9 years ago

This is what the files inside the database look like:

drwxr-xr-x 4 root  root  4.0K Nov 17 21:09 ../
drwxrwxr-x 2 rares rares 4.0K Dec 16 19:03 journal/
-rw------- 1 rares rares  64M Dec 16 19:03 metric_maps.0
-rw------- 1 rares rares 128M Dec 16 18:57 metric_maps.1
-rw------- 1 rares rares 2.0G Dec 16 19:03 metric_maps.10
-rw------- 1 rares rares 2.0G Dec 16 19:03 metric_maps.11
-rw------- 1 rares rares 256M Dec 16 18:57 metric_maps.2
-rw------- 1 rares rares 512M Dec 16 19:02 metric_maps.3
-rw------- 1 rares rares 1.0G Dec 16 19:00 metric_maps.4
-rw------- 1 rares rares 2.0G Dec 16 19:00 metric_maps.5
-rw------- 1 rares rares 2.0G Dec 16 19:00 metric_maps.6
-rw------- 1 rares rares 2.0G Dec 16 19:00 metric_maps.7
-rw------- 1 rares rares 2.0G Dec 16 19:02 metric_maps.8
-rw------- 1 rares rares 2.0G Dec 16 19:03 metric_maps.9
-rw------- 1 rares rares  16M Dec 16 19:03 metric_maps.ns
-rwxrwxr-x 1 rares rares    6 Dec 16 18:40 mongod.lock*
drwxrwxr-x 2 rares rares 4.0K Dec 16 19:03 _tmp/

If there's any way to check what's preallocated and what's already filled, I can do that. But I think the metric_maps.* files are already filled, or most of them are, except perhaps the last one, metric_maps.11. Either way, the raw data is 5.4 GB, and allocating 17 GB for that seems a bit much.
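As a rough cross-check of the listing above, the file sizes can be added up with a short Python sketch. The sizes are copied by hand from the `ls -lh` output, which rounds them, so the total is only approximate; the helper names (`parse_size`, `data_file_sizes`, `total_bytes`) are made up for this illustration and are not part of mongodb_store.

```python
# Sizes copied from the directory listing above (as printed by ls -lh).
# ls rounds these values, so the byte counts below are approximate.
LISTING = {
    "metric_maps.0": "64M",
    "metric_maps.1": "128M",
    "metric_maps.2": "256M",
    "metric_maps.3": "512M",
    "metric_maps.4": "1.0G",
    "metric_maps.5": "2.0G",
    "metric_maps.6": "2.0G",
    "metric_maps.7": "2.0G",
    "metric_maps.8": "2.0G",
    "metric_maps.9": "2.0G",
    "metric_maps.10": "2.0G",
    "metric_maps.11": "2.0G",
    "metric_maps.ns": "16M",
}

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert an ls -lh style size such as '2.0G' to a byte count."""
    text = text.strip()
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def data_file_sizes(listing):
    """Sizes of the numbered data files, ordered by numeric suffix."""
    numbered = {}
    for name, size in listing.items():
        suffix = name.rsplit(".", 1)[1]
        if suffix.isdigit():  # skips the .ns namespace file
            numbered[int(suffix)] = parse_size(size)
    return [numbered[i] for i in sorted(numbered)]


def total_bytes(listing):
    """Approximate total size of every file in the listing."""
    return sum(parse_size(size) for size in listing.values())


if __name__ == "__main__":
    sizes = data_file_sizes(LISTING)
    print("data file sizes (MiB):", [s // 1024 ** 2 for s in sizes])
    print(f"approx. total: {total_bytes(LISTING) / 1024 ** 3:.2f} GiB")
```

Ordered by suffix, the data files double in size from 64M up to 2.0G and then stay at 2.0G, so the on-disk footprint grows in large steps regardless of how full the newest file is.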

marc-hanheide commented 9 years ago

hmm...

Did you read this: http://docs.mongodb.org/manual/faq/storage/#why-are-the-files-in-my-data-directory-larger-than-the-data-in-my-database

And this: http://stackoverflow.com/questions/20087895/why-do-mongodb-takes-up-so-much-space

Also, you can run db.stats() (see http://docs.mongodb.org/manual/reference/command/dbStats/) to learn more about your space allocation.

RaresAmbrus commented 9 years ago

In the metric_maps database:

> db.stats()
{
    "db" : "metric_maps",
    "collections" : 4,
    "objects" : 1138,
    "avgObjSize" : 9520845.065026361,
    "dataSize" : 10834721684,
    "storageSize" : 11357589504,
    "numExtents" : 20,
    "indexes" : 2,
    "indexSize" : 57232,
    "fileSize" : 17105420288,
    "nsSizeMB" : 16,
    "ok" : 1
}
> 

For the data collection:

> db.data.totalSize()
11354599328
> db.data.dataSize()
10833652544
> 

Size of raw data: 5.4 GB
Space allocated for data in database: 11357589504 ~ 11 GB
Size of database on disk: ~17 GB
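To make the gaps between these figures easier to see, here is a minimal Python sketch that works only from the numbers pasted above. The raw size is assumed to be 5.4e9 bytes (the thread just says "5.4 GB"), and the labels on the two differences are my own reading of the `dbStats` fields, not official terminology.

```python
# Figures copied from the db.stats() output above.
STATS = {
    "dataSize": 10834721684,
    "storageSize": 11357589504,
    "indexSize": 57232,
    "fileSize": 17105420288,
}

# Assumption: "5.4 GB" of raw data is taken as 5.4e9 bytes.
RAW_BYTES = 5.4e9


def breakdown(stats):
    """Differences between successive size figures, in bytes."""
    return {
        # space allocated to collections but not occupied by documents
        "storage_minus_data": stats["storageSize"] - stats["dataSize"],
        # space in the data files not attributed to collections or indexes
        "files_minus_storage_and_indexes": (
            stats["fileSize"] - stats["storageSize"] - stats["indexSize"]
        ),
    }


def ratio_to_raw(stats, raw_bytes=RAW_BYTES):
    """How many times larger each figure is than the raw data."""
    return {key: stats[key] / raw_bytes
            for key in ("dataSize", "storageSize", "fileSize")}


if __name__ == "__main__":
    for name, value in breakdown(STATS).items():
        print(f"{name}: {value / 1e9:.2f} GB")
    for name, value in ratio_to_raw(STATS).items():
        print(f"{name} / raw: {value:.2f}x")
```

Under that assumption, most of the difference between the files on disk and the stored data sits outside the collections' allocated storage, while the documents themselves (`dataSize`) already come to roughly twice the raw size. The same dictionary could be filled from a live server, for example with pymongo's `db.command("dbstats")`, rather than copied by hand.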

hawesie commented 9 years ago

Is this still an open issue? I'm not sure there's much we can do.

marc-hanheide commented 9 years ago

fixed in #153