maxmind / getting-started-with-mmdb

A quick guide to writing and reading from your own MMDB databases.

Custom mmdb file sizes are bigger than geolite though they have less data #9

Closed jonathan-kosgei closed 5 years ago

jonathan-kosgei commented 5 years ago

I'm testing creating internal mmdbs according to the getting started tutorial.

I'm able to successfully create an mmdb with 1M records and read it from Python.

The only problem is that the resulting mmdb file is 75 MB, even though each IP range has only a very simple data field attached to it, e.g.:

8.8.8.8/24 => {'attribute': 'mkbcslbbgiferyergedcqgxmxiesmzuefwdvzfxevawudpiofqczwvzngxrcwhhk'},
1.1.1.1/32 => {'attribute': 'niprztmeflfxaaknfljqkyxmfoslyqzpmdgvrfflzldttodkilttaijbzowefwon'}

The attribute value for every network is a 64-character string. This is test data, but the actual data will average the same length.

The problem is that I need to add 14M more records, and if 1M records take 75 MB, then 15M will probably exceed 1 GB.

How come the GeoLite and GeoIP City databases have a lot more data but are more compact in size?

oschwald commented 5 years ago

The writer deduplicates data that is inserted. Although you don't say it explicitly, it sounds like the data you are inserting is random, which the writer won't be able to deduplicate. If the data is random, it would take 64 MB (1M records × 64 bytes) just to store your attributes.
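
To make the arithmetic above concrete, here is a rough Python sketch that compares the bytes needed for random values against values that repeat. It is only an estimate under stated assumptions: the helper names `raw_bytes` and `deduplicated_bytes` are hypothetical, the sample uses 10,000 records instead of 1M, and the calculation ignores the search tree and any per-record overhead in a real MMDB file.

```python
import random
import string

def raw_bytes(values):
    """Bytes needed if every value is stored separately."""
    return sum(len(v.encode("utf-8")) for v in values)

def deduplicated_bytes(values):
    """Bytes needed if each distinct value is stored only once
    (a best-case estimate of what deduplication can save)."""
    return sum(len(v.encode("utf-8")) for v in set(values))

rng = random.Random(0)  # fixed seed so the sample is reproducible
n = 10_000              # scaled-down sample; the issue uses 1M records

# Every network gets its own random 64-character string (the test data in the issue).
random_values = [
    "".join(rng.choices(string.ascii_lowercase, k=64)) for _ in range(n)
]

# Hypothetical alternative: only 50 distinct values shared across all networks.
repeated_values = [random_values[i % 50] for i in range(n)]

print("random, raw:            ", raw_bytes(random_values))
print("random, deduplicated:   ", deduplicated_bytes(random_values))
print("repeated, deduplicated: ", deduplicated_bytes(repeated_values))

# Scaling the same arithmetic to the sizes discussed in the issue:
print("1M random values: ", 1_000_000 * 64 / 1e6, "MB")
print("15M random values:", 15_000_000 * 64 / 1e6, "MB")
```

With random values, deduplication saves nothing, so the attribute payload alone grows linearly with the record count. When many networks share the same value, the stored payload can shrink to roughly the size of the distinct values.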