Closed svbatalov closed 3 months ago
Great feedback and thanks for those feature requests @svbatalov !
The data's definitely gonna be available within the MMDB library. I'll check whether it's exposed; if it isn't, we could try to get a PR merged to expose it and/or temporarily use a fork.
We can add this data to the mmdbctl metadata output - is that the ideal place to expose it for you @svbatalov ?
cc @coderholic
@UmanShahzad Yeah, sounds great!
The metadata has been included. Closing issue:
$ mmdbctl metadata ip_geolocation_sample.mmdb
- Binary Format 2.0
- Database Type ipinfo ip_geolocation_sample.mmdb
- IP Version 6
- Record Size 32
- Node Count 2927 (2.86 KB)
- Tree Size 23416 (22.87 KB)
- Data Section Size 10790 (10.54 KB)
- Data Section Start Offset 23432 (22.88 KB)
- Data Section End Offset 34222 (33.42 KB)
- Metadata Section Start Offset 34236 (33.43 KB)
- Description
en ipinfo ip_geolocation_sample.mmdb
- Languages en
- Build Epoch 1722965173
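As a sanity check on the numbers above, here is a minimal Python sketch (not part of mmdbctl; `section_layout` is a made-up name for illustration). It assumes the layout described in the MMDB spec: a search tree of node_count * record_size * 2 / 8 bytes, a 16-byte all-zero separator, the data section, and then the 14-byte metadata marker "\xab\xcd\xefMaxMind.com".

```python
# Assumed constants, taken from the MMDB file format spec.
DATA_SEPARATOR_LEN = 16    # zero bytes between search tree and data section
METADATA_MARKER_LEN = 14   # len(b"\xab\xcd\xefMaxMind.com")


def section_layout(node_count, record_size, data_size):
    """Derive section offsets (in bytes) from three basic numbers.

    Hypothetical helper for illustration only.
    """
    # Each node holds two records of `record_size` bits.
    tree_size = node_count * record_size * 2 // 8
    data_start = tree_size + DATA_SEPARATOR_LEN
    data_end = data_start + data_size
    metadata_start = data_end + METADATA_MARKER_LEN
    return {
        "tree_size": tree_size,
        "data_start": data_start,
        "data_end": data_end,
        "metadata_start": metadata_start,
    }


# Node Count, Record Size and Data Section Size from the output above.
layout = section_layout(node_count=2927, record_size=32, data_size=10790)

# Rough share of the tree in (tree + data), ignoring separator and metadata.
tree_share = layout["tree_size"] / (layout["tree_size"] + 10790)
```

With those three inputs the sketch gives tree_size 23416, data_start 23432, data_end 34222 and metadata_start 34236, which are the same values mmdbctl reports.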
Hey @UmanShahzad.
To make mmdbctl even more awesome, it would be great to be able to display some low-level data about an MMDB file, such as the offsets and sizes of the data and metadata sections. This is helpful, for example, if you want to inspect (with hexdump) the actual data section, or if you want to estimate the relative impact of the tree/data sections on file size.
A simple example: let's say we want to find out whether the actual MMDB writer deduplicates written objects (replaces them with pointers) or not. I'll use my MMDB parser to display the above-mentioned offsets.
$ python3 ./parser.py test.mmdb
Namespace(file='test.mmdb', meta=False, data=None, ip=None)
Data section offset 1096 (data starts at 1112) # <===
Metadata section offset: 1146 (metadata starts at 1160)
Data section size 34 bytes (3.4e-05 MB) # <===
Record size: 32
Node count: 137
Tree size: 1096 (bytes)
ip_version: 6
First data record at 153 pointer
Knowing the offset and size, we can inspect a specific portion of the file:
$ hd -s 1112 -n 34 test.mmdb
00000458  e1 45 76 61 6c 75 65 e1  43 63 6f 6c 47 6e 65 73  |.Evalue.CcolGnes|
00000468  74 65 64 31 e1 20 01 e1  20 08 47 6e 65 73 74 65  |ted1. .. .Gneste|
00000478  64 32                                             |d2|
0000047a
So it does deduplicate objects. Looks like it even deduplicates nested objects, which is great.
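To read those 34 bytes without squinting at the hexdump, here is a small Python sketch of a decoder for just the three field types that appear in it (map, UTF-8 string, pointer), following the data field format in the MMDB spec. It is not the parser used above, the names are made up for illustration, and it deliberately rejects everything else (other types, sizes of 29 or more).

```python
# The 34 bytes from the hexdump above (the data section of test.mmdb).
DATA = bytes.fromhex(
    "e1 45 76 61 6c 75 65 e1 43 63 6f 6c 47 6e 65 73"
    "74 65 64 31 e1 20 01 e1 20 08 47 6e 65 73 74 65"
    "64 32"
)

TYPE_POINTER, TYPE_STRING, TYPE_MAP = 1, 2, 7
POINTER_BASE = (0, 2048, 526336)  # value added for pointer sizes 0, 1, 2


def decode(buf, pos, pointers):
    """Decode one field at `pos`; return (value, position after the field).

    Every pointer target that is followed gets appended to `pointers`.
    Illustrative subset only: maps, UTF-8 strings and pointers.
    """
    ctrl = buf[pos]
    ftype, low5 = ctrl >> 5, ctrl & 0x1F
    pos += 1
    if ftype == TYPE_POINTER:
        psize, vvv = low5 >> 3, low5 & 0x07
        raw = buf[pos:pos + psize + 1]
        pos += psize + 1
        if psize == 3:
            target = int.from_bytes(raw, "big")
        else:
            target = int.from_bytes(bytes([vvv]) + raw, "big") + POINTER_BASE[psize]
        pointers.append(target)
        value, _ = decode(buf, target, pointers)  # offset is data-section relative
        return value, pos
    if low5 >= 29:
        raise NotImplementedError("extended sizes are not handled in this sketch")
    if ftype == TYPE_STRING:
        return buf[pos:pos + low5].decode("utf-8"), pos + low5
    if ftype == TYPE_MAP:
        result = {}
        for _ in range(low5):  # for maps, the size is the number of entries
            key, pos = decode(buf, pos, pointers)
            result[key], pos = decode(buf, pos, pointers)
        return result, pos
    raise NotImplementedError(f"field type {ftype} is not handled in this sketch")


records, pointers, pos = [], [], 0
while pos < len(DATA):
    value, pos = decode(DATA, pos, pointers)
    records.append(value)
```

Under these assumptions the dump decodes to two records, {'value': {'col': 'nested1'}} and {'value': {'col': 'nested2'}}. The keys of the second record are stored as pointers to data-section offsets 1 and 8, that is, to the "value" and "col" strings already written for the first record.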
The point is it is really convenient to know those offsets when doing stuff like this.
Not sure if Go MMDB reader exposes this data, but it should be easy to find section separators (see the specs) even without parsing the file, e.g. by mmap-ing the file and using string search functions: https://github.com/svbatalov/construct_mmdb_parser/blob/11b13ef946b7d85cec4e21a538af49b5b44f22a1/parser.py#L13-L19
Thanks, Sergey
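A rough Python sketch of the mmap idea, written for this thread rather than copied from the linked parser (`find_sections` and `find_sections_in_file` are made-up names). Per the spec, the metadata starts right after the last occurrence of "\xab\xcd\xefMaxMind.com", and the data section is preceded by 16 zero bytes. Searching for the first run of 16 zero bytes is only a heuristic, since in principle the search tree itself could contain such a run; computing the tree size from the parsed metadata is the robust way.

```python
import mmap

METADATA_MARKER = b"\xab\xcd\xefMaxMind.com"  # 14 bytes, per the MMDB spec
DATA_SEPARATOR = b"\x00" * 16                 # sits between tree and data


def find_sections(buf):
    """Locate section boundaries in `buf` (bytes or mmap) by plain search.

    Hypothetical helper: returns byte offsets, raises ValueError if a
    boundary cannot be found.
    """
    # The spec says to use the LAST occurrence of the marker.
    marker_at = buf.rfind(METADATA_MARKER)
    if marker_at < 0:
        raise ValueError("metadata marker not found; is this an MMDB file?")
    # Heuristic: first run of 16 zero bytes before the marker.
    sep_at = buf.find(DATA_SEPARATOR, 0, marker_at)
    if sep_at < 0:
        raise ValueError("data section separator not found")
    data_start = sep_at + len(DATA_SEPARATOR)
    return {
        "tree_size_guess": sep_at,
        "data_start": data_start,
        "data_end": marker_at,
        "data_size": marker_at - data_start,
        "metadata_start": marker_at + len(METADATA_MARKER),
    }


def find_sections_in_file(path):
    """Same as find_sections, but memory-maps the file instead of reading it."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return find_sections(mm)
```

On a synthetic buffer shaped like test.mmdb above (1096 non-zero tree bytes, the separator, 34 data bytes, the marker), this reports data_start 1112, data_size 34 and metadata_start 1160, the same offsets as in the parser output.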