sbng / mrt2mmdb

Convert an MRT file into a MaxMind database file

Feature request: Prometheus output #1

Closed johnhtodd closed 4 months ago

johnhtodd commented 4 months ago

One of the things I’d like to do is get metrics out of the code and you’ve already achieved a bit of that with your realtime reporting of activity. I’d like to get Prometheus output at the end as an option. We’ll just pipe this into a file at the very end of the process. I always want to have measurable results from every bit of code that runs, and this is no exception.

This should not produce any output until the process has completed - only at the very end of the process should the results be sent to stdout, so we don’t end up with partial files if the code crashes mid-way.
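A minimal sketch of the "nothing until the very end" behaviour requested above, assuming each processing stage can be modelled as a callable that returns a dict of metrics. The names `render_metrics` and `run` are hypothetical and are not taken from mrt2mmdb.

```python
import sys


def render_metrics(metrics):
    """Render a name -> numeric value mapping as one 'name value' line each."""
    return "".join(f"{name} {value}\n" for name, value in metrics.items())


def run(steps, out=sys.stdout):
    """Run every step first; write to `out` only after all of them succeed."""
    metrics = {}
    for step in steps:
        # Each hypothetical step returns a dict of metrics. If a step raises,
        # the exception propagates before anything has been written to `out`.
        metrics.update(step())
    out.write(render_metrics(metrics))
    out.flush()
    return metrics
```

Because the single write happens after the loop, a crash mid-run leaves stdout empty. Note that a shell redirect would still create an empty file in that case, so a consumer may want to ignore zero-length files.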

Example run (as it looks today):

 Making ASN table for description lookup   : 625093 prefixes [00:06, 103943.50 prefixes/s]
 Loading mrt data into dictionary          : 1132006 prefixes [02:27, 7650.43 prefixes/s]
 Converting mrt into mmda                  : 1132000 prefixes [03:11, 5900.44 prefixes/s]
 Prefixes without description              : 10474 prefixes

Here is what I’d hope to see for optional (command line specified) Prometheus-style output of the same process, which we’ll just pipe into a file in the node collection directory so it will automatically be consumed and transmitted to Prometheus with other node report data, in the same style as the RIPE Atlas probe work you did:

(note: the “#” comments are not what I’d expect to see in final output; just notes for you)

root@dev01:~/mrt/mrt2mmdb/mrt2mmdb# python3 make_mmdb.py --mrt ../data/mrt.ams.20240303.gz --prometheus-log --quiet
#
# durations are seconds
mrt2mmdb_description_asn_prefixes 625093
mrt2mmdb_description_asn_prefixes_duration 6
mrt2mmdb_description_asn_prefixes_per_second 103943.50
#
mrt2mmdb_dictionary_load_prefixes 1132006
mrt2mmdb_dictionary_load_prefixes_duration 147
mrt2mmdb_dictionary_load_prefixes_per_second 7650.43
#
mrt2mmdb_conversions 1132000
mrt2mmdb_conversions_duration 191
mrt2mmdb_conversions_per_second 5900.44
#
# How many prefixes were not found in the Maxmind template file that we’re using as a source for names?
#
mrt2mmdb_prefixes_no_description 10474
#
# When did this instance of the process start? Unix epoch seconds.
#
mrt2mmdb_lastrun_timestamp 1709496368
#
# This is the creationtime of the MRT file that is being parsed
# Unix epoch seconds.  This is what we can use to see if somehow
# our MRT file collection pipeline is “stuck” and not being updated.
#
mrt2mmdb_mrt_file_creation_timestamp 1709495844
#
# This is the creationtime of the template MMDB file that is being parsed
# Unix epoch seconds.  This is what we can use to see if somehow
# our template MMDB file collection pipeline is “stuck” and not being updated.
#
mrt2mmdb_template_mmdb_file_creation_timestamp 1709495844
#
# Keep a version number so we can track behaviors of different variations
# MUST BE NUMERIC ONLY, with a single decimal point.
#
mrt2mmdb_version 1.0
#
# end
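
The per-stage triples and the file timestamps in the requested output could be derived roughly as follows. This is a sketch under stated assumptions: `timing_metrics` and `file_timestamp` are hypothetical helper names, and the file's modification time is used as a stand-in for its "creation time" because a true creation time is not available portably.

```python
import os


def timing_metrics(prefix, count, started, finished):
    """Build the count / duration / per-second triple for one stage.

    `started` and `finished` are timestamps in seconds, for example
    values returned by time.monotonic().
    """
    duration = finished - started
    rate = count / duration if duration > 0 else 0.0
    return {
        prefix: count,
        f"{prefix}_duration": round(duration),
        f"{prefix}_per_second": round(rate, 2),
    }


def file_timestamp(path):
    """Return a file's modification time as whole Unix epoch seconds.

    Assumption: mtime is an acceptable proxy for creation time.
    """
    return int(os.path.getmtime(path))
```

With these timestamps exported, a "stuck" collection pipeline would show up as a growing gap between the current time and `mrt2mmdb_mrt_file_creation_timestamp`.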

sbng commented 4 months ago

@johnhtodd The feature has been added on the prometheus branch (commit 14428b4).

Please verify; I will merge to main after verification.

sbng commented 4 months ago

$ mrt2mmdb --bgpscan --mrt ../mrt2mmdb/data/mrt-dump.ams.202402171710.gz --mmdb ../mrt2mmdb/data/GeoLite2-ASN.mmdb --target bgpscan.mmdb --prometheus
#
# durations are seconds
mrt2mmdb_description_asn_prefixes 625093
mrt2mmdb_description_asn_prefixes_duration 7
mrt2mmdb_description_asn_prefixes_per_second 84850.85
#
mrt2mmdb_dictionary_load_prefixes 1132006
mrt2mmdb_dictionary_load_prefixes_duration 7
mrt2mmdb_dictionary_load_prefixes_per_second 164646.37
#
mrt2mmdb_conversions 1132005
mrt2mmdb_conversions_duration 111
mrt2mmdb_conversions_per_second 10213.22
#
# How many prefixes were not found in the Maxmind template file that we’re using as a source for names?
#
mrt2mmdb_prefixes_no_description 10494
#
# When did this instance of the process start? Unix epoch seconds.
#
mrt2mmdb_lastrun_timestamp 1710045441
#
# This is the creationtime of the MRT file that is being parsed
# Unix epoch seconds.  This is what we can use to see if somehow
# our MRT file collection pipeline is “stuck” and not being updated.
#
mrt2mmdb_mrt_file_creation_timestamp 1710045441
#
# This is the creationtime of the template MMDB file that is being parsed
# Unix epoch seconds.  This is what we can use to see if somehow
# our template MMDB file collection pipeline is “stuck” and not being updated.
#
mrt2mmdb_template_mmdb_file_creation_timestamp 1710239217
#
# Keep a version number so we can track behaviors of different variations
# MUST BE NUMERIC ONLY, with a single decimal point.
#
mrt2mmdb_version 1.0

johnhtodd commented 4 months ago

Amazing - I'll take a look at it later this evening.

johnhtodd commented 4 months ago

Question on the "prefixes not found in Maxmind set" - are you just matching on the origin AS, or are you trying to match on the prefix? Because all you need to do is look at the origin ASN of whatever is in the MRT file, and then find that ASN in the Maxmind file and you're essentially done. The Maxmind file is really only used as a list of ASN->Name mappings for convenience; you can pretty much throw out the prefix part of that file. It seems very odd that there would be 10,000 prefixes announced by ASNs which do not appear anywhere in the Maxmind file... maybe that's true, but I'd have to see the ASNs.

sbng commented 4 months ago

Yes, I throw out the prefix part of the Maxmind file. Originally I tried to do prefix lookups against the Maxmind GeoIP database, but there is a disparity between the ASN reported by Maxmind GeoIP and the ASN in the MRT data. Since the MRT data is the source of truth, I reckon the Maxmind prefix lookup is irrelevant, so I only look at the ASN and its description. The 10k prefixes without a description are not unique by ASN: if I recall correctly, about 6k unique ASNs have no description. I can output these 6k ASNs if needed. Meanwhile, I will try to write another script to figure out the disparity between the Maxmind GeoIP ASNs and the MRT ASNs.
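
To illustrate why the prefix count and the unique ASN count differ, here is a minimal sketch of an origin-ASN-only lookup. It assumes the MRT routes are available as `(prefix, origin_asn)` pairs and the Maxmind names as a plain dict; the function name and data shapes are hypothetical, not the project's actual code.

```python
def missing_descriptions(routes, asn_names):
    """Count prefixes whose origin ASN has no description.

    routes:    iterable of (prefix, origin_asn) pairs from the MRT data
    asn_names: dict mapping ASN -> description from the template file
    Returns (number of affected prefixes, set of unique missing ASNs).
    """
    prefixes_missing = 0
    asns_missing = set()
    for _prefix, origin_asn in routes:
        if origin_asn not in asn_names:
            prefixes_missing += 1          # counted once per prefix
            asns_missing.add(origin_asn)   # counted once per ASN
    return prefixes_missing, asns_missing
```

One ASN that originates several prefixes adds to the first number several times but to the second only once, which is consistent with roughly 10k prefixes mapping to about 6k ASNs.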

johnhtodd commented 4 months ago

OK, you're doing the right thing then.

Yes, it would be VERY interesting to know what ASNs are missing from the Maxmind table. That is really interesting data; perhaps have a "verbose" option that prints out that information with a unique logline of some sort to a logfile? I think it should not be to stdout.
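
One way such a verbose option might be sketched with Python's standard `logging` module is shown here. The logger name, the helper names, and the `missing_asn_description asn=...` log line are all hypothetical choices for illustration.

```python
import logging


def make_logger(logfile, level="INFO"):
    """Create a logger that writes only to `logfile`, never to stdout."""
    logger = logging.getLogger("mrt2mmdb.sketch")
    logger.setLevel(getattr(logging, level.upper()))
    logger.propagate = False  # keep records away from root (console) handlers
    for old in list(logger.handlers):
        # Drop handlers left over from an earlier call.
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def log_missing_asns(logger, asns):
    """Emit one uniquely tagged debug line per ASN without a description."""
    for asn in sorted(asns):
        logger.debug("missing_asn_description asn=%d", asn)
```

With the level left at its default, the debug lines are suppressed; the fixed `missing_asn_description` tag keeps the lines easy to grep from the logfile.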

sbng commented 4 months ago

I added an "ASN without description" line to the output so that the information is more complete. I will add a logger later that outputs the details at debug level.

$ ./make_mmdb.py --bgp --target a.mmdb
 Making ASN table for description lookup   : 625093 prefixes [00:02, 223367.77 prefixes/s]
 Loading mrt data into dictionary          : 1132006 prefixes [00:06, 178456.15 prefixes/s]
 Converting mrt into mmda                  : 1132005 prefixes [01:48, 10414.73 prefixes/s]
 Prefixes without description              : 10494  prefixes
 ASN without description                   : 6317  prefixes

sbng commented 4 months ago

Merged into main. A logger option, --log_level debug, has been added to show the details of the missing ASNs.