zmap / zannotate

Utility for annotating Internet datasets with contextual metadata (e.g., origin AS, MaxMind GeoIP2, reverse DNS, and WHOIS)
Apache License 2.0

Swap ASN source #18

Closed: justinbastress closed this 6 years ago

justinbastress commented 6 years ago

Using this:

AS Aggregation
==============

0. Download BGPDump_. If you're on OS X, you can ``brew install bgpdump``.

1. Download RIB file: http://archive.routeviews.org/bgpdata. Choose one
   corresponding with the date of your data.

2. Process::

    bgpdump -m route-views4-rib.20140430.1800 | cut -d'|' -f6,7 | sed 's/|/ /g' | awk '{print $1 " " $(NF)}' | sort | uniq

3. Install PySubnetTree_. Use ``pip``. If you're on Linux, make sure to also
   run ``apt-get install python-dev``, since this library calls into C.

4. Python script for adding ASes. This populates a subnet tree with the routing
   table, so it can map IP strings to ASNs. Load the output of step 2 into it,
   then look up each IP in your data to annotate it.

::

    import SubnetTree
    import sys

    t = SubnetTree.SubnetTree()

    def populate_tree(filename):
        global t
        with open(filename, 'r') as fd:
            for line in fd:
                line = line.strip()
                ip, asn = line.split(' ', 1)
                if '{' in asn:
                    asnset = asn[1:-1]
                    asn = asnset.split(',')[0]
                asn = int(asn)
                t[str(ip)] = asn

    # This only works for strings, not unicode strings
    def lookup(ip):
        global t
        if ip not in t:
            return None
        else:
            return t[ip]

    def main():
        populate_tree(sys.argv[1])

    if __name__ == '__main__':
        main()

.. _BGPDump: https://bitbucket.org/ripencc/bgpdump/wiki/Home
.. _PySubnetTree: https://pypi.python.org/pypi/pysubnettree

and comparing the results for 141.212.118.65, I was not getting the right ASN back.

After this fix, things seem to come out right, though my sample size is still very small.
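For anyone who wants to widen that spot check without PySubnetTree, here is a minimal sketch of the same longest-prefix lookup using only the standard-library ``ipaddress`` module. It is not code from this PR. The helper names (`parse_line`, `build_table`, `lookup`) and the sample routes are hypothetical; the routes use documentation prefixes and documentation ASNs, not real routing data. It assumes the `prefix asn` / `prefix {asn1,asn2}` line format produced by the `bgpdump` pipeline above.

```python
import ipaddress

# Hypothetical sample in the "prefix asn" format produced by the pipeline above.
# Documentation prefixes and ASNs only; this is not real routing data.
SAMPLE_LINES = [
    "198.51.0.0/16 64496",
    "198.51.100.0/24 64497",
    "192.0.2.0/24 {64498,64499}",
]


def parse_line(line):
    """Split one 'prefix asn' line; for an AS set, keep its first member."""
    prefix, asn = line.strip().split(" ", 1)
    if asn.startswith("{"):
        asn = asn[1:-1].split(",")[0]
    return ipaddress.ip_network(prefix, strict=False), int(asn)


def build_table(lines):
    """Map each announced network to its origin ASN."""
    table = {}
    for line in lines:
        if line.strip():
            network, asn = parse_line(line)
            table[network] = asn
    return table


def lookup(table, ip):
    """Return the ASN of the most specific covering prefix, or None."""
    addr = ipaddress.ip_address(ip)
    # Try the longest possible prefix first and widen one bit at a time.
    for prefix_len in range(addr.max_prefixlen, -1, -1):
        candidate = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        if candidate in table:
            return table[candidate]
    return None


table = build_table(SAMPLE_LINES)
for ip in ("198.51.100.7", "198.51.7.7", "192.0.2.1", "203.0.113.5"):
    print(ip, lookup(table, ip))
```

This walks up to 33 candidate prefixes per IPv4 address, so it is slower than a real subnet tree, but it is easy to reason about when comparing a handful of addresses against another ASN source.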

justinbastress commented 6 years ago

Re: the TravisCI failure, it seems the BGP library has changed recently; internally we pinned its version to "1b84b8145418398e7e68b81cdce506dcf81eec14", so it works with that, but I get the same issue when I run from HEAD.

justinbastress commented 6 years ago

False alarm