vajra77 / IRRHound

A simple package to deal with network resources registered into Internet Routing Registries (IRRs)
Creative Commons Zero v1.0 Universal

Module crashes when new_route.source string has remarks appended to it #3

Open Neiby opened 2 years ago

Neiby commented 2 years ago

Occasionally, new_route.source contains multi-line output, with remarks appended after the source name. Here's an example of what I mean:

RIPE
REMARKS:        ****************************
REMARKS:        * THIS OBJECT IS MODIFIED
REMARKS:        * PLEASE NOTE THAT ALL DATA THAT IS GENERALLY REGARDED AS PERSONAL
REMARKS:        * DATA HAS BEEN REMOVED FROM THIS OBJECT.
REMARKS:        * TO VIEW THE ORIGINAL OBJECT, PLEASE QUERY THE RIPE DATABASE AT:
REMARKS:        * HTTP://WWW.RIPE.NET/WHOIS
REMARKS:        ****************************

When that happens, we get the following traceback:

Traceback (most recent call last):
  File "irr.py", line 3, in <module>
    routes = irr_hunt_routes(20473, 'AS-CHOOPA', None)
  File "/Users/jneibe/Library/Python/3.8/lib/python/site-packages/irrhound/irrhound.py", line 36, in irr_hunt_routes
    scan.execute()
  File "/Users/jneibe/Library/Python/3.8/lib/python/site-packages/irrhound/shared/irr_scan.py", line 59, in execute
    self._increase_source_weight(new_route.source)
  File "/Users/jneibe/Library/Python/3.8/lib/python/site-packages/irrhound/shared/irr_scan.py", line 83, in _increase_source_weight
    raise Exception("Unrecognized IRR source: {}".format(source))
Exception: Unrecognized IRR source: RIPE
REMARKS:        ****************************
REMARKS:        * THIS OBJECT IS MODIFIED
REMARKS:        * PLEASE NOTE THAT ALL DATA THAT IS GENERALLY REGARDED AS PERSONAL
REMARKS:        * DATA HAS BEEN REMOVED FROM THIS OBJECT.
REMARKS:        * TO VIEW THE ORIGINAL OBJECT, PLEASE QUERY THE RIPE DATABASE AT:
REMARKS:        * HTTP://WWW.RIPE.NET/WHOIS
REMARKS:        ****************************

The exception is in irr_scan.py in a function that checks to see if the source exists in self._source_weight.keys(). I was looking through the code to see where the source info is originally acquired. It appears to be in the expand_as function in whois_proxy.py.

Perhaps one solution is to check whether the source string contains whitespace and, if so, split it into a list and take the first element. I got this working directly in irr_scan.py, but only as a quick test to verify that it solves this particular problem; I don't think that is a clean place for the fix.
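
The whitespace-split idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in irr_scan.py or whois_proxy.py; `normalize_source` is a hypothetical helper name.

```python
def normalize_source(source):
    """Return only the first whitespace-delimited token of a source string.

    Hypothetical helper: assumes the registry name (e.g. "RIPE") is always
    the first token and that anything after it is unwanted remark text.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, and ignores leading/trailing whitespace.
    tokens = source.split()
    # An empty or whitespace-only string yields no tokens.
    return tokens[0] if tokens else ""


polluted = "RIPE\nREMARKS:        ****************************\nREMARKS:        * THIS OBJECT IS MODIFIED"
print(normalize_source(polluted))
```

A clean value such as "RIPE" passes through unchanged, so the helper could be applied to every source string regardless of whether remarks were appended.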

vajra77 commented 2 years ago

Thanks for letting me know. Can you point me to a query/resource that originally triggers this error? I tried the one I see in your trace, irr_hunt_routes(20473, 'AS-CHOOPA', None), but it completes with no error. I suspect garbage lines may be injected by the ipwhois package, so knowing the exact resource that is not parsed correctly would be of great help.

Neiby commented 2 years ago

That is the query that causes the problem for us, and it happens every time. Very interesting! If it matters, I'm running this from a MacBook Pro, so Mac OS X. I wonder if we're running different versions of the dependencies than you are.

I'm running v1.2.0 of ipwhois and v1.4 of bgpq4.

Neiby commented 2 years ago

We think the problem might be with ipwhois. For example, this is a quick test that always fails for us. And by "fail", I mean that the source always has extraneous information in the string. We used the same dummy network and approach found in whois_proxy.py.

 >>> from ipwhois.net import Net
 >>> from ipwhois.asn import ASNOrigin
 >>> DUMMY_NET = "193.201.40.0"
 >>> mynet = Net(DUMMY_NET)
 >>> obj = ASNOrigin(mynet)
 >>> lookup = obj.lookup('AS137')
 >>> for net in lookup['nets']:
 ...     print(net)

Neiby commented 2 years ago

@vajra77 I think we've shown this problem exists in ipwhois, not your code, so you can probably close out this issue. We haven't figured out the root cause, but it seems the extraneous information in the source field is coming from ipwhois, or perhaps even the raw WHOIS data. I'm not sure yet.

Neiby commented 2 years ago

If you're interested, I submitted an issue for the ipwhois problem here:

https://github.com/secynic/ipwhois/issues/317

It appears to be a problem with a regular expression.

vajra77 commented 2 years ago

Hi, thanks for submitting the ipwhois issue. I have added a workaround in WhoisProxy.expand_as() that strips away all the remarks strings that follow the source name in the output from ipwhois, and tested it with records for AS137. Regarding AS20473, AS-CHOOPA: I have tested it on both Mac OS X and Ubuntu 20.04, both with bgpq4 1.4 and ipwhois 1.2.0.

Neiby commented 2 years ago

Excellent! Thank you so much for your help.