secynic / ipwhois

Retrieve and parse whois data for IPv4 and IPv6 addresses
https://ipwhois.readthedocs.io/en/latest
BSD 2-Clause "Simplified" License

Roles missing from objects queried using depth parameter #161

cdubz closed 7 years ago

cdubz commented 7 years ago

Python version: Python 3.4.3
ipwhois version: 0.14.0

I am attempting to get role information at depth for RDAP queries, but the roles are only ever filled in for the first level. The examples below use an arbitrary IP, 12.3.4.5:

Example 1

import ipwhois
from pprint import pprint
ip = ipwhois.IPWhois("12.3.4.5")
result = ip.lookup_rdap()
pprint(result)

(See ex_1_result.txt, roles are included for ATTW. See also ex_1_debug.txt)

Example 2

import ipwhois
from pprint import pprint
ip = ipwhois.IPWhois("12.3.4.5")
result = ip.lookup_rdap(depth=1)
pprint(result)

(See ex_2_result.txt, roles are included for ATTW again, but missing for all other entities. See also ex_2_debug.txt)

Example 3

wget http://rdap.arin.net/registry/ip/12.3.4.5

(See ex_3_result.txt, roles are included for all entities.)

Based on the debug output, it seems that the role information is available in the initial query but is not used when depth == 0 and ignored or overwritten when depth > 0.
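For anyone reproducing this, a small helper along these lines makes the gap easy to spot. It is only a sketch: `objects_missing_roles` is a made-up name, not part of ipwhois, and it assumes the result layout seen in the attached result files, where `result['objects']` is a dict keyed by entity handle and each object may carry a `roles` entry. The `EXAMPLE-*` handles in the sample are placeholders, not real query output.

```python
def objects_missing_roles(result):
    """Return the handles of objects whose 'roles' entry is empty or absent.

    Hypothetical helper; assumes result['objects'] maps handle -> object dict.
    """
    objects = result.get("objects") or {}
    return sorted(
        handle
        for handle, obj in objects.items()
        if not (obj or {}).get("roles")
    )


# Hand-written sample shaped like a lookup_rdap(depth=1) result.
sample = {
    "objects": {
        "ATTW": {"roles": ["registrant"]},
        "EXAMPLE-1": {"roles": None},  # placeholder sub-entity, roles lost
        "EXAMPLE-2": {},               # placeholder sub-entity, no roles key
    }
}

print(objects_missing_roles(sample))
```

With a live query, the same function could be called on the dict returned by `ip.lookup_rdap(depth=1)` from Example 2.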

I am not sure if this is by design or if I am misunderstanding the configuration and expected output. Interestingly, the output of wget http://rdap.arin.net/registry/ip/12.3.4.5 includes a number of entities and their associated information, all of which is removed or ignored when depth == 0. That seems to make sense, but why not include the basic information since it is already there?

I haven't dug into the code at all, but I would be happy to do a PR if there is interest in tweaking this functionality.

secynic commented 7 years ago

Based on the output for example 2 & 3, it does indeed look like a bug. Let me dig into it this week.

depth=0 is working as expected. Even though the info is already there in the case of ARIN, I still would not include it, as other RIRs would require additional queries and the output would be inconsistent. This also helps save parsing cycles.

Thanks for catching this, and the detailed info.

cdubz commented 7 years ago

Ah, good point re: consistency between RIRs. I didn't think about that.

Thanks for your work and let me know if I can do anything else to help!

secynic commented 7 years ago

I found the problem.

The sub-entities of ATTW are each looked up for more data than what is returned in the initial query. That data won't contain roles, as the entity itself does not reference its referrers.

Need to create a temp key 'roles' here: https://github.com/secynic/ipwhois/blob/master/ipwhois/rdap.py#L765

Add a line removing the temp 'roles' key and setting that value in new_objects[ent]['roles']: https://github.com/secynic/ipwhois/blob/master/ipwhois/rdap.py#L812
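The two steps above can be sketched roughly as follows. This is an illustration of the idea only, not the actual code in rdap.py: `resolve_sub_entities`, `fetch_entity`, and the `pending` dict are hypothetical names, and the entity dicts are simplified to a handle plus roles.

```python
def resolve_sub_entities(parent_entities, fetch_entity):
    """Look up each sub-entity and carry over the roles seen in the parent.

    parent_entities: entity dicts from the parent response, each with a
        'handle' and (possibly) 'roles'.
    fetch_entity: callable taking a handle and returning that entity's own
        data, which will not know what roles it plays for its referrer.
    """
    # Step 1: stash the roles from the parent response under a temp key.
    pending = {}
    for ent in parent_entities:
        pending[ent["handle"]] = {"roles": ent.get("roles")}

    # Step 2: after each lookup, remove the temp key and set the roles
    # on the freshly fetched object.
    new_objects = {}
    for handle, tmp in pending.items():
        new_objects[handle] = dict(fetch_entity(handle))
        new_objects[handle]["roles"] = tmp.pop("roles")
    return new_objects
```

The point of the temp key is that the roles only exist in the referring object's response, so they have to be remembered before the sub-entity lookup replaces the data.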

secynic commented 7 years ago

Check the dev branch. I haven't had the time to find/test an IP whose results have entities at a depth greater than 1. I will do some more testing tomorrow.

cdubz commented 7 years ago

Dug up a few examples. All of the below (and my original example, 12.3.4.5) work as expected from the dev branch with a depth > 0. The sub-queried objects include roles data.

Tried to cover more RIRs but it was tough to find even these extras.

secynic commented 7 years ago

Awesome, thanks for testing. I'll get v0.15.1 pushed to PyPI soon.

secynic commented 7 years ago

Merged in 917b6b5. Waiting for final tests before I push to PyPI. Travis needs to implement load balancing with multiple external IPs. Multiple back-to-back tests are killing me...