timmerk / ipaddr-py

Automatically exported from code.google.com/p/ipaddr-py
0 stars 0 forks source link

collapse_address_list is too slow #17

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Try to collapse a /24's worth of IP addresses one by one.

Below is code used to benchmark
{{{
from ipaddr import IP, CollapseAddrList

import timeit

RUNS = 1000
addresses = [IP(ip) for ip in IP('1.1.1.0/24')]

if __name__ == '__main__':
    t1 = timeit.Timer('CollapseAddrList(addresses)', 'from __main__ import 
CollapseAddrList, addresses')
    print "CollapseAddrList: %s s" % t1.timeit(number=RUNS)
}}}

*Before:*
$ python collapse_timeit.py
CollapseAddrList: 64.273168087 s

*After:*
$ python collapse_timeit.py
collapse: 1.20903706551 s

What is the expected output? What do you see instead?

ipaddr should be faster

What version of the product are you using? On what operating system?

using latest ipaddr trunk version on Ubuntu on 2.6Ghz Core 2 Duo.

Please provide any additional information below.

The patch in http://codereview.appspot.com/67107 adds the following 
functionality derived from my own original IPy based code at 
http://code.google.com/p/wadofstuff/source/browse/trunk/python/ip/wadofstuf
f/ip/summarize.py

1. summarize_address_range() which takes a start and finish IP and 
summarizes that range into networks.
2. Allow IPv4/IPv6 objects to be created using a range notation e.g 
1.1.1.0-1.1.1.7. This is used by summarize_address_range() to shortcut its 
main loop.
3. collapse_address_list will collapse ordered sequences of IP addresses 
and call summarize on them.

Original issue reported on code.google.com by mattimus...@gmail.com on 26 May 2009 at 12:30

GoogleCodeExporter commented 9 years ago
Hi Matt (F.),

Just noticed what I believe is a bug in your patch 
(http://codereview.appspot.com/67107).

Have a look :-

In [1]: from ipaddr import *

In [2]: addresses = [IP(ip) for ip in IP('1.1.1.0/24')]

In [3]: addresses.extend([IP(ip) for ip in IP('::1.1.1.0/120')])

In [4]: len(addresses)
Out[4]: 512

In [5]: addresses[0], addresses[255], addresses[256], addresses[511]
Out[5]: (IPv4('1.1.1.0/32'), IPv4('1.1.1.255/32'), IPv6('::101:100/128'),
IPv6('::101:1ff/128'))

In [6]: CollapseAddrList(addresses)
Out[6]: [IPv4('1.1.1.0/24')]  # incorrect

Should be :-

In [6]: CollapseAddrList(addresses)
Out[6]: [IPv4('1.1.1.0/24'), IPv6('::1.1.1.0/120')]

Regards,

Dave M.

Original comment by drk...@gmail.com on 26 May 2009 at 1:45

GoogleCodeExporter commented 9 years ago
Hi Dave,

Thanks for trying it out. I interpreted the docstring in 
collapse_address_list():

"addresses: A list of IPv4 or IPv6 objects."

as meaning 'addresses' should contain only IPv4 or only IPv6 addresses, not IPv4
*and* IPv6 addresses.

I've noticed that there isn't an existing test case for this either.

Can someone clarify the intent here?
Is it a valid use case?

regards

Matthew Flanagan

Original comment by mattimus...@gmail.com on 26 May 2009 at 8:13

GoogleCodeExporter commented 9 years ago
The new Patch Set 5 at http://codereview.appspot.com/67107 now works against 
2.0.x 
branch and is up to 2 times faster than my previous versions.

Original comment by mattimus...@gmail.com on 19 Aug 2009 at 12:41

GoogleCodeExporter commented 9 years ago
hey Matt,

I thought your original patch sped up the execution of collapse_address_list as 
well, 
no?

the ipaddr-py2 directory has your patch applied, ipaddr-py doesn't:

pmoody@pmoody - (0) - 11:41 - ~/src
-> (cd ipaddr-py; python -m timeit 'import ipaddr; 
ipaddr.collapse_address_list([ipaddr.IP(x) for x in ipaddr.IP("1.1.1.0/24")])')
10 loops, best of 3: 121 msec per loop

pmoody@pmoody - (0) - 11:41 - ~/src
-> (cd ipaddr-py2; python -m timeit 'import ipaddr; 
ipaddr.collapse_address_list([ipaddr.IP(x) for x in ipaddr.IP("1.1.1.0/24")])')
10 loops, best of 3: 126 msec per loop

I could just be miss remembering something 

Original comment by pmo...@google.com on 19 Aug 2009 at 6:42

GoogleCodeExporter commented 9 years ago
Hi Peter,

The optimization is for a list of IP addresses not a list of /32's which are 
IPv4Network's. Try:

(cd ipaddr-py2; python -m timeit 'import ipaddr; 
ipaddr.collapse_address_list([x for 
x in ipaddr.IP("1.1.1.0/24")])')

I'll look at what I can do to make it work for IPv*Networks that are /32's or 
/128's 
as well.

Original comment by mattimus...@gmail.com on 19 Aug 2009 at 10:43

GoogleCodeExporter commented 9 years ago
Patch Set 6 is uploaded to Reitveld with optimization for a list of networks.

mflanaga:~/src/ipaddr-py$ python -m timeit 'import ipaddr; 
ipaddr.collapse_address_list([ipaddr.IP(x) for x in ipaddr.IP("1.1.1.0/24")])'
10 loops, best of 3: 70 msec per loop
mflanaga:~/src/ipaddr-py-2.0.x$ python -m timeit 'import ipaddr; 
ipaddr.collapse_address_list([ipaddr.IP(x) for x in ipaddr.IP("1.1.1.0/24")])'
100 loops, best of 3: 13 msec per loop
mflanaga:~/src/ipaddr-py-2.0.x$ python -m timeit 'import ipaddr; 
ipaddr.collapse_address_list([x for x in ipaddr.IP("1.1.1.0/24")])'
100 loops, best of 3: 3.02 msec per loop

With attached benchmark scripts:

mflanaga:~/src/ipaddr-py$ python collapse_test.py
CollapseAddrList: 62.501075983 s
mflanaga:~/src/ipaddr-py-2.0.x$ python collapse_test2.py
CollapseAddrList IPs: 0.83060002327 s
CollapseAddrList nets: 0.989233970642 s

Original comment by mattimus...@gmail.com on 20 Aug 2009 at 12:47

Attachments:

GoogleCodeExporter commented 9 years ago
fixed in r95. thanks, Matt!

new:                                                                            

(cd ipaddr-issue17 ; python -m timeit 'import ipaddr 
ipaddr.collapse_address_list([ipaddr.IP(x) for x in ipaddr.IP("1.1.1.0/24")])')
10 loops, best of 3: 11.7 msec per loop

vs. old:                                                                        

(cd ipaddr-py ; python -m timeit 'import ipaddr 
ipaddr.collapse_address_list([ipaddr.IP(x) for x in ipaddr.IP("1.1.1.0/24")])') 

10 loops, best of 3: 125 msec per loop

Original comment by pmo...@google.com on 21 Aug 2009 at 5:12