Closed GoogleCodeExporter closed 9 years ago
Original comment by sjsrey
on 20 Mar 2010 at 6:02
Thu Apr 15 01:37:13 2010 Profile.prof
139 function calls in 0.002 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.001 0.000 0.001 0.001 _pyShpIO.py:49(_unpackDict)
1 0.000 0.000 0.000 0.000 :0(setprofile)
27 0.000 0.000 0.000 0.000 :0(read)
27 0.000 0.000 0.000 0.000 :0(unpack)
27 0.000 0.000 0.000 0.000 :0(calcsize)
27 0.000 0.000 0.000 0.000 :0(len)
1 0.000 0.000 0.001 0.001 _pyShpIO.py:327(_open_shx_file)
1 0.000 0.000 0.002 0.002 profile:0(shptest.test())
2 0.000 0.000 0.000 0.000 :0(open)
1 0.000 0.000 0.001 0.001 _pyShpIO.py:317(__init__)
1 0.000 0.000 0.001 0.001 _pyShpIO.py:135(_open_shp_file)
1 0.000 0.000 0.002 0.002 pyShpIO.py:54(__init__)
1 0.000 0.000 0.000 0.000 FileIO.py:81(getType)
1 0.000 0.000 0.002 0.002 shptest.py:11(test)
1 0.000 0.000 0.000 0.000 genericpath.py:85(_splitext)
1 0.000 0.000 0.000 0.000 FileIO.py:68(__new__)
4 0.000 0.000 0.000 0.000 :0(endswith)
1 0.000 0.000 0.002 0.002 pyShpIO.py:64(__open)
1 0.000 0.000 0.002 0.002 _pyShpIO.py:121(__init__)
1 0.000 0.000 0.000 0.000 FileIO.py:146(__init__)
2 0.000 0.000 0.000 0.000 :0(rfind)
1 0.000 0.000 0.000 0.000 posixpath.py:94(splitext)
1 0.000 0.000 0.002 0.002 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 :0(replace)
1 0.000 0.000 0.000 0.000 _pyShpIO.py:218(type)
1 0.000 0.000 0.000 0.000 _pyShpIO.py:115(__isreadable)
1 0.000 0.000 0.000 0.000 :0(__new__)
1 0.000 0.000 0.000 0.000 _pyShpIO.py:311(__isreadable)
1 0.000 0.000 0.000 0.000 FileIO.py:124(__init__)
0 0.000 0.000 profile:0(profiler)
Original comment by phil.stp...@gmail.com
on 15 Apr 2010 at 8:41
Attachments:
This test was run on a large shapefile, peru roads.shp (34.2 MB). The results are largely the same.
Thu Apr 15 01:55:40 2010 Profile.prof
139 function calls in 0.034 CPU seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.026 0.026 0.032 0.032 _pyShpIO.py:327(_open_shx_file)
27 0.004 0.000 0.004 0.000 :0(unpack)
27 0.001 0.000 0.001 0.000 :0(calcsize)
27 0.001 0.000 0.001 0.000 :0(read)
2 0.001 0.000 0.002 0.001 _pyShpIO.py:49(_unpackDict)
1 0.001 0.001 0.033 0.033 _pyShpIO.py:317(__init__)
27 0.000 0.000 0.000 0.000 :0(len)
1 0.000 0.000 0.000 0.000 :0(setprofile)
1 0.000 0.000 0.034 0.034 profile:0(shptest.test())
2 0.000 0.000 0.000 0.000 :0(open)
1 0.000 0.000 0.034 0.034 _pyShpIO.py:135(_open_shp_file)
1 0.000 0.000 0.034 0.034 pyShpIO.py:54(__init__)
4 0.000 0.000 0.000 0.000 :0(endswith)
1 0.000 0.000 0.034 0.034 pyShpIO.py:64(__open)
1 0.000 0.000 0.000 0.000 genericpath.py:85(_splitext)
1 0.000 0.000 0.000 0.000 FileIO.py:68(__new__)
1 0.000 0.000 0.000 0.000 FileIO.py:81(getType)
1 0.000 0.000 0.034 0.034 shptest.py:11(test)
1 0.000 0.000 0.034 0.034 _pyShpIO.py:121(__init__)
1 0.000 0.000 0.000 0.000 FileIO.py:146(__init__)
1 0.000 0.000 0.000 0.000 posixpath.py:94(splitext)
2 0.000 0.000 0.000 0.000 :0(rfind)
1 0.000 0.000 0.034 0.034 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 _pyShpIO.py:218(type)
1 0.000 0.000 0.000 0.000 :0(replace)
1 0.000 0.000 0.000 0.000 :0(__new__)
1 0.000 0.000 0.000 0.000 _pyShpIO.py:311(__isreadable)
1 0.000 0.000 0.000 0.000 _pyShpIO.py:115(__isreadable)
1 0.000 0.000 0.000 0.000 FileIO.py:124(__init__)
0 0.000 0.000 profile:0(profiler)
Original comment by phil.stp...@gmail.com
on 15 Apr 2010 at 8:59
This test was run on a really large shapefile, tl_2008_us_zcta500.shp (1 GB).
3243586 function calls (3237237 primitive calls) in 143.115 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 143.118 143.118 {execfile}
1 3.756 3.756 143.117 143.117 profile_shp.py:1(<module>)
32038 0.097 0.000 138.396 0.004 FileIO.py:242(get)
32038 0.642 0.000 138.168 0.004 FileIO.py:286(__read)
32038 0.846 0.000 132.020 0.004 pyShpIO.py:120(_read)
155654 93.912 0.001 93.964 0.001 standalone.py:497(is_clockwise)
32038 0.218 0.000 47.049 0.001 shapes.py:818(__init__)
77827 0.671 0.000 46.738 0.001 shapes.py:836(clockwise)
32038 0.979 0.000 36.070 0.001 _pyShpIO.py:243(get_shape)
32038 15.498 0.000 28.975 0.001 _pyShpIO.py:474(unpack)
10368 0.077 0.000 26.025 0.003 {map}
96116 1.203 0.000 13.790 0.000 _pyShpIO.py:49(_unpackDict)
352445 7.974 0.000 7.974 0.000 {_struct.unpack}
96142 5.752 0.000 5.752 0.000 {method 'read' of 'file' objects}
32038 0.036 0.000 5.487 0.000 shapes.py:861(__len__)
32038 0.286 0.000 5.451 0.000 shapes.py:864(len)
32038 0.845 0.000 5.156 0.000 shapes.py:886(vertices)
64076 4.311 0.000 4.311 0.000 {sum}
352463 2.881 0.000 2.881 0.000 {_struct.calcsize}
288342 1.584 0.000 1.584 0.000 {method 'read' of 'cStringIO.StringI' objects}
1 0.005 0.005 0.794 0.794 __init__.py:32(<module>)
10 0.011 0.001 0.467 0.047 __init__.py:5(<module>)
2 0.006 0.003 0.424 0.212 common.py:4(<module>)
Original comment by schmi...@gmail.com
on 4 May 2010 at 10:58
The profiler script attached above doesn't actually read the file; it just opens it.
Attached is a modified script that reads the entire file.
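For anyone repeating the measurement, here is a minimal sketch of a profiling harness built on the standard-library cProfile and pstats modules (written for Python 3, unlike the Python 2 scripts attached in this thread). The key point from the comment above is that the profiled function must iterate over every record, not just open the file. `read_all` is a hypothetical placeholder for that loop. It walks an in-memory list so the sketch runs without a shapefile.

```python
import cProfile
import io
import pstats


def read_all(records):
    # Hypothetical stand-in for "read every shape in the file": it visits
    # every record instead of stopping after the open/header step.
    total_vertices = 0
    for ring in records:
        total_vertices += len(ring)
    return total_vertices


def profile_call(func, *args, sort_key="cumulative", limit=20):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    # "cumulative" matches the "Ordered by: cumulative time" listings in this
    # thread; pass sort_key="tottime" for "Ordered by: internal time".
    pstats.Stats(profiler, stream=buf).sort_stats(sort_key).print_stats(limit)
    return result, buf.getvalue()


if __name__ == "__main__":
    fake_file = [[(0.0, 0.0)] * 5 for _ in range(1000)]
    count, report = profile_call(read_all, fake_file)
    print(count)
    print(report)
```

Returning the report as a string rather than printing it directly makes it easy to save one report per run and compare them side by side, as this thread does.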
Original comment by schmi...@gmail.com
on 4 May 2010 at 11:02
Attachments:
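I optimized the clockwise test to be more Pythonic and got a significant speedup: total time dropped from 143.1 to 93.3 CPU seconds, and internal time in is_clockwise dropped from 93.9 to 50.3 seconds.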
3087964 function calls (3081615 primitive calls) in 93.330 CPU seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 93.332 93.332 {execfile}
1 3.888 3.888 93.332 93.332 profile_shp.py:1(<module>)
32038 0.089 0.000 88.589 0.003 FileIO.py:242(get)
32038 0.562 0.000 88.379 0.003 FileIO.py:286(__read)
32038 0.715 0.000 82.963 0.003 pyShpIO.py:120(_read)
155654 50.349 0.000 50.380 0.000 standalone.py:560(is_clockwise)
32038 0.857 0.000 31.045 0.001 _pyShpIO.py:243(get_shape)
32038 14.489 0.000 26.768 0.001 _pyShpIO.py:474(unpack)
32038 0.172 0.000 25.569 0.001 shapes.py:818(__init__)
77827 0.436 0.000 25.320 0.000 shapes.py:836(clockwise)
10368 0.055 0.000 13.326 0.001 {map}
96116 1.087 0.000 12.520 0.000 _pyShpIO.py:49(_unpackDict)
352445 7.212 0.000 7.212 0.000 {_struct.unpack}
32038 0.033 0.000 4.838 0.000 shapes.py:861(__len__)
32038 0.337 0.000 4.805 0.000 shapes.py:864(len)
32038 0.746 0.000 4.461 0.000 shapes.py:886(vertices)
64076 3.716 0.000 3.716 0.000 {sum}
96142 3.065 0.000 3.065 0.000 {method 'read' of 'file' objects}
352463 2.655 0.000 2.655 0.000 {_struct.calcsize}
288342 1.465 0.000 1.465 0.000 {method 'read' of 'cStringIO.StringI' objects}
1 0.005 0.005 0.757 0.757 __init__.py:32(<module>)
2 0.006 0.003 0.429 0.215 common.py:4(<module>)
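The rewritten is_clockwise in standalone.py is not shown in this thread. As an illustration only, and not necessarily the version timed above, a Pythonic clockwise test can use the shoelace (signed-area) formula. It pairs each vertex with its successor via zip instead of indexing inside a loop. The sketch assumes that y increases upward, as in map coordinates, and that a ring is a sequence of (x, y) tuples.

```python
def is_clockwise(ring):
    """Return True if the ring's vertices run clockwise.

    Shoelace formula: twice the signed area is the sum of
    x_i * y_(i+1) - x_(i+1) * y_i over all edges. With y pointing up, the
    sum is negative for a clockwise ring. A repeated closing vertex (as
    shapefile rings have) adds a zero-length edge and does not change it.
    """
    ring = list(ring)
    # Pair each vertex with the next one, wrapping the last back to the first.
    successors = ring[1:] + ring[:1]
    twice_area = sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(ring, successors))
    return twice_area < 0


square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]  # up, right, down
print(is_clockwise(square))        # True
print(is_clockwise(square[::-1]))  # False
```

Degenerate rings (all points collinear) give a sum of zero, so this sketch reports them as not clockwise.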
Original comment by schmi...@gmail.com
on 5 May 2010 at 12:11
The current pure-Python shape reader is pretty inefficient: just reading the 1 GB file and parsing the binary takes ~38 CPU seconds and ~2.3 million function calls. This isn't doing any calculations, just unpacking binary data. Part of the reason is that the core shapefile reader was designed to be understandable, not efficient (it dates back to the Stars refactoring).
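The profiles in this thread show 352,445 calls to _struct.unpack for 32,038 shapes, which works out to about 11 unpack calls per shape. Below is a minimal sketch of the alternative. It assumes the standard ESRI layout of a Polygon (shape type 5) record's content, which follows the 8-byte big-endian record header: a 44-byte little-endian fixed part, then the part offsets, then the x/y pairs. The fixed part is precompiled with struct.Struct, and each variable-length block is unpacked in one call. `parse_polygon_content` is a hypothetical helper, not part of PySAL.

```python
import struct

# Fixed 44-byte prefix of a Polygon record's content (little-endian):
# int32 shape type, four float64 bounding-box values, int32 NumParts,
# int32 NumPoints. Compiling it once avoids re-parsing the format string.
POLY_HEADER = struct.Struct("<i4d2i")
POLYGON = 5


def parse_polygon_content(buf):
    """Parse one Polygon record's content bytes with three unpack calls.

    Returns (bbox, parts, points): bbox is (xmin, ymin, xmax, ymax), parts
    is a tuple of starting point indexes, points is a list of (x, y) tuples.
    """
    shape_type, xmin, ymin, xmax, ymax, num_parts, num_points = \
        POLY_HEADER.unpack_from(buf, 0)
    if shape_type != POLYGON:
        raise ValueError("not a Polygon record: shape type %d" % shape_type)
    offset = POLY_HEADER.size  # 44 bytes
    # Every part offset in a single call instead of one call per part.
    parts = struct.unpack_from("<%di" % num_parts, buf, offset)
    offset += 4 * num_parts
    # Every coordinate in a single call; zip pairs x values with y values.
    coords = struct.unpack_from("<%dd" % (2 * num_points), buf, offset)
    points = list(zip(coords[0::2], coords[1::2]))
    return (xmin, ymin, xmax, ymax), parts, points
```

The number of unpack calls per record stays at three regardless of how many parts or points the polygon has.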
Adding the calculations that determine whether each ring is clockwise or counter-clockwise costs another ~27 CPU seconds and ~240K function calls. There appears to be a lot of room for optimization here.
Turning the rings into a pysal.Polygon adds yet another bottleneck: Polygon repeats the clockwise test on each ring (it converts all rings to clockwise). This accounts for the remaining ~29 CPU seconds and ~551K calls.
My recommendations are that we:
1.) Optimize clockwise testing. The clockwise test is a significant bottleneck.
2.) Overhaul the Polygon class. A more passive Polygon class could avoid clockwise testing in many cases.
3.) Overhaul the shape reader. Significant speedups could be achieved by reducing function calls and reading directly into arrays.
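For recommendation 3, here is a minimal sketch of reading coordinates directly into an array with the standard-library array module. A part's NumPoints x/y pairs arrive with one read and one frombytes call. The sketch makes three assumptions. First, the file position is already at the start of the point data. Second, array('d') holds 8-byte IEEE doubles, which is true on mainstream platforms; the code checks this. Third, `read_points_array` is a hypothetical helper. numpy.frombuffer with dtype '<f8' would be a comparable choice if NumPy is available.

```python
import sys
from array import array


def read_points_array(f, num_points):
    """Read num_points little-endian (x, y) float64 pairs from binary file f.

    Returns a flat array('d', [x0, y0, x1, y1, ...]) built with one read
    and one frombytes call rather than one struct.unpack call per value.
    """
    coords = array("d")
    if coords.itemsize != 8:
        raise RuntimeError("array('d') is not 8 bytes on this platform")
    nbytes = 16 * num_points  # two 8-byte doubles per point
    data = f.read(nbytes)
    if len(data) != nbytes:
        raise EOFError("truncated point data")
    coords.frombytes(data)
    # Shapefile coordinates are little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        coords.byteswap()
    return coords
```

The x values are coords[0::2] and the y values are coords[1::2]. Slicing an array returns another array, so the data stays compact instead of becoming millions of tuple objects.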
Original comment by schmi...@gmail.com
on 5 May 2010 at 6:45
Original issue reported on code.google.com by
schmi...@gmail.com
on 25 Feb 2010 at 11:57