This pull request solves the awful performance problem at program start-up, when generating curl requests.
After some investigation and a little profiling, the problem appears to come from storing the generated curl_items in a list:
    for header in xxx:
        for internal in yyy:
            …
            if item not in self.curl_items:
                self.curl_items.append(item)
            …
To check whether a CurlItem object is already in the list, the CurlItem __eq__() function is called to compare the attributes of the two objects, which the __attrs() method returns as a tuple.
For each item added to the curl_items list, the inserted object is compared to every item already in the list (if item not in self.curl_items). So, to generate all payloads, __eq__() is called 13001247 times and __attrs() 26002494 times, twice per comparison. Damn!
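To make the cost concrete, here is a minimal sketch of the pattern, not the project's actual code. The class below is a hypothetical stand-in for CurlItem: its two fields, the _attrs() helper name and the eq_calls counter are assumptions made for illustration only.

```python
class CurlItem:
    """Hypothetical stand-in for the real CurlItem (field names are assumptions)."""

    eq_calls = 0  # class-level counter, present only to measure the cost

    def __init__(self, url, header):
        self.url = url
        self.header = header

    def _attrs(self):
        # Builds a fresh tuple on every call.
        return (self.url, self.header)

    def __eq__(self, other):
        CurlItem.eq_calls += 1
        # Two tuples are built for every single comparison.
        return isinstance(other, CurlItem) and self._attrs() == other._attrs()


def dedupe_with_list(items):
    unique = []
    for item in items:
        if item not in unique:  # linear scan: one __eq__ call per stored item
            unique.append(item)
    return unique


items = [CurlItem(f"http://127.0.0.1:8000/{i}", "X-Test: 1") for i in range(200)]
unique = dedupe_with_list(items)
print(CurlItem.eq_calls)  # n * (n - 1) / 2 comparisons for n distinct items
```

With n distinct items, the membership test performs n * (n - 1) / 2 comparisons in total, which is why the call count explodes as the number of payloads grows.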
time ./bypass_url_parse.py --dump-payloads -u http://127.0.0.1:8000/foo/bar -v > /dev/null
real 0m25.489s
user 0m25.318s
sys 0m0.104s
It is possible (first commit) to refactor the __eq__() function to remove the unnecessary calls to __attrs() by using the built-in __dict__ dictionary, which already contains everything required to compare two objects.
With 26002494 fewer tuples created, the code is already faster:
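As a rough sketch of that idea (the class and its two fields are hypothetical placeholders, not the real CurlItem definition), comparing __dict__ directly avoids building any intermediate tuple:

```python
class CurlItem:
    """Hypothetical stand-in for the real CurlItem (field names are assumptions)."""

    def __init__(self, url, header):
        self.url = url
        self.header = header

    def __eq__(self, other):
        if not isinstance(other, CurlItem):
            return NotImplemented
        # Instance attributes already live in __dict__, so no tuple is needed.
        return self.__dict__ == other.__dict__


a = CurlItem("http://127.0.0.1:8000/foo", "X-Test: 1")
b = CurlItem("http://127.0.0.1:8000/foo", "X-Test: 1")
c = CurlItem("http://127.0.0.1:8000/bar", "X-Test: 1")
print(a == b, a == c)
```

This only works as long as every attribute that matters for equality is a plain instance attribute; classes using __slots__ have no __dict__.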
time ./bypass_url_parse.py --dump-payloads -u http://127.0.0.1:8000/foo/bar -v > /dev/null
real 0m8.741s
user 0m8.662s
sys 0m0.052s
That is still not enough. The best solution is to abandon the list in favor of a set, which is much better suited here. Sets use the __hash__() function to keep a collection of unique values, and only call __eq__() when two hashes match.
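A minimal sketch of the set-based approach follows. It assumes a hypothetical CurlItem with two fields and is not necessarily the __hash__() implementation used in this pull request; it only shows the contract a set relies on.

```python
class CurlItem:
    """Hypothetical stand-in for the real CurlItem (field names are assumptions)."""

    def __init__(self, url, header):
        self.url = url
        self.header = header

    def __eq__(self, other):
        if not isinstance(other, CurlItem):
            return NotImplemented
        return self.__dict__ == other.__dict__

    def __hash__(self):
        # Must agree with __eq__: equal items need equal hashes.
        return hash((self.url, self.header))


curl_items = set()
for path in ("foo", "bar", "foo"):
    # add() hashes the item once instead of scanning every stored item.
    curl_items.add(CurlItem(f"http://127.0.0.1:8000/{path}", "X-Test: 1"))

print(len(curl_items))
```

Items stored in a set should not be mutated afterwards, since changing a hashed attribute would leave the object filed under a stale hash.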
The initial CurlItem __hash__() function looks like:
No, not again __attrs() :-( So we move to:
And the result speaks for itself: