> may be due to the initial processing prior to sending the first curl request
Installing libyaml may speed it up, because PyYAML can use it for faster parsing of the YAML (a possible bottleneck).
Yes, there is a fixed overhead to parsing and validating the test structure, but it's a one-time cost and shouldn't be anywhere near this long unless your tests are extremely large or complex.
I would need to be able to replicate this and do some experimentation to see where the overheads are. Can you provide a sample YAML file, the versions of Python and PyRestTest you're using (from source or the stable release from pip), and what OS you're running?
In general, once the initial parsing & configuration is done, the actual requests should be fairly fast, because PyCurl's native code is considerably faster than requests (link is to a benchmark I did). There's some overhead in configuring and analyzing each request/response pair, of course.
The benchmark methods are fairly optimized: there's a test for performance in there, which has shown that overhead is <0.3 ms per request on a fairly weak system. I'll have to see if that has regressed.
The test methods haven't been as optimized, because they do some additional processing and there hasn't been a need yet. The curl handles could be reused (big performance gain), and there's some room for caching of templates (one smaller bottleneck).
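(For illustration, reusing a single handle across requests looks roughly like this; a minimal sketch of the idea, not PyRestTest's actual code, with a placeholder URL:)

import pycurl
from io import BytesIO

curl = pycurl.Curl()  # create the handle once and reuse it for every request
for path in ('/posts/1', '/posts/2'):
    body = BytesIO()
    curl.setopt(pycurl.URL, 'http://jsonplaceholder.typicode.com' + path)
    curl.setopt(pycurl.WRITEFUNCTION, body.write)
    curl.perform()  # subsequent calls reuse the handle's open connection and DNS cache
    print(curl.getinfo(pycurl.RESPONSE_CODE), len(body.getvalue()))
curl.close()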
@nitrocode ^
Hi @svanoort, here is my setup:
- Python 2.7.11
- PyYAML 3.11
- Windows 7 and CentOS 7
Sample test (sampletest/jsonplaceholder.yml):
---
- config:
    - testset: 'Testing free api'
    - timeout: 60
    - curl_option_SSL_CIPHER_LIST: 'rsa_rc4_128_sha'
- test:
    - group: "Quickstart"
    - name: "Get data"
    - url: '/posts/1'
    - method: 'GET'
    - validators:
        - compare: {jsonpath_mini: 'userId', comparator: 'gt', expected: 0}
command:
python pyresttest/pyresttest/resttest.py http://jsonplaceholder.typicode.com sampletest/jsonplaceholder.yml
This takes about 30 msec
I haven't looked too deeply into this, but recently I've been experimenting with YAML files and PyYAML and noticed that parsing YAML files that use dashes (lists) takes a lot longer than parsing ones without. There also seems to be extra overhead when iterating over YAML structures with many dashes, which is the case with pyresttest. Though, I'm just speculating.
Take, for example, the examples/minimap-test.yaml file: remove all of the dashes, replace the no-value keys with 'true', and call the result minimap-test-no-lists.yaml.
---
- config:
    testset: "Benchmark tests using test app"
- benchmark: # create entity
    name: "Basic get"
    url: "/api/person/"
    warmup_runs: 7
    'benchmark_runs': '101'
    output_file: 'miniapp-benchmark.csv'
    metrics:
        total_time: 'true'
        total_time: mean
        total_time: median
        size_download: 'true'
        speed_download: median
- benchmark: # create entity
    name: "Get single person"
    url: "/api/person/1/"
    metrics: {speed_upload: median, speed_download: median, redirect_time: mean}
    output_format: json
    output_file: 'miniapp-single.json'
Here is the parse time and output for your YAML file:
Took 30.9998989105 msec to parse minimap-test.yaml
[{'config': [{'testset': 'Benchmark tests using test app'}]},
 {'benchmark': [{'name': 'Basic get'},
                {'url': '/api/person/'},
                {'warmup_runs': 7},
                {'benchmark_runs': '101'},
                {'output_file': 'miniapp-benchmark.csv'},
                {'metrics': ['total_time',
                             {'total_time': 'mean'},
                             {'total_time': 'median'},
                             'size_download',
                             {'speed_download': 'median'}]}]},
 {'benchmark': [{'name': 'Get single person'},
                {'url': '/api/person/1/'},
                {'metrics': {'redirect_time': 'mean',
                             'speed_download': 'median',
                             'speed_upload': 'median'}},
                {'output_format': 'json'},
                {'output_file': 'miniapp-single.json'}]}]
Here it is for the same file without the dashes (no lists)
Took 20.9999084473 msec to parse minimap-test-no-lists.yml
[{'config': {'testset': 'Benchmark tests using test app'}},
 {'benchmark': {'benchmark_runs': '101',
                'metrics': {'size_download': 'true',
                            'speed_download': 'median',
                            'total_time': 'median'},
                'name': 'Basic get',
                'output_file': 'miniapp-benchmark.csv',
                'url': '/api/person/',
                'warmup_runs': 7}},
 {'benchmark': {'metrics': {'redirect_time': 'mean',
                            'speed_download': 'median',
                            'speed_upload': 'median'},
                'name': 'Get single person',
                'output_file': 'miniapp-single.json',
                'output_format': 'json',
                'url': '/api/person/1/'}}]
So there is roughly a one-third decrease in processing time (about 10 msec) just by converting the format, and looking at the differences in the parsed output, the structure becomes much simpler once the lists of lists are converted into plain key-value pairs.
To get the benchmark's total_time metric:
Using minimap-test.yml:
data[1]['benchmark'][5]['metrics'][2]
Using minimap-test-no-lists.yml:
data[1]['benchmark']['metrics']['total_time']
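(For reference, the parse timings above were presumably produced with a small script along these lines; a rough sketch, with the file name as a placeholder:)

import time
import yaml
from pprint import pprint

start = time.time()
with open('minimap-test.yaml') as f:
    data = yaml.safe_load(f)
print('Took %s msec to parse minimap-test.yaml' % ((time.time() - start) * 1000))
pprint(data)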
@nitrocode 10-20 milliseconds of parsing isn't enough to have a big impact overall. Is performance similar on Windows and Linux for your call where pyresttest is very slow?
@nitrocode I've been using a test script that does profiling of PyRestTest here: https://github.com/svanoort/pyresttest/blob/profiling-performance/profile-pyresttest.sh
This should be easy to adapt to see what the cumulative and overall time is:
python -m cProfile -s cumtime pyresttest/resttest.py http://myurl mytest.yaml > mytest-cum.txt
python -m cProfile -s tottime pyresttest/resttest.py http://myurl mytest.yaml > mytest-tot.txt
This will show where the bottlenecks are.
From the json API test script, it looks like the majority of the runtime is simply the HTTP request/response ("curl.perform()"), and the other overhead is only ~100 ms:
Test Group Quickstart SUCCEEDED: : 1/1 Tests Passed!
50354 function calls (48860 primitive calls) in 0.753 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.646 0.646 0.646 0.646 {method 'perform' of 'pycurl.Curl' objects}
1 0.010 0.010 0.753 0.753 resttest.py:2(<module>)
232 0.009 0.000 0.023 0.000 sre_compile.py:207(_optimize_charset)
2 0.009 0.004 0.009 0.004 sre_compile.py:301(_optimize_unicode)
250/38 0.007 0.000 0.019 0.000 sre_parse.py:379(_parse)
10 0.005 0.000 0.006 0.001 collections.py:288(namedtuple)
477/36 0.005 0.000 0.024 0.001 sre_compile.py:32(_compile)
3034 0.004 0.000 0.005 0.000 sre_parse.py:182(__next)
1 0.003 0.003 0.005 0.005 tests.py:1(<module>)
41 0.002 0.000 0.002 0.000 sre_compile.py:258(_mk_bitmap)
107 0.002 0.000 0.002 0.000 {range}
656/217 0.002 0.000 0.002 0.000 sre_parse.py:140(getwidth)
1 0.002 0.002 0.011 0.011 utils.py:5(<module>)
2047 0.002 0.000 0.002 0.000 sre_parse.py:130(__getitem__)
1 0.002 0.002 0.002 0.002 socket.py:45(<module>)
2533 0.001 0.000 0.005 0.000 sre_parse.py:201(get)
10088/9843 0.001 0.000 0.002 0.000 {len}
When I use the content-test.yaml (much more complex use case with generators and templates against the embedded Django test app), this changes a bit and overheads go up, but it's still only a few hundred ms. To be expected since it's a more complex scenario using many of the dynamic features, which come with overheads.
So what I'm wondering is what your test is actually doing in those 4-5 seconds -- which the profiler output above would show, if you would be kind enough to attach it?
@nitrocode Actually... here's a better way, it'll give the full dump and I can analyze that in depth to see what's slow for you (since I can't reproduce this kind of slowness on any of my calls):
python -m cProfile -o my_api_dump.out pyresttest/resttest.py http://mylogin.api.com/api my_login_test.yaml
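(For reference, that dump can then be read with the standard-library pstats module; a minimal example, reusing the dump file name from the command above:)

import pstats

stats = pstats.Stats('my_api_dump.out')
stats.sort_stats('cumulative').print_stats(25)  # top entries by cumulative time
stats.sort_stats('tottime').print_stats(25)     # top entries by internal time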
Rough stats on overheads with content-test.yaml (skewed because there's ZERO network overhead and requests are nearly instant):
v1.6.0 code:
- 60% curl.perform (200 msec)
- 6.6% header parsing (22 msec, includes a lot of benchmark runs)
- 12.8% test file load/parse (43 msec, mostly YAML parsing)
- ~15% (most of the remainder) module loading - call it ~50 msec
The only parts of this that should grow with more network traffic/requests are the curl.perform bit (all native code, same as curl) and header parsing.
The breakdown is similar on Python 2.7 in a minimal Ubuntu Docker container, and with the latest master branch running locally.
Looking at instructions to install pyyaml with libyaml: http://pyyaml.org/wiki/PyYAMLDocumentation
Personal notes from analyzing Python3 performance:
It appears that Python 3 performance is overall similar to Python 2, although it's distributed a bit differently - module loading seems to be a larger share (up to several hundred ms). That might be due to running within a Docker container for that portion, though.
Overall, for Python 2 & 3, it looks like rough expected benchmarks (at least on this system with a fast SSD, so disk I/O is fast) are:
The YAML parsing can be sped up quite a bit by enabling the libyaml bindings. According to StackOverflow, where pure-Python YAML parsing takes 405 msec, the libyaml version takes 59.2 msec.
It looks like that would require some trivial PyRestTest changes (to conditionally import and use the yaml CLoader rather than the pure-Python loader)... however, it's not worth dealing with unless parsing becomes a bottleneck.
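(Roughly, the conditional import would follow the standard PyYAML pattern below; a sketch, not actual PyRestTest code, with a placeholder file name:)

import yaml
try:
    from yaml import CLoader as Loader  # libyaml bindings, much faster
except ImportError:
    from yaml import Loader             # pure-Python fallback

with open('sampletest.yaml') as f:
    test_structure = yaml.load(f, Loader=Loader)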
Sample benchmark + analysis command:
python -m cProfile -o all-miniapp-nobench.profile pyresttest/resttest.py http://localhost:8000 examples/miniapp-run-all.yaml
snakeviz all-miniapp-nobench.profile
Hi @svanoort. Thank you for introducing me to profiling as I wasn't aware of this feature.
I used the same REST API as in my last comment, but this time I profiled it: curl took 40 seconds, which was 99% of the total runtime. Unbelievable. I updated pycurl to the latest version and it's still incredibly slow on Windows. I still need to test this on Linux again.
@nitrocode Are you saying the curl.perform call is 99% of the runtime?
That's definitely a Windows issue. Python performance on Windows is sometimes pretty bad, and there are probably other issues here with a native library (curl) that wasn't originally intended for Windows use.
@nitrocode By the way, this post on Reddit suggests that in some cases you may actually see better performance running Python code in a Linux VM than directly in the host OS. Given the sharp performance disparity on Windows... I'd suggest giving that a shot here.
@nitrocode I think I've got a solution for you (a result of putting together a better answer to a StackOverflow question on PyCurl vs Requests performance). The root cause is an issue with DNS performance in curl interacting with the OS (or at least it is on Linux).
The code fix here should greatly increase performance, because it also allows use of pycurl's built-in connection pool for tests (in addition to the DNS cache): https://github.com/svanoort/pyresttest/pull/160
Benchmarks here suggest that pyresttest may see almost an order-of-magnitude improvement in executing tests via PyCurl.
With the patch, the networking portions of the tests may now be up to 2x to 4x faster than unittest/requests.
@nitrocode I've gone ahead and closed this issue, since I've root-caused this and identified the culprit (a bug in how the underlying native library interacts with the OS), along with a solid workaround.
Please let me know if this does not resolve it, and I'll re-open it. Might even be able to test on Windows separately from other OSes now.
(To be clear, this is only an order-of-magnitude improvement when running long series of tests against the same server; the improvement will probably be more modest in other cases.)
Yeah, I don't know what the issue is. I'm still seeing incredibly long curl times. After this experience, I'm no longer a fan of pycurl.
Thank you for all your efforts on this. I wish I could make it work for me. Hopefully I'm just a corner case.
@nitrocode sigh. I've got nothing then; if it misbehaves on Linux, there are a couple of new avenues to pursue that I discovered (some additional curl settings).
I continue to be fond of PyCurl's options and performance, but at some point in the future I'm going to have to offer an alternative that falls back to requests for simplicity & reliability (at the cost of the features that depend on pycurl).
We're evaluating using requests instead of pyresttest because we've had issues with speed.
My login.yaml test takes about 24 seconds to run using pyresttest, while a more extended test using the requests module, no YAML, and unittest2 takes about 6 seconds. Has anyone else noticed this too?
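(As a rough illustration only, not the actual login test: an equivalent check with requests and unittest, using the jsonplaceholder example from earlier in this thread, might look like this:)

import unittest
import requests

class PlaceholderApiTest(unittest.TestCase):
    def test_get_post(self):
        # Same check as the pyresttest sample: GET /posts/1 and expect userId > 0
        resp = requests.get('http://jsonplaceholder.typicode.com/posts/1', timeout=60)
        self.assertEqual(resp.status_code, 200)
        self.assertGreater(resp.json()['userId'], 0)

if __name__ == '__main__':
    unittest.main()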
I'm speculating, but the slowdown in pyresttest may be due to the initial processing prior to sending the first curl request. After stepping through the program with a breakpoint on
curl.perform()
on line 350, it took about 4 to 5 seconds before the curl statement was called. Is there a way to speed this up?