opencog / benchmark

Benchmarking the AtomSpace, the pattern matcher and other OpenCog systems

Fix Python benchmarks #9

Open vsbogd opened 6 years ago

vsbogd commented 6 years ago

Python benchmarks are broken. At the moment, all of the benchmarks are guarded by the #if HAVE_CYTHONX preprocessor condition: https://github.com/opencog/benchmark/blob/997490970adb88250f01f316ef98bf4a92cc1df5/atomspace/atomspace/AtomSpaceBenchmark.cc#L467 But the code cannot be compiled when HAVE_CYTHONX is defined.

linas commented 6 years ago

Note that HAVE_CYTHON is automatically defined by CMake, so CYTHONX was a hack to avoid the brokenness.

vsbogd commented 6 years ago

I have fixed the Python benchmarks: https://github.com/opencog/benchmark/compare/master...vsbogd:fix-python-benchmarks?w=1 I used the guile implementation as a reference.

But they are extremely slow, even compared with guile. I am going to check what the reason is, and after that decide whether it makes sense to improve this code or whether some other approach is required for testing the Python bindings.

linas commented 6 years ago

Last time the python benchmarks ran, they were as fast or faster than guile.

vsbogd commented 6 years ago

I have tested it on my workstation and it looks really fast, even faster than guile. Probably the reason for the confusion is an incorrect Python setup on my laptop. Raised PR https://github.com/opencog/benchmark/pull/12

linas commented 6 years ago

?? The guile numbers I measure are 2x or 3x faster than the python numbers you just reported in the other pull request. And I'm pretty sure your CPU is faster than mine.

linas commented 6 years ago

The guile and the python numbers will differ in the following ways:

1) guile has an extremely heavy-weight enter/leave cost; for python it is very small. So if you measure single statements, you are measuring not the atomspace, etc., but the cost of entering/leaving guile or python. You can see this by measuring the performance of a no-op (something like 2+2); see the timing sketch below this list.

2) guile is much faster than python, once you are already inside of it. You can measure this by looping over things, in scheme or in python. You can also see it by visual inspection: if you read the auto-generated code that cython emits, there's a lot of cruft and setup in the cython wrappers. By contrast, guile turns everything into bytecode, and the calls are almost immediate.

The upshot is that guile runs at about half the speed of the native C++ code, while python runs about eight times slower than C++.
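As a concrete version of the no-op test from point 1, here's a plain-Python sketch (no opencog involved; the iteration counts are arbitrary) that separates the per-statement enter/leave cost from the cost of work done inside the evaluator:

```python
import timeit

# Per-statement: every call to eval() pays the full enter/leave
# overhead of the evaluator, so this mostly measures that overhead.
per_statement = timeit.timeit(lambda: eval("2 + 2"), number=100_000)

# Single entry: the evaluator is entered once and the loop runs
# inside it, so the overhead is paid only once.
single_entry = timeit.timeit(
    lambda: eval("[2 + 2 for _ in range(100_000)]"), number=1)

print("per-statement: %.3fs, single entry: %.3fs"
      % (per_statement, single_entry))
```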

linas commented 6 years ago

Hmmm.. except my first statement is not true. According to my diary notes, python is 2x slower to enter/leave than guile. In the 15 March 2015 entry, I was getting an enter/leave rate of 18K/sec for guile, and 8K/sec for python.

linas commented 6 years ago

Also: the 15 March 2015 entry was reporting 48K/sec addNode for python-interpreted. (I was unable to make python-memoized work at that time. It's possible that python does not memoize, I'm not sure.) The same entry was showing 120K/sec for guile. So this was showing a better-than-2x advantage for guile over python.

Flip-side: some of the numbers being reported seemed crazy; I'm not totally convinced that the benchmark is measuring things correctly.

Also: python is single-threaded (they don't want to use locks, because that hurts their performance), but guile is fully threaded, and I suspect some stuff is running in other guile threads, and that the benchmark does not wait for those threads to finish before reporting a time. Maybe. I'm not at all sure that guile is using other threads; I just don't understand why the reported performance numbers are kind of crazy-ish.

linas commented 6 years ago

Sorry, I guess I am giving conflicting remarks. Earlier, I posted "Last time the python benchmarks ran, they were as fast or faster than guile." Now I'm saying "python is slower".

There are multiple issues involved.

vsbogd commented 6 years ago

When I tried to analyze the reason for the python slowness on my laptop, I looked at the PythonEval code and concluded that it may be slow because PythonEval calls the interpreter for each statement separately. When a loop executes inside the Python interpreter, it should be faster. That could be why your homebrew python benchmarks are faster than the atomspace_bm one.

That raises a question: should PythonEval execution be included in the benchmark or not?

I thought that PythonEval was included in atomspace_bm intentionally, as PythonEval is used to execute GroundedPredicateNodes, and its performance affects GPN performance. But after reading the comments above, I think it may make sense to measure the Python bindings performance and the PythonEval (GPN) performance separately: use pure Python benchmarks to measure the former, and the C++ atomspace_bm to measure the latter.
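For reference, this is the "in real life" path where PythonEval matters: evaluating a GroundedPredicateNode makes the C++ side call back into the Python interpreter for every evaluation. A minimal sketch, assuming the initialize_opencog-style Python bindings of that era (module layout, the "py:" name resolution, and the TruthValue constructor semantics vary between versions):

```python
from opencog.atomspace import AtomSpace, TruthValue
from opencog.type_constructors import *
from opencog.utilities import initialize_opencog

atomspace = AtomSpace()
initialize_opencog(atomspace)

# Callback invoked from C++ through PythonEval each time the
# predicate is evaluated -- this per-call cost is what atomspace_bm
# would be measuring.
def is_interesting(atom):
    return TruthValue(1, 1)

# Evaluating this link (e.g. from the pattern matcher) triggers
# the callback above.
EvaluationLink(
    GroundedPredicateNode("py: is_interesting"),
    ListLink(ConceptNode("foo")))
```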

I have played around with the Guile and Python benchmarks, and they show different performance (relative to each other) with different parameters. I think I should spend more time on this before drawing conclusions.

linas commented 6 years ago

> I think I should spend more time on this before drawing conclusions.

Yes.

> measure the Python bindings performance and the PythonEval (GPN) performance separately: use pure Python benchmarks to measure the former, and the C++ atomspace_bm to measure the latter.

Yes. However, and this is important: essentially no one will ever run "pure python" -- that's because we do not have any "pure python" code, at all. So, although that would measure the speed of the bindings, and could be used as a tuning guide, it does not measure anything used "in real life".

There are currently just two usage scenarios:

I doubt that the second usage is done very much, and so its performance is not very interesting.

There's a third quasi-usage that isn't actually a usage: connecting up ROS (which is pure python) to opencog. This is done by writing a python snippet to open a socket to the cogserver, and then sending scheme strings on that socket. (One could also send python strings on that socket, but it's never done.)
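That socket dance looks roughly like the sketch below; the host and port are assumptions (17001 is the cogserver's usual telnet port), and "scm" switches its shell into scheme mode:

```python
import socket

# Assumed host/port for a locally running cogserver.
sock = socket.create_connection(("localhost", 17001))
sock.sendall(b"scm\n")  # enter the cogserver's scheme shell
sock.sendall(b'(ConceptNode "seen-by-ros")\n')
print(sock.recv(4096).decode())
sock.close()
```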

linas commented 6 years ago

There is something that we do not do, but maybe we should: create a ROS node, pure-python, start up opencog inside of it (start the cogserver inside of it, too), and then use the "pure-python" API to opencog to stuff atoms into the atomspace. This could be a lot faster, and a lot more efficient, than sending ASCII strings over sockets. You may want to consult with @jdddog and @AmeBel about the details of implementing this.
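A rough sketch of what such a node might look like, assuming rospy and the opencog Python bindings are importable in the same process; the topic name and message type are purely illustrative:

```python
import rospy
from std_msgs.msg import String
from opencog.atomspace import AtomSpace
from opencog.type_constructors import ConceptNode
from opencog.utilities import initialize_opencog

atomspace = AtomSpace()
initialize_opencog(atomspace)

def on_label(msg):
    # Poke the atom straight into the atomspace: no socket,
    # no ASCII round-trip through the cogserver shell.
    ConceptNode(msg.data)

rospy.init_node("atomspace_bridge")
rospy.Subscriber("/perception/labels", String, on_label)
rospy.spin()
```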

There are several things that would need to be fixed along the way.

linas commented 6 years ago

https://github.com/opencog/ghost_bridge/issues/5 describes the above ROS-bridge idea. However, as I wrote it up, I realized that it's not actually all that interesting, since python-ROS never needs to directly poke atoms into the atomspace, and so there's not much point to it.

vsbogd commented 6 years ago

Yes, as far as I can see, GHOST is written in Scheme. To make the ROS/GHOST bridge as thin as possible, one should either write the ROS node in Scheme (but ROS doesn't have a Scheme API), write GHOST in Python, or use C++ for both.