Open JeremyBowen opened 13 years ago
hi, i assume you're related to @miksovic?

anyway, thank you very much for your effort, no further tests are necessary. it's clear that the c++ implementation is considerably slower... and you say it doesn't spend much time in CPU, so it seems to be waiting on locks or IO. my first thought was that maybe my c++ skills were rusty and i used lots of copy-construction or something in that spirit, but if it's not CPU-bound, that's less likely to be the cause. it's more likely that i'm either doing the IO wrong, or that my `Mutex` class is very slow.

some pointers:

- a good test would be running under `strace`, to see where it's waiting. i'd appreciate it if you could do that for me.
- as for attaching files -- it would be easiest to just create a repo on github and upload your files there (that way it's revision-controlled and easy to share), or you could use http://gist.github.com (or pastebin, etc.)
- last but not least -- i'm busy over my head at the moment, so it might take me several days to get to it. any progress you guys make would be greatly appreciated (you don't have to solve the problem -- just locate it). the best way is to fork the project, commit your changes/tests, and then send me a pull request.

thanks, -tomer
Hi, yes -- @miksovic is primarily working on Windows and I'm working on Linux.
Thanks for your response. I'm not sure this is entirely C++ related. Maybe it's more a Linux thing. I've run strace and ltrace over both my client and server programs and don't have a huge amount of new information.
I'll post the results of my tests as separate comments:
First up is testing my Linux clients (Python and C++) against a Linux C++ server:
```
$ ltrace -c -S python ./client.py
% time     seconds  usecs/call     calls      function
 73.34   62.395172       87388       714      SYS_select
 26.60   22.630919    22630919         1      Py_Main
  0.02    0.014398          15       916      SYS_open
  0.01    0.007676          18       408      SYS_sendto
...

real    0m8.250s
user    0m0.172s
sys     0m0.072s
```
Again with strace:

```
$ time strace -c python ./client.py
% time     seconds  usecs/call     calls    errors syscall
 78.64    0.000081           0       408           sendto
 21.36    0.000022           0       295       248 stat
  0.00    0.000000           0       221           read
  0.00    0.000000           0        50           write
...

real    0m8.253s
user    0m0.144s
sys     0m0.084s
```
```
$ ltrace -c -S ./client
% time     seconds  usecs/call     calls      function
 27.29   12.051400         107    112473      mcount
 21.32    9.411614       15253       617      _ZNSi4readEPcl
 10.50    4.634701        7511       617      _ZNSt15basic_streambufIcSt11char_traitsIcEE6xsgetnEPcl
 10.42    4.598853       15695       293      _ZNSt15basic_streambufIcSt11char_traitsIcEE5uflowEv
  7.53    3.323090        8144       408      _ZNSo5writeEPKcl
  5.43    2.397605       23505       102      _ZNSo5flushEv
  4.70    2.075017         919      2257      _ZNKSt15basic_streambufIcSt11char_traitsIcEE5pbaseEv
  4.67    2.061238        1260      1635      _ZNKSt9basic_iosIcSt11char_traitsIcEE3eofEv
  3.29    1.453525        4960       293      recvmsg
  3.22    1.421815        4803       296      SYS_recvmsg
...

real    0m11.073s
user    0m1.756s
sys     0m5.576s
```

Now strace on the C++ client:
```
$ time strace -c ./client
% time     seconds  usecs/call     calls    errors syscall
100.00    0.004000          19       209           recvmsg
  0.00    0.000000           0        11           read
  0.00    0.000000           0        51           write
  0.00    0.000000           0        12           open
  0.00    0.000000           0        16           close
...

real    0m8.101s
user    0m0.024s
sys     0m0.024s
```
Now the Linux clients against the Windows C# server process on a networked machine.
```
$ time ltrace -c -S python ./client.py
% time     seconds  usecs/call     calls      function
 72.59   60.406528       84602       714      SYS_select
 27.34   22.755401    22755401         1      Py_Main
  0.02    0.013669          14       916      SYS_open
  0.01    0.009206          22       408      SYS_sendto
...

real    0m20.849s
user    0m0.164s
sys     0m0.068s
```
```
$ time strace -c python ./client.py
% time     seconds  usecs/call     calls    errors syscall
 59.46    0.000022           0       916       755 open
 40.54    0.000015           0        61           munmap
  0.00    0.000000           0       221           read
  0.00    0.000000           0        50           write
...

real    0m20.630s
user    0m0.128s
sys     0m0.092s
```
```
$ time ltrace -c -S ./client
% time     seconds  usecs/call     calls      function
 19.65   45.960395       74490       617      _ZNSi4readEPcl
 18.48   43.230169       70065       617      _ZNSt15basic_streambufIcSt11char_traitsIcEE6xsgetnEPcl
 18.47   43.195086      423481       102      _ZNSt15basic_streambufIcSt11char_traitsIcEE5uflowEv
 18.32   42.847775      420076       102      recvmsg
 18.31   42.833417      407937       105      SYS_recvmsg
  4.06    9.488322          90    104441      mcount
  2.24    5.228938       12816       408      _ZNSo5writeEPKcl
  0.16    0.377865        3704       102      _ZNSo5flushEv
...

real    0m21.221s
user    0m1.652s
sys     0m4.892s
```
```
$ time strace -c ./client
% time     seconds  usecs/call     calls    errors syscall
  -nan    0.000000           0        11           read
  -nan    0.000000           0        51           write
  -nan    0.000000           0        12           open
  -nan    0.000000           0        16           close
  -nan    0.000000           0        13           fstat
  -nan    0.000000           0        35           mmap
  -nan    0.000000           0        24           mprotect
...

real    0m20.486s
user    0m0.016s
sys     0m0.028s
```
Running C# -> C# on the Windows box is extremely fast by comparison, although I don't have timings (too fast). A C# client on Windows talking to the C++ server on Linux is again slow (~16 seconds for the same test).
I'm beginning to wonder if my Linux box has some sort of problem. It's obviously not restricted to C++, as the C++, C#, and Python clients all behave the same way, as far as I can tell, regardless of which language the server is implemented in.
I appreciate that you're busy and I don't expect you to solve this for me. I'll keep trying to identify the cause of the poor performance and I'll do anything I can to try and locate the problem.
Cheers
thanks for the effort!
-tomer
C++ to C++, linux box, modified the C++ test to include a small performance test:

```cpp
// test performance
boost::posix_time::ptime t0(boost::posix_time::microsec_clock::local_time());
int tmp = 0;
const int cycles = 100;
for (int i = 0; i < cycles; i++) {
    double res = adam->think(188, i + 1);
    if (res > 2) {
        tmp = 1; // to make sure this loop is not optimized out
    }
}
boost::posix_time::ptime t1(boost::posix_time::microsec_clock::local_time());
double interval = (t1 - t0).total_microseconds() / 1000000.0;
cout << "performing " << cycles << " cycles took " << interval << " seconds" << endl;
```
results in: `performing 100 cycles took 8.00376 seconds`
... bummer
okay, strace seems to reveal some interesting facts: first of all, boost uses `sendmsg` and `recvmsg` instead of `send` and `recv`, but i don't suppose there's a performance difference between them. on the other hand, the last two `recvmsg`-es are extremely slow (the time consumed in each system call is printed in angle brackets at the end of the line):
```
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\7", 4}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 4 <0.000023>
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\35", 4}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 4 <0.000010>
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\0", 4}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 4 <0.000008>
sendmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\1\0\r\2744\0\0\0\0\10\211.0@g\200\0\0\0\0\0@\0\0\0\0\0\0\0", 29}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 29 <0.000007>
recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\7\0\0\0\0\0@g\200\0\0\0\0\0\0\0\0\tIDL_MAGIC\0\0"..., 504}], msg_controllen=0, msg_flags=0}, 0) = 4 <0.039792>
recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\t\0\0\0\0\0@W\200\0\0\0\0\0\0\0\0\tIDL_MAGIC\0\0"..., 504}], msg_controllen=0, msg_flags=0}, 0) = 17 <0.039703>
recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\7\0\0\0\0\1\0\r\2744\0\0\0\0\10\211.0@g\200\0\0\0\0\0?\360\0"..., 504}], msg_controllen=0, msg_flags=0}, 0) = 4 <0.039833>
recvmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\35\0\0\0\0\1\0\r\2744\0\0\0\0\10\211.0@g\200\0\0\0\0\0@\0\0"..., 504}], msg_controllen=0, msg_flags=0}, 0) = 37 <0.039705>
sendmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\7", 4}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 4 <0.000024>
sendmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\t", 4}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 4 <0.000011>
sendmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0\0\0\0", 4}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 4 <0.000009>
sendmsg(8, {msg_name(0)=NULL, msg_iov(1)=[{"\0@W\200\0\0\0\0\0", 9}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 9 <0.000008>
```
both the server and the client seem to be waiting a long time (blocking on `recvmsg`) for incoming data... 0.04 seconds. this happens twice for each invocation, so every invocation takes roughly 0.08 sec, which adds up to 8 seconds per 100 calls.
I don't know if you've been working on this lately, but how difficult would it be to replace the Boost socket handling with a simple socket library implementation?
I've been experimenting with some simple socket handling code, and the strace info reveals no problems in sending/receiving times (~8 seconds per 40,000 calls, compared with 8 seconds per 100 calls).
At first glance, however, linking this into the rest of the C++ framework in libagnos would require quite a bit of work, as I'm not really familiar with the boundaries of the Boost code and which of its features you are making use of.
yes, i think the problem is boost's `tcp::iostream`, so i started a branch using `tcp::socket` instead (to directly do `send()` and `recv()`). but then i wouldn't be able to use compression (boost's zlib `filter`), so i'll have to resort to writing my own version of `tcp::iostream`. it's on my todo list.
Any updates on this issue?
no, i'm not actively maintaining agnos any more...
I've got a potential problem with the speed of calls between a client and a server process running C++ code under Linux.
The issue is that I get very low throughput in terms of transactions per second. With C# on Windows we can get tens of thousands of calls per second, whether the client and server run on the same PC (localhost) or across a network.
Under Linux, running the C++ code, I can't get anywhere near that number of transactions.
I have a simple loop that performs two RPC calls 100 times, timed with the Linux `time` command. This takes approximately 16 seconds when running the client and server on the same PC, and roughly the same across a network.
- Running the Linux C++ client against the Windows C# server results in approximately the same slow behaviour (~16 s per 200 calls).
- Running the Windows C# client against the Linux C++ server results in approximately the same slow behaviour.
- Running a Linux Python client against a Linux C++ server results in the same slow behaviour (~16 s per 200 calls).
It looks like the Linux C++ code may be waiting on a timeout, or perhaps something is blocking while setting up a connection. I'm guessing here, but the CPU barely notices, so it doesn't appear to be a lack of raw CPU cycles.
We've been unable to build a C++ Windows version to compare C++ against C#, nor have I been able to figure out the relevant Python syntax to create a Python server under Linux.
If I can figure out how to attach some sample code to this issue I will do so, otherwise I'll post it here shortly.