roc230 / spymemcached

Automatically exported from code.google.com/p/spymemcached

slow but steady stream of read timeouts #124

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What version of the product are you using? On what operating system?

memcached 1.2.5
spymemcached 2.4.2

Running under Jetty Servlets.

Tell me more...

See
http://groups.google.com/group/spymemcached/browse_thread/thread/dd16041e601288c7

Original issue reported on code.google.com by ingen...@gmail.com on 9 Mar 2010 at 8:15

GoogleCodeExporter commented 9 years ago
Nothing conclusive yet, but I ran with some specific flags, increased the number of iterations, and changed the retrieve line to print a single "." so I can spot the retrievals among the verbose output.
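
For reference, the test program from the mailing-list thread isn't attached here, but a minimal sketch of the kind of retrieve loop being described might look like the following (the key name, the stored value, and the 10 ms budget are assumptions, not the original test's settings):

  import java.net.InetSocketAddress;
  import java.util.concurrent.Future;
  import java.util.concurrent.TimeUnit;
  import java.util.concurrent.TimeoutException;

  import net.spy.memcached.MemcachedClient;

  public class TimeoutProbe {
      // Assumed per-operation budget; the original test's value isn't shown here.
      private static final long OP_TIMEOUT_MS = 10;

      public static void main(String[] args) throws Exception {
          MemcachedClient client =
                  new MemcachedClient(new InetSocketAddress("localhost", 11211));
          client.set("probe", 0, "value").get();   // seed a single key

          int timeouts = 0;
          for (int i = 0; i < 1000000; i++) {
              Future<Object> f = client.asyncGet("probe");
              try {
                  f.get(OP_TIMEOUT_MS, TimeUnit.MILLISECONDS);
                  System.out.print(".");           // one dot per successful retrieve
              } catch (TimeoutException e) {
                  f.cancel(false);                 // give up on this operation
                  timeouts++;
              }
          }
          System.out.println("\nTimeout count is " + timeouts);
          client.shutdown();
      }
  }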

Findings:
- A number of the early timeouts correlate closely with classloader activity.
- Later timeouts tend to come in groups, which leads me to believe some external factor is involved.
- As reported, the timeouts don't correlate well with GC activity.
- As the run continues, the timeouts settle toward a steady state, at a lower rate than during the ramp-up of the run.

It's looking a lot like this is more about external factors that keep the application from being scheduled.

I may be able to get some more data with some lower level tools.

Original comment by ingen...@gmail.com on 9 Mar 2010 at 9:23

GoogleCodeExporter commented 9 years ago
Well, I should be sleeping, but I think I've solved this one.

A quick-ish DTrace script that counts the number of times a spymemcached thread is kicked off the CPU for more than 4.5 ms shows a really strong correlation with the count from within your program.

Across 1,000,000 iterations
From your program:
  Timeout count is 218
From DTrace:
  Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11211}  197

The latter line is the thread name followed by the count.

Interestingly, the threads either do their thing right away or they get starved, as you can see in the histogram of time off CPU below. By the way, I counted anything over 4.5 ms here, which is why you see so many entries just under 5.

  Memcached IO over {MemcachedConnection to localhost/127.0.0.1:11211}
           value  ------------- Distribution ------------- count
             < 5 |@@@@@@@@@@@@@@@@@                        85
          5 - 99 |                                         0      (every bucket from 5 through 99 was 0)
          >= 100 |@@@@@@@@@@@@@@@@@@@@@@@                  112

The solutions to this are:
- Rewrite spymemcached to use Java real-time; make everyone buy it, or run Solaris to get it for free.
- Tune the scheduler time quanta for your OS so that those threads are starved less; I know a bit about how to do this on Solaris/OpenSolaris, but many other platforms may require lots of surgery.
- Buy more CPUs.
- Turn up your timeout value (see the sketch below).

I recommend going for one of the last two options.  :)
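
For the last option, "turn up your timeout value" means the per-operation timeout the client waits on before reporting a timeout. A rough sketch of two ways to raise it; ConnectionFactoryBuilder ships with later spymemcached releases, so whether it is available in the 2.4.2 build reported above is an assumption, and the 5-second values are purely illustrative:

  import java.net.InetSocketAddress;
  import java.util.Collections;
  import java.util.concurrent.Future;
  import java.util.concurrent.TimeUnit;

  import net.spy.memcached.ConnectionFactoryBuilder;
  import net.spy.memcached.MemcachedClient;

  public class LongerTimeout {
      public static void main(String[] args) throws Exception {
          // Raise the default operation timeout used by the blocking convenience
          // calls such as client.get(key).
          MemcachedClient client = new MemcachedClient(
                  new ConnectionFactoryBuilder()
                          .setOpTimeout(5000)              // milliseconds
                          .build(),
                  Collections.singletonList(new InetSocketAddress("localhost", 11211)));

          // Or keep the default and simply wait longer on an individual operation.
          Future<Object> f = client.asyncGet("some-key");
          Object value = f.get(5, TimeUnit.SECONDS);

          System.out.println(value);
          client.shutdown();
      }
  }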

Original comment by ingen...@gmail.com on 9 Mar 2010 at 11:03

GoogleCodeExporter commented 9 years ago
Well, after sleeping on this, one thing that could be done is limiting the number of outstanding operations. I would have to look into the details here to see what the best way to do that is. Still, if a thread is kept off the CPU for longer than the timeout interval, the operation will time out.
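
One application-side way to bound the number of outstanding operations is a counting semaphore around each request. This is only an illustration of the idea using standard java.util.concurrent, not an existing spymemcached feature, and the cap and per-operation budget are assumptions:

  import java.util.concurrent.Future;
  import java.util.concurrent.Semaphore;
  import java.util.concurrent.TimeUnit;
  import java.util.concurrent.TimeoutException;

  import net.spy.memcached.MemcachedClient;

  public class BoundedGets {
      private final Semaphore inFlight = new Semaphore(64);   // assumed cap on outstanding ops
      private final MemcachedClient client;

      public BoundedGets(MemcachedClient client) {
          this.client = client;
      }

      public Object get(String key) throws Exception {
          inFlight.acquire();                  // blocks new requests once the cap is reached
          try {
              Future<Object> f = client.asyncGet(key);
              try {
                  return f.get(10, TimeUnit.MILLISECONDS);   // assumed per-op budget
              } catch (TimeoutException e) {
                  f.cancel(false);
                  return null;                 // caller decides how to treat a timeout
              }
          } finally {
              inFlight.release();
          }
      }
  }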

Original comment by ingen...@gmail.com on 9 Mar 2010 at 9:31

GoogleCodeExporter commented 9 years ago
Is there a way to set the priority of the spymemcached Java thread higher, so we are less likely to see starvation?

Original comment by S.Ille...@gmail.com on 15 Sep 2010 at 8:01

GoogleCodeExporter commented 9 years ago
Probably, though that's relatively JVM/OS dependent.  On Solaris, for instance, 
there are com.sun.* APIs to change the priorities of threads in that JVM.  
Other JVM builds may have similar capabilities, and Sun's JVM for Linux may 
support this.
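
A portable, best-effort version using only standard java.lang.Thread APIs might look like the sketch below. Matching the thread by the "Memcached IO over ..." name seen in the DTrace output above is an assumption about how the client names its IO thread, and setPriority() is only a hint to the OS scheduler, so it may have no visible effect on some JVM/OS combinations:

  import java.util.Map;

  public class BoostIoThread {
      // Bump the priority of any thread whose name looks like the client's IO thread.
      public static void boost() {
          Map<Thread, StackTraceElement[]> threads = Thread.getAllStackTraces();
          for (Thread t : threads.keySet()) {
              if (t.getName().startsWith("Memcached IO over")) {
                  t.setPriority(Thread.MAX_PRIORITY);
              }
          }
      }
  }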

The other thing that may be possible is setting up affinity for these threads 
to CPUs which are not processing other tasks on the system.  Specifically, move 
interrupts to a set of cores, and bind the JVM to dedicated CPU cores for this 
processing.  These are all things that are "normally" done when trying to 
ensure low latency for scheduling of tasks.  

Note that none of it is a guarantee though.  You may be able to limit the rate 
of your requests to guarantee the rest are more likely to get through.  This is 
pretty similar to getting more CPUs or dedicating resources.

Which JVM/OS are you seeing issues on?

Original comment by ingen...@gmail.com on 15 Sep 2010 at 8:51

GoogleCodeExporter commented 9 years ago
Closing this one, as a number of the changes from issue 136 help here, and I think the rest of the possible issues can be addressed with documentation about what a timeout is and is not.

Basic summary: set a timeout if you would reasonably do something different after the timeout interval. If what you're trying to do is ensure things happen with as little latency as possible, setting a lower timeout won't help.
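
In code, "do something different" usually means catching the timeout and taking a fallback path rather than retrying with a smaller budget. A short sketch; the fallback method and the 50 ms budget are purely illustrative, not part of spymemcached:

  import java.util.concurrent.TimeUnit;
  import java.util.concurrent.TimeoutException;

  import net.spy.memcached.MemcachedClient;

  public class GetWithFallback {
      private final MemcachedClient client;

      public GetWithFallback(MemcachedClient client) {
          this.client = client;
      }

      public Object get(String key) throws Exception {
          try {
              return client.asyncGet(key).get(50, TimeUnit.MILLISECONDS);
          } catch (TimeoutException e) {
              return loadSlowly(key);          // e.g. read from the backing store instead
          }
      }

      private Object loadSlowly(String key) {
          return null;                         // hypothetical fallback path
      }
  }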

Original comment by ingen...@gmail.com on 20 Apr 2011 at 6:27