rax-maas / blueflood

A distributed system designed to ingest and process time series data
http://www.blueflood.io
Apache License 2.0

CPU at 369% on ingest server #749

Open 42701618 opened 8 years ago

42701618 commented 8 years ago

My Blueflood version is blueflood-rax-release-v1.0.1956, and I use it for ingestion. I am ingesting data into the ingestion server continuously. Suddenly, the CPU went to 369%. My machine has 4 CPU cores at 2494 MHz and 16G of memory. The log is as follows:

java.lang.OutOfMemoryError: GC overhead limit exceeded
Dumping heap to ./logs ...
Heap dump file created [1258805223 bytes in 7.382 secs]
Exception in thread "Shard state reader"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Shard state reader"
Exception in thread "Shard state writer"
Exception in thread "FileWatchdog" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-23-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

How should I investigate the problem?

usnavi commented 8 years ago

Hi! Just to clarify, what is your JVM heap size set to? For our install, we are using -Xms16G -Xmx16G.

42701618 commented 8 years ago

My JVM heap size is -Xms1G -Xmx1G, following the example.

usnavi commented 8 years ago

Welp, that's most likely your issue. You should raise your max heap size.

When the heap fills up, the JVM has to pause its processing to try to remove unused objects from the heap, and that takes CPU. If your heap is too small for your application, it will fill up repeatedly and the JVM will spend a lot of time on garbage collection. Most likely it will fall behind, and then you'll get an OutOfMemoryError.

I assume you are running on a 64-bit machine? Maybe try 8G or 12G. You could even go higher, depending on what processes are running on your box.
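The numbers already posted in this thread can be checked with a few lines of Python. This is only a rough sketch: the helper names `parse_jvm_size` and `max_heap_bytes` are made up for illustration and are not part of Blueflood or the JVM. It converts the `-Xmx` value to bytes and compares it with the heap dump size from the log. A dump of about the same size as the configured heap is at least consistent with the heap having been full, though it does not prove it.

```python
def parse_jvm_size(value: str) -> int:
    """Convert a JVM size string such as '1G', '512m' or '1024' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)  # no suffix means plain bytes


def max_heap_bytes(jvm_args: str):
    """Return the -Xmx value in bytes, or None if the flag is absent."""
    for arg in jvm_args.split():
        if arg.startswith("-Xmx"):
            return parse_jvm_size(arg[len("-Xmx"):])
    return None


# Values taken from this thread: the reporter's flags and the heap dump size.
heap = max_heap_bytes("-Xms1G -Xmx1G")
dump_bytes = 1258805223

print(f"configured max heap: {heap} bytes")
print(f"heap dump / max heap: {dump_bytes / heap:.2f}")
```

With `-Xmx8G` the same helper returns eight times as many bytes, so the same ingest load would have far more headroom before the collector starts to thrash.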

42701618 commented 8 years ago

OK, thank you very much. I will change the heap size to 8G, then test again and trace the problem.

usnavi commented 8 years ago

Are you still seeing the problem? I'm not sure what you mean by "trace the problem".

42701618 commented 8 years ago

Sorry, I mean that I will take a look after changing the heap size to 8G and then get back to you.

42701618 commented 8 years ago

Hi usnavi: after running all night, memory usage went from 2.9% to 54%. During that time I was ingesting data at about 3920 requests per second. Maybe there is a problem in some of the code?

drenalin23 commented 8 years ago

@42701618 - In general, on a Linux system your memory usage will go up over time due to Linux caching. In our production environment we have a few ingestion nodes; one of them, which has been running for quite some time, is sitting at 99% memory used, and Blueflood is using about 50% of that node's memory. This is a node with 32G of memory and the heap set at 16G. This server is handling roughly 25000 requests per second (1.5 million per minute). Our CPU does run high: top normally reports a load of 2 and shows the Blueflood process using 200% CPU (on a 20-core node).

drenalin23 commented 8 years ago

Numbers can be sliced many ways, of course. On a 20-core system each core is doing about 10-15% based on our metrics, so while top shows 200-300% CPU usage, the system isn't actually that loaded when viewed per core. As a back-of-the-envelope calculation, you are ingesting about 20% of what we are on a node and have 20% of the cores, so your ingest node should probably be able to handle that load. Are you running everything on one node, or have you set things up with separate nodes for Cassandra/Blueflood/Elasticsearch? We are on IRC at #blueflood on freenode if you want to try to catch someone there to discuss more in real time.
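The back-of-the-envelope figures above, written out as a small Python sketch. The function name `per_core_percent` is hypothetical, and all inputs are the numbers quoted in this thread. Note that the 369% reading was taken while the 1G heap was exhausted, so it probably reflects garbage collection rather than steady-state ingest load.

```python
def per_core_percent(top_cpu_percent: float, cores: int) -> float:
    """Spread the process CPU figure shown by top evenly across all cores."""
    return top_cpu_percent / cores


# Our production node: top shows 200-300% on 20 cores.
print(per_core_percent(200, 20))  # 10.0
print(per_core_percent(300, 20))  # 15.0

# The reported node: 369% on 4 cores, measured during the OutOfMemoryError.
print(per_core_percent(369, 4))   # 92.25

# Relative load and capacity, using the request rates quoted in the thread.
ingest_ratio = 3920 / 25000       # reported node vs. our node
core_ratio = 4 / 20
print(f"ingest ratio: {ingest_ratio:.1%}, core ratio: {core_ratio:.0%}")

# Sanity check on the per-minute figure: 25000 requests/s * 60 s.
print(25000 * 60)                 # 1500000
```

The ingest ratio comes out closer to 16% than 20%, which only strengthens the point: relative to core count, the reported node has slightly less work to do than ours.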