orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

"java.lang.OutOfMemoryError: GC overhead limit exceeded" on massive Insert #3517

Closed AlexanderRay closed 9 years ago

AlexanderRay commented 9 years ago

Hi,

I got an OutOfMemoryError while inserting about 10^6 nodes into an OrientGraph.

orientdb.err

2015-02-03 11:18:03:155 INFO  OrientDB auto-config DISKCACHE=13,880MB (heap=455MB os=16,384MB disk=105,812MB) [orientechnologies]
2015-02-03 11:18:03:160 INFO  Loading configuration from: /usr/local/Cellar/orientdb/2.0.1/libexec/config/orientdb-server-config.xml... [OServerConfigurationLoaderXml]
2015-02-03 11:18:03:397 INFO  OrientDB Server v2.0.1 (build UNKNOWN@r; 2015-01-28 18:33:00+0000) is starting up... [OServer]
2015-02-03 11:18:03:432 INFO  Databases directory: /usr/local/Cellar/orientdb/2.0.1/libexec/databases [OServer]
2015-02-03 11:18:03:487 INFO  Listening binary connections on 0.0.0.0:2424 (protocol v.28, socket=default) [OServerNetworkListener]
2015-02-03 11:18:03:488 INFO  Listening http connections on 0.0.0.0:2480 (protocol v.10, socket=default) [OServerNetworkListener]
2015-02-03 11:18:03:509 INFO  Installing dynamic plugin 'orientdb-lucene-2.0-dist.jar'... [OServerPluginManager]
2015-02-03 11:18:03:660 INFO  Lucene index plugin installed and active. Lucene version: LUCENE_47 [OLuceneIndexPlugin]
2015-02-03 11:18:03:663 INFO  Installing dynamic plugin 'studio-2.0.zip'... [OServerPluginManager]
2015-02-03 11:18:03:692 INFO  Installing GREMLIN language v.2.6.0 - graph.pool.max=50 [OGraphServerHandler]
2015-02-03 11:18:03:692 INFO  [OVariableParser.resolveVariables] Error on resolving property: distributed [orientechnologies]
2015-02-03 11:18:03:694 INFO  Installing Script interpreter. WARN: authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2015-02-03 11:18:03:694 INFO  OrientDB Server v2.0.1 is active. [OServer]
Exception in thread "Timer-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentHashMap$EntrySetView.iterator(ConcurrentHashMap.java:4746)
    at com.orientechnologies.orient.server.plugin.OServerPluginManager.updatePlugins(OServerPluginManager.java:277)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Timer-0"
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:201)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at java.io.ObjectStreamField.getClassSignature(ObjectStreamField.java:322)
    at java.io.ObjectStreamField.<init>(ObjectStreamField.java:140)
    at java.io.ObjectStreamClass.matchFields(ObjectStreamClass.java:2259)
    at java.io.ObjectStreamClass.getReflector(ObjectStreamClass.java:2149)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:517)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:464)
Exception in thread "OrientDB <- BinaryClient (/127.0.0.1:54626)" java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

A test class in Scala:

package model.orientdb

import com.orientechnologies.orient.core.intent.OIntentMassiveInsert
import com.orientechnologies.orient.core.sql.OCommandSQL
import com.tinkerpop.blueprints.impls.orient.{OrientVertex, OrientGraphFactory, OrientGraph}
import org.scalatest._
import de.softamrhein.Logger

class OrientDBGraphAPISpec extends FlatSpecLike
  with BeforeAndAfter
  with BeforeAndAfterAll
  with Matchers
  with Logger
{

  var factory: OrientGraphFactory = null
  var graph: OrientGraph = null

  override def beforeAll() = {
    factory = new OrientGraphFactory("remote:localhost/test")
  }

  override def afterAll() = {
    factory.close()
  }

  before {
    graph = factory.getTx
    graph.getRawGraph.declareIntent(new OIntentMassiveInsert())
  }

  after {
    graph.commit()
    graph.shutdown()
  }

  "Graph" can "remove everything" in {

    graph.command(new OCommandSQL("delete Vertex Structure")).execute()
    graph.command(new OCommandSQL("delete Vertex StructureNode")).execute()

    graph.command(new OCommandSQL("delete Edge childNode")).execute()
    graph.command(new OCommandSQL("delete Edge rootNode")).execute()

    assert(true)
  }

  def createStructure(ident: String, nodePrefix: String, levelCount: Int, nodesPerLevel: Int) = {

    val structureVertex = graph.addVertex("class:Structure", "ident", ident)

    def loop (currentLevel: Int, parentVertex: OrientVertex, edgeClass: String): Unit = {

      if (currentLevel < levelCount) {

        (1 to nodesPerLevel).foreach { idx =>

          val nodeVertex = graph.addVertex("class:StructureNode", "ident", nodePrefix + '-' + currentLevel + '-' + idx)
          parentVertex.addEdge(edgeClass, nodeVertex, edgeClass)

          loop(currentLevel + 1, nodeVertex, "childNode")
        }

      }

    }

    loop(0, structureVertex, "rootNode")
  }

  it can "create small 1-level structure" in {
    createStructure("structure-1-level", "Node1l", 1, 10)
  }

  it can "create 2-level structure" in {
    createStructure("structure-2-level", "Node2l", 2, 10)
  }

  it can "create 3-level structure" in {
    createStructure("structure-3-level", "Node3l", 3, 10)
  }

  it can "create 4-level structure" in {
    createStructure("structure-4-level", "Node4l", 4, 10)
  }

  it can "create 5-level structure" in {
    createStructure("structure-5-level", "Node5l", 5, 10)
  }

  it can "create 6-level structure" in {
    createStructure("structure-6-level", "Node6l", 6, 10)
  }
}
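
With levelCount = 6 and nodesPerLevel = 10, the recursion above creates 10 + 10^2 + ... + 10^6 ≈ 1.1 million vertices inside a single transaction, since the only commit happens in the after block. One way to reduce the heap pressure is to commit in batches; a minimal sketch of such a variant of createStructure (the 10,000 batch size is an arbitrary assumption, not something from this thread):

def createStructureBatched(ident: String, nodePrefix: String,
                           levelCount: Int, nodesPerLevel: Int,
                           batchSize: Int = 10000) = {

  var pending = 0

  // commit every batchSize vertices so that no single transaction has to
  // hold the whole structure in heap; OrientDB updates the record ids of
  // open vertex instances in place on commit, so they stay usable
  def maybeCommit(): Unit = {
    pending += 1
    if (pending % batchSize == 0) graph.commit()
  }

  val structureVertex = graph.addVertex("class:Structure", "ident", ident)

  def loop(currentLevel: Int, parentVertex: OrientVertex, edgeClass: String): Unit = {
    if (currentLevel < levelCount) {
      (1 to nodesPerLevel).foreach { idx =>
        val nodeVertex = graph.addVertex("class:StructureNode", "ident",
          nodePrefix + '-' + currentLevel + '-' + idx)
        parentVertex.addEdge(edgeClass, nodeVertex, edgeClass)
        maybeCommit()
        loop(currentLevel + 1, nodeVertex, "childNode")
      }
    }
  }

  loop(0, structureVertex, "rootNode")
}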
andrii0lomakin commented 9 years ago

Also, you are missing the concurrentlinkedhashmap dependency; you can find it in the distribution.

andrii0lomakin commented 9 years ago

concurrentlinkedhashmap-lru-1.4.1.jar

AlexanderRay commented 9 years ago
  1. The build from the sources fails, but I don't need it anyway, so it's OK )
  2. The database is always created on the first run.
andrii0lomakin commented 9 years ago

Yes, you see it is running and the WAL is truncated automatically: http://screencloud.net/v/22t1

andrii0lomakin commented 9 years ago

The WAL files go from 18 to 22.

andrii0lomakin commented 9 years ago

Hm, now it is running with huge memory usage, but I suppose I will find the issue in the test ))

AlexanderRay commented 9 years ago

OK, so you can reproduce it too?

andrii0lomakin commented 9 years ago

yes

AlexanderRay commented 9 years ago

super-)

andrii0lomakin commented 9 years ago

And it seems I found a bug )) Let's wait 30 min; I will run it 2 times.

andrii0lomakin commented 9 years ago

Just for next time: if you need to remove all data, do not use a "delete all" action. Do a truncate instead, otherwise insertions will be slow.

AlexanderRay commented 9 years ago

truncate?

andrii0lomakin commented 9 years ago

The truncate class ... command.
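
For the cleanup in the test above, that would look like this (a sketch against the class names used there; the edge classes are truncated first, and newer releases may additionally require the UNSAFE keyword for graph classes):

  graph.command(new OCommandSQL("truncate class childNode")).execute()
  graph.command(new OCommandSQL("truncate class rootNode")).execute()
  graph.command(new OCommandSQL("truncate class StructureNode")).execute()
  graph.command(new OCommandSQL("truncate class Structure")).execute()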

AlexanderRay commented 9 years ago

ok, I see, thanks)

andrii0lomakin commented 9 years ago

Could you try now? https://www.dropbox.com/s/c7l62cd7fgdshy7/orientdb-community-2.0.2-SNAPSHOT-distribution.zip?dl=0

AlexanderRay commented 9 years ago

Hi, now it works, but, as you said, after a delete it is very slow. Is it possible to restructure the database so that I can insert records at the same speed as before the delete? (For my purposes, I sometimes need to delete 80-90 percent of a class's records.)

AlexanderRay commented 9 years ago

How can I speed up my inserts? You say it is possible to reach up to 200,000 inserts per second on common hardware. I have an i7, an SSD, and 16 GB RAM, but the maximum speed is about 11,000 records per second... Can you upload some benchmark test that reaches 200,000 inserts per second, so I can test it on my hardware?

andrii0lomakin commented 9 years ago

Alexander, do not do a delete-all; use truncate, and it will not be slow.


Best regards, Andrey Lomakin.

AlexanderRay commented 9 years ago

I see,

  1. But what can I do if I need to delete not all of the records, but only 80 or 90 percent of them? Should I use "truncate record"?
  2. Is it possible to restructure the database after a massive delete? Maybe with backup/restore?
  3. What about a "speed-insert" example?
andrii0lomakin commented 9 years ago

Hi

  1. We are developing a new cluster design to overcome this issue; I think it will take about 1.5 months to release it.
  2. You can do it by export/import.
  3. Sorry, I did not get that one.


Best regards, Andrey Lomakin.

lvca commented 9 years ago

Alexander, you could create multiple clusters and just drop the clusters that contain the data you want to drop. This would be super fast and efficient. What kind of data do you have? How do you filter the 80/90% of data you delete?
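
A sketch of that pattern, using the same OCommandSQL style as the test above (the cluster name structurenode_batch1 is hypothetical):

  // keep each independently deletable slice of data in its own cluster
  graph.command(new OCommandSQL("alter class StructureNode addcluster structurenode_batch1")).execute()

  // target that cluster explicitly on insert
  graph.command(new OCommandSQL("insert into cluster:structurenode_batch1 set ident = 'n-1'")).execute()

  // later, drop the whole cluster in one cheap operation instead of deleting record by record
  graph.command(new OCommandSQL("drop cluster structurenode_batch1")).execute()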

AlexanderRay commented 9 years ago

Thanks, it is a good idea to store the data in multiple clusters and to drop it cluster by cluster. I will try it that way.

AlexanderRay commented 9 years ago

You write that OrientDB can store up to 220,000 records per second on common hardware. What type of records? Which API? Do you have an example that achieves that insert speed on your hardware?

lvca commented 9 years ago

220k was massive insertion in multiple threads, with no WAL, no indexes, and the Document API, using documents with 6 fields.

AlexanderRay commented 9 years ago

plocal or remote?

lvca commented 9 years ago

plocal.
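
The actual benchmark code is not linked in this thread, but a minimal sketch of that configuration would look like the following: multi-threaded plocal inserts through the Document API with the WAL disabled. The database URL, credentials, the Item class, and the thread/record counts are all assumptions.

import com.orientechnologies.orient.core.config.OGlobalConfiguration
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx
import com.orientechnologies.orient.core.intent.OIntentMassiveInsert
import com.orientechnologies.orient.core.record.impl.ODocument

object InsertBench extends App {
  // disable the write-ahead log before any storage is opened
  OGlobalConfiguration.USE_WAL.setValue(false)

  val url = "plocal:/tmp/benchdb"   // assumes the database already exists
  val threadCount = 4
  val perThread = 250000

  val threads = (1 to threadCount).map { t =>
    new Thread(new Runnable {
      def run(): Unit = {
        val db = new ODatabaseDocumentTx(url)
        db.open("admin", "admin")
        db.declareIntent(new OIntentMassiveInsert())
        var i = 0
        while (i < perThread) {
          // 6 plain fields, no indexes on the (assumed) Item class
          val doc = new ODocument("Item")
          doc.field("f1", i).field("f2", "v" + i).field("f3", t)
             .field("f4", i * 2L).field("f5", i % 7).field("f6", "x")
          doc.save()
          i += 1
        }
        db.declareIntent(null)
        db.close()
      }
    })
  }

  val start = System.currentTimeMillis()
  threads.foreach(_.start())
  threads.foreach(_.join())
  val secs = (System.currentTimeMillis() - start) / 1000.0
  println(f"${threadCount * perThread} docs in $secs%.1f s = ${threadCount * perThread / secs}%.0f docs/s")
}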

AlexanderRay commented 9 years ago

thanks for your advice)

AlexanderRay commented 9 years ago

Just for info, here are my multi-threaded test results:

(screenshot: multi-threaded test results, 2015-02-09 15:12)

saritseal commented 8 years ago

Hi,

The OOM error does not seem to be fixed yet. I am using 2.2-alpha. Can you please point me to the JIRA issue for this?

Regards Sarit

lvca commented 8 years ago

Try getting the latest snapshot: https://oss.sonatype.org/content/repositories/snapshots/com/orientechnologies/orientdb-community/2.2.0-SNAPSHOT/

nishantkumar1292 commented 7 years ago

Hi @lvca .... I am getting this error when inserting records in multiple threads. I am using version 2.2.21.

505 HTTP Version Not Supported: {
  "errors": [
    {
      "code": 505,
      "reason": 505,
      "content": "java.lang.OutOfMemoryError: GC overhead limit exceeded"
    }
  ]
}

Is the issue fixed? Any ideas how I can get this resolved?

andrii0lomakin commented 7 years ago

@nishantkumar1292 OOM is too general an issue and may be caused by many factors. Do you have a heap dump generated by the OOM in the server directory?

nishantkumar1292 commented 7 years ago

@laa I have no idea about the heap dump. Where can I find the heap dump?

I am also getting Request Timeout for multiple queries. Is this a consequence of the above error? I am running more than 1500 threads in parallel, each performing one of the CREATE, READ, and UPDATE operations.

I also saw the INVALIDATE_ALL evict strategy, which removes all cached query results on every CREATE, UPDATE, and DELETE operation and is faster than PER_CLUSTER if many writes occur. Will this help?

andrii0lomakin commented 7 years ago

@nishantkumar1292 How many cores do you have? Yes, it is possible that all memory was consumed by the handling of temporary data generated during query processing. Do you use the command cache? Could you switch it off? About the heap dump: it should be in the server or bin directory and have the .hprof extension.
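
Such .hprof files are written when the JVM is started with the standard HotSpot options below (whether the server start script sets them depends on your configuration; the dump path is an example):

  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps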

nishantkumar1292 commented 7 years ago

I have 8 cores, and they are all being used at around 98-99%. Yes... I found the heap dump file. No, I am not using the command cache. Below is the command.cache.json file.

{
  "enabled": false,
  "evictStrategy": "INVALIDATE_ALL",
  "minExecutionTime": 10,
  "maxResultsetSize": 500
}

Also, after force-stopping the threads, the core usage drops to 1-3%, suggesting that the cores are being used by OrientDB.

andrii0lomakin commented 7 years ago

@nishantkumar1292 1500 threads is too much for 8 cores: you will have a lot of context switches and, as a result, bigger memory consumption and worse performance. In reality, high CPU usage does not mean better performance. The number of threads is up to you, of course. Could you send me this file? I will check it, but I am 99% sure that it is caused by the big memory consumption of the queries.

nishantkumar1292 commented 7 years ago

Yes, you are right... decreasing the number of threads improved the performance and also eliminated the Request Timeouts. What is the maximum number of threads I can work with? Each thread does some CREATE, READ, and UPDATE operations.

Also, what would be the ideal OrientDB configuration to handle this kind of load?

andrii0lomakin commented 7 years ago

@nishantkumar1292 It is hard to say, but my preference is not more than 10 per core; OK, let's say 20. So in your case it should be 160, not 1500, which is an enormous number. You can limit the number of threads by handling user requests in a thread pool.

nishantkumar1292 commented 7 years ago

What do you mean by handling user requests in a thread pool?

andrii0lomakin commented 7 years ago

I do not know your architecture; I just proposed that you accept the HTTP requests, parse them, and then execute the commands. The commands can be executed in a separate thread pool.
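
A minimal sketch of that idea (all names are hypothetical; the cap of 20 threads per core follows the rule of thumb above):

import java.util.concurrent.{Executors, TimeUnit}

object CommandPool {
  // bounded pool: ~20 workers per core instead of one thread per request
  private val pool =
    Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors() * 20)

  // accept and parse the request on the caller's thread,
  // then hand the database command over to the pool
  def execute(command: Runnable): Unit = pool.execute(command)

  def shutdown(): Unit = {
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
  }
}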

nishantkumar1292 commented 7 years ago

I was running scripts on 100 parallel threads. The server stopped with this error:

Error on client connection
java.lang.OutOfMemoryError: GC overhead limit exceeded
$ANSI{green {db=db_development}} Exception `3AD12824` in storage `db_development`
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "OrientDB ONetworkProtocolHttpDb listen at 0.0.0.0:2480-2490" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "OrientDB Write Cache Flush Task (db_development)" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "OrientDB Write Cache Flush Task (db_development)"

@laa ...any ideas how to resolve this? Here is a screenshot of the htop output on the OrientDB server:
(screenshot: htop output)

Why is the java process asking for 13.9GB of virtual memory?

Also, the generated heap dumps filled up my disk space. Can they be of any help, or should I remove them to free disk space?

nishantkumar1292 commented 7 years ago

Hi @laa ...any updates or resolution for the above issue?