orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

data loading performance seriously degrading if cluster/index files become very big #7135

rdelangh closed this issue 7 years ago

rdelangh commented 7 years ago

OrientDB Version, operating system, or hardware

OrientDB 2.2.15 (per the server logs below), running on Linux.

We create classes per week of data, with 6 pre-created clusters per class. We store some 120M documents per such class, which translates into cluster files of about 20GB each on the server. For example, for the class of the week starting on Jan-22 2017:

% ls -l *20170122*pcl
-rw-rw-r-- 1 orientdb orientdb 21713257472 Jan 31 10:53 cl_cdr_eno_20170122_p0.pcl
-rw-rw-r-- 1 orientdb orientdb 21713257472 Jan 31 10:53 cl_cdr_eno_20170122_p1.pcl
-rw-rw-r-- 1 orientdb orientdb 21713060864 Jan 31 10:41 cl_cdr_eno_20170122_p2.pcl
-rw-rw-r-- 1 orientdb orientdb 21713257472 Jan 31 10:43 cl_cdr_eno_20170122_p3.pcl
-rw-rw-r-- 1 orientdb orientdb 21713060864 Jan 31 10:41 cl_cdr_eno_20170122_p4.pcl
-rw-rw-r-- 1 orientdb orientdb 21713454080 Jan 31 10:53 cl_cdr_eno_20170122_p5.pcl

Similarly, the index files are obviously also fairly big:

% ls -l idx*20170122*
-rw-rw-r-- 1 orientdb orientdb        1024 Jan 21 13:43 idx_cdr_eno_20170122_0.nbt
-rw-rw-r-- 1 orientdb orientdb 40604402688 Jan 31 10:57 idx_cdr_eno_20170122_0.sbt

Compare this to the files for the class whose data started only 2 days ago, on Jan-29 2017:

% ls -l *20170129*pcl
-rw-rw-r-- 1 orientdb orientdb 3291808768 Jan 30 17:03 cl_cdr_eno_20170129_p0.pcl
-rw-rw-r-- 1 orientdb orientdb 3291874304 Jan 30 16:50 cl_cdr_eno_20170129_p1.pcl
-rw-rw-r-- 1 orientdb orientdb 3291874304 Jan 30 16:51 cl_cdr_eno_20170129_p2.pcl
-rw-rw-r-- 1 orientdb orientdb 3291874304 Jan 30 16:51 cl_cdr_eno_20170129_p3.pcl
-rw-rw-r-- 1 orientdb orientdb 3291874304 Jan 30 17:02 cl_cdr_eno_20170129_p4.pcl
-rw-rw-r-- 1 orientdb orientdb 3291808768 Jan 30 17:02 cl_cdr_eno_20170129_p5.pcl
% ls -l idx*20170129*
-rw-rw-r-- 1 orientdb orientdb       1024 Jan 22 00:09 idx_cdr_eno_20170129_0.nbt
-rw-rw-r-- 1 orientdb orientdb 6029509632 Jan 30 17:03 idx_cdr_eno_20170129_0.sbt

Loading extra documents into such a class, i.e. one with fairly large cluster and index files, goes an order of magnitude slower than loading documents into a class which does not hold many documents yet, and whose files are consequently still much smaller.

Is there anything that can be tuned here?

Obviously, the cluster files can be reduced in size by creating more clusters per class, so that each cluster holds less data. This would break our cluster-sharding logic and would imply a lot of changes, but it is not impossible. The index files are another matter: that is internal ODB business, where a single file contains an entire index and hence will always grow very big.

All suggestions are very welcome!

andrii0lomakin commented 7 years ago

@rdelangh I suppose the problem is not the size of a cluster but the overall size of the database. What is the size of the whole database in GB, and how much RAM is used for the disk cache?

andrii0lomakin commented 7 years ago

The next question: do you load data into the clusters with the indexes in place, or do you load it separately, without indexes? One more question: what is your load pattern, do you load one cluster at a time, or are many clusters loaded at the same time?

rdelangh commented 7 years ago

@laa
Q1: the size of this database is more or less 500GB
Q2: size of RAM: 128GB
Q3: size of disk cache: if you mean the parameter "storage.diskCache.bufferSize", it is set to "32000"
Q4: data is being loaded while the indexes exist, because there is also a unique index which must prevent us from loading duplicates
Q5: we load data into the class, not into specific clusters; so I assume that ODB spreads the data automagically over the 6 clusters of the class

We are now rebuilding the unique index of this class, because we did a lot of deletes during the past days (issues with ODB meant that data loading encountered many timeouts after partial load jobs, so new loads failed because of duplicates that first needed to be deleted). I suspect that the unique index file (the one of about 40GB) has many 'gaps' in it, which now cause the very slow loading of new records.

andrii0lomakin commented 7 years ago

Hi, as a first step I suggest the following: increase the parameter "storage.diskCache.bufferSize" from 32000 to 82000, and I think you will see a big difference, provided of course that you use a dedicated server.

Anyway, what was the reason for setting the buffer size manually?
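(For reference: this setting is expressed in megabytes and is passed as a JVM system property on the server's java command line, as the ps output later in this thread shows. A minimal sketch of such a command line, with most flags elided; the exact values are illustrative only:)

# sketch: raise the disk cache from 32000 MB (~31 GB) to 82000 MB (~80 GB)
java -server -Xms1G -Xmx8G \
     -Dstorage.diskCache.bufferSize=82000 \
     ... com.orientechnologies.orient.server.OServerMain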

rdelangh commented 7 years ago

@laa thx for your comments.

The buffer size was already adjusted during our first (and failing) attempts to get such a big database running with ODB, as per comments from your colleagues ;-)

The servers are dedicated, but we need to reserve a good chunk of their RAM for the caching done by the ZFS filesystem that we have running. We can raise the ODB buffer size further to, say, 45% of total RAM and leave another 45% for the ZFS caching.

The index rebuild is now running; we will first wait for it to complete, to see if it has any effect on performance.

rdelangh commented 7 years ago

hmm ... :

-> what's going on?

At this rate, index rebuilding is worthless. So I will stop it, trash the index altogether, and build it anew. We'll see how that goes...

Grr... unpredictable program.

andrii0lomakin commented 7 years ago

@rdelangh please do not do that; just send me a thread dump. I suppose you have just reached the free space limit for the program

andrii0lomakin commented 7 years ago

or the WAL limit

andrii0lomakin commented 7 years ago

@rdelangh could you also do me a favour and run the same test on the ycsb_freeze branch? Maybe all the issues which we are discussing are already fixed, and we just need to move the fixes to the main branch

rdelangh commented 7 years ago

@laa

andrii0lomakin commented 7 years ago

@rdelangh I suppose you meant a memory dump, which may be a few GB. Thread dumps usually consume a few KB at most. A thread dump is a set of text lines which contain information about the current execution position of each thread. Could you take one by calling the jstack <pid> command?
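(For example, something along these lines, assuming the server runs as a regular Java process; the grep pattern matches the server main class visible in the ps output later in this thread:)

# find the Java pid of the OrientDB server, then write all thread stacks to a file
jps -l | grep OServerMain
jstack <pid> > /tmp/odb-threaddump.txt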

rdelangh commented 7 years ago

ok, thread dump attached: odb-stackdump.txt

andrii0lomakin commented 7 years ago

@rdelangh From what I see, you use only two threads to build an index; the rest of the threads are in the "wait" state. You may increase the load speed if you build independent clusters in parallel, as sketched below. From the thread dump all seems OK: threads are running and handling requests. Is the index rebuild still halted? The only thing I do see is that the index is constantly working against the disk, so the rebuild speed is very low. What is the size of the index?
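(An illustration of that parallel-load suggestion: one loader process per cluster, started concurrently from a shell. load_cluster.sh is a hypothetical wrapper around whatever loader program is actually in use here:)

# hypothetical: start one loader per cluster of the class, instead of one serial job
for c in 0 1 2 3 4 5; do
  ./load_cluster.sh cl_cdr_eno_20170122_p$c &
done
wait    # block until all six loaders have finished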

andrii0lomakin commented 7 years ago

Oh, I see, sorry, I missed it. So the size of the index is 27 GB, but your disk cache size is about 91 GB of RAM. I see now. We could create a command to put all (or almost all) of the index content into RAM, to rebuild it faster

andrii0lomakin commented 7 years ago

@rdelangh so I propose the following solution: we may give the index higher priority in the cache for the time while it is being rebuilt. So is it OK for you to create the clusters first and then rebuild the indexes? Did you try to create clusters without indexes? What was the load speed in that case?

andrii0lomakin commented 7 years ago

@rdelangh btw, what is the type of the index keys?

rdelangh commented 7 years ago

@laa thx for your follow-up

What do you mean by the remark "we may give the index higher priority in the cache for the time while it is being rebuilt"?

The keys of this unique index: ENV, ORIGTERMMSCID, CALLTRANSACTIONTYPE, SERVEDIMSI, PRT_DATE, OTHERPARTYNUMBER, CALLDURATION, EXCHANGEID, OTHERPARTYSMS, ACCESSORYNUMBER, SEQUENCENUMBER. All of these are STRING, except PRT_DATE, which is a DATETIME.

andrii0lomakin commented 7 years ago

@rdelangh I see. The index rebuild is so slow because you fetch pages randomly from the index file. So the cache treats them as cold (because each page is fetched only a few times) and does not want to keep them in memory. But I need to understand everything completely.

Is it correct that you increased the size of the disk cache but did not try a new load, and instead tried to rebuild an index?

Did you rebuild the index right after the server was started, or did you perform several other operations on the index first and only then start the index rebuild?

andrii0lomakin commented 7 years ago

@rdelangh one more question: what alphabet do you use for your keys, is it the Latin alphabet?

andrii0lomakin commented 7 years ago

and one more question: do you use tree indexes or hash indexes? Do you need to sort your keys?

rdelangh commented 7 years ago

@laa
Q1: Is it correct that you increased the size of the disk cache but did not try a new load, and instead tried to rebuild an index?
A1: no, the disk cache size was set like this during our installation of the software, and has not been changed since
Q2: Did you rebuild the index right after the server was started, or did you perform several other operations on the index first and only then start the rebuild?
A2: the latter: many (very fast) load jobs were done, many records deleted, then again (very slow) load jobs; then we stopped all activity and launched the "rebuild" operation from the web GUI
Q3: what alphabet do you use for your keys, is it the Latin alphabet?
A3: yes, Latin, normal ASCII, no special characters (such as "é" or "ç")
Q4: do you use tree indexes or hash indexes? Do you need to sort your keys?
A4: the unique index that we are trying to rebuild now is a (SBTREE) UNIQUE index:

CREATE INDEX idx_cdr_eno_20170122_0 ON cdr_eno_20170122(ENV, ORIGTERMMSCID, CALLTRANSACTIONTYPE, SERVEDIMSI, PRT_DATE, OTHERPARTYNUMBER, CALLDURATION, EXCHANGEID, OTHERPARTYSMS, ACCESSORYNUMBER, SEQUENCENUMBER) UNIQUE

andrii0lomakin commented 7 years ago

@rdelangh if your value for "storage.diskCache.bufferSize" equals 32 000, why do you think that the disk cache consumes 92 GB? What is your value for heap memory? The second question: why do you not use hash indexes, do you need to perform range queries?

rdelangh commented 7 years ago

@laa I did not write that the 92GB is "disk cache", but instead that it is "cache". At this moment, it is some 48GB:

% vmstat 4

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  2 302036 42384032  11068 49863356    0    0   234   105    2    1  1  0 97  2  0
 0  2 302036 42395128  11068 49852360    0    0 16532   924 2280 7057  0  0 96  4  0
...

Yes, range queries are needed, so SBTREE index.

andrii0lomakin commented 7 years ago

@rdelangh what are your settings for heap memory then?

andrii0lomakin commented 7 years ago

@rdelangh I suggest you increase the disk cache and decrease the amount of heap memory accordingly. Also, we will provide you with an implementation of optional compression of index keys, so you will be able to fit more in the cache. Also, about the slow index rebuild: you performed many actions, so the disk cache had already gathered statistics about hot pages. Because the index pages which you are going to rebuild are accessed randomly, and very rarely, the disk cache treats them as cold pages and does not want to cache them. But a bigger amount of memory for the disk cache may also help in this situation. So, in general, I will wait for your feedback after you increase the disk cache size to 45% of RAM. Meanwhile, once we start to work on this issue, we will implement string compression for you, so the indexes will consume less space and will be cached more often.

rdelangh commented 7 years ago

@laa ok, thx for the feedback. I changed the params as follows: -Xms1G -Xmx8G -Dstorage.diskCache.bufferSize=48000

Stopped the server, then started it again. The logs now say this:

2017-02-01 16:42:41:900 INFO  [OVariableParser.resolveVariables] Error on resolving property: distributed [orientechnologies]
2017-02-01 16:42:41:903 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2017-02-01 16:42:41:905 INFO  OrientDB Studio available at http://10.49.69.190:2480/studio/index.html [OServer]
2017-02-01 16:42:41:905 INFO  OrientDB Server is active v2.2.15 (build 2.2.x@rebf7b80c08613e871633edeb39c5c985e26b176c; 2017-01-19 07:39:00+0000). [OServer]
2017-02-01 16:43:05:182 WARNI {db=mobile} Storage 'mobile' was not closed properly. Will try to recover from write ahead log [OLocalPaginatedStorage]
2017-02-01 16:43:05:182 INFO  {db=mobile} Looking for last checkpoint... [OLocalPaginatedStorage]
2017-02-01 16:43:05:183 INFO  {db=mobile} FULL checkpoint found. [OLocalPaginatedStorage]
2017-02-01 16:43:05:183 WARNI {db=mobile} FULL checkpoint has not completed. [OLocalPaginatedStorage]
2017-02-01 16:43:05:184 WARNI {db=mobile} Restore will start from the previous checkpoint. [OLocalPaginatedStorage]
2017-02-01 16:43:05:185 INFO  {db=mobile} Data restore procedure from FUZZY checkpoint is started. [OLocalPaginatedStorage]
2017-02-01 16:43:05:188 INFO  {db=mobile} 1 operations were processed, current LSN is LSN{segment=31008, position=16} last LSN is LSN{segment=31024, position=85521596} [OLocalPaginatedStorage]
2017-02-01 16:43:05:189 WARNI {db=mobile} Non tx operation was used during data modification we will need index rebuild. [OLocalPaginatedStorage]
2017-02-01 16:43:35:194 INFO  {db=mobile} 9795 operations were processed, current LSN is LSN{segment=31008, position=2640565} last LSN is LSN{segment=31024, position=85521596} [OLocalPaginatedStorage]
...
2017-02-01 17:51:50:444 INFO  {db=mobile} 2110000 operations were processed, current LSN is LSN{segment=31012, position=28131746} last LSN is LSN{segment=31024, position=85521596} [OLocalPaginatedStorage]
2017-02-01 17:51:57:233 WARNI {db=mobile} Record OFuzzyCheckpointStartRecord{lsn=LSN{segment=31012, position=29736269}} com.orientechnologies.orient.core.storage.impl.local.paginated.wal.OFuzzyCheckpointStartRecord{lsn=null, previousCheckpoint=LSN{segment=31011, position=97153164}} will be skipped during data restore [OLocalPaginatedStorage]
2017-02-01 17:51:57:404 WARNI {db=mobile} Record com.orientechnologies.orient.core.storage.impl.local.paginated.wal.OFuzzyCheckpointEndRecord{lsn=LSN{segment=31012, position=29841942}} will be skipped during data restore [OLocalPaginatedStorage]
2017-02-01 17:52:02:118 INFO  {db=mobile} 2120000 operations were processed, current LSN is LSN{segment=31012, position=30845999} last LSN is LSN{segment=31024, position=85521596} [OLocalPaginatedStorage]
...

(still busy, will report again when completed)

rdelangh commented 7 years ago

update:

...
2017-02-01 18:38:46:953 INFO  {db=mobile} 8390000 operations were processed, current LSN is LSN{segment=31024, position=83146427} last LSN is LSN{segment=31024, position=85521596} [OLocalPaginatedStorage]
2017-02-01 18:38:48:670 SEVER {db=mobile} Invalid WAL record type was passed class com.orientechnologies.orient.core.storage.impl.local.paginated.wal.OFileTruncatedWALRecord. Given record will be skipped. [OLocalPaginatedStorage]
2017-02-01 18:38:48:670 SEVER {db=mobile} Invalid WAL record type was passed class com.orientechnologies.orient.core.storage.impl.local.paginated.wal.OFileTruncatedWALRecord. Given record will be skipped. [OLocalPaginatedStorage]
2017-02-01 18:38:48:671 WARNI {db=mobile} Record com.orientechnologies.orient.core.storage.impl.local.paginated.wal.OFullCheckpointStartRecord{lsn=LSN{segment=31024, position=85521573}, previousCheckpoint=LSN{segment=31024, position=77850119}} will be skipped during data restore [OLocalPaginatedStorage]
2017-02-01 18:38:48:671 INFO  {db=mobile} Storage data recover was completed [OLocalPaginatedStorage]
2017-02-01 19:06:25:810 INFO  {db=mobile} Wait till indexes restore after crash was finished. [OIndexManagerShared]
2017-02-01 19:06:25:948 INFO  {db=mobile} Start creation of index 'OFunction.name' [OIndexManagerShared$RecreateIndexesTask]
...

and

% grep 'rebuild is going' /opt/orientdb/orientdb-community-2.2.15/log/orient-server.log.0
2017-02-01 19:06:26:056 INFO  {db=mobile} Index 'OFunction.name' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]
2017-02-01 19:06:32:482 INFO  {db=mobile} Index 'idx_cdr_eno_20161218_0' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]
2017-02-01 22:07:21:218 INFO  {db=mobile} Index 'ORole.name' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]
2017-02-01 22:09:29:476 INFO  {db=mobile} Index 'idx_cdr_eno_20161225_0' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]
2017-02-02 00:31:20:772 INFO  {db=mobile} Index 'idx_cdr_eno_20170108_0' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]
2017-02-02 05:20:46:894 INFO  {db=mobile} Index 'idx_cdr_eno_20170108_1' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]
2017-02-02 06:44:31:174 INFO  {db=mobile} Index 'idx_cdr_eno_20161218_2' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]
2017-02-02 09:40:33:389 INFO  {db=mobile} Index 'idx_cdr_eno_20170101_0' was successfully created and rebuild is going to be started [OIndexManagerShared$RecreateIndexesTask]

Now it is apparently rebuilding ALL of the indexes of ALL databases!! This will take days!

rdelangh commented 7 years ago

Done. Attempting restart...

rdelangh commented 7 years ago

-> so somehow, something inside ODB gets saturated, or accumulates into a state where ODB can no longer process input data.
-> how can this be identified?

rdelangh commented 7 years ago

And now we seem to have one more problem in this never-ending saga:

Trying to query an index directly:

select from INDEX:idx_cdr_eno_20170122_1

Gives "java.lang.NullPointerException"

In the server logs:

Internal server error:

java.lang.NullPointerException
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.fetchEntriesFromIndexCursor(OCommandExecutorSQLSelect.java:2466)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.searchInIndex(OCommandExecutorSQLSelect.java:2780)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.assignTarget(OCommandExecutorSQLSelect.java:532)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearch(OCommandExecutorSQLSelect.java:512)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.execute(OCommandExecutorSQLSelect.java:488)
        at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:74)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:2592)
        at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:2538)
        at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:69)
        at com.orientechnologies.orient.server.network.protocol.http.command.post.OServerCommandPostCommand.execute(OServerCommandPostCommand.java:106)
        at com.orientechnologies.orient.graph.server.command.OServerCommandPostCommandGraph.execute(OServerCommandPostCommandGraph.java:37)
        at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.service(ONetworkProtocolHttpAbstract.java:169)
        at com.orientechnologies.orient.server.network.protocol.http.ONetworkProtocolHttpAbstract.execute(ONetworkProtocolHttpAbstract.java:622)
        at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)

Indexes corrupted, again ??

rdelangh commented 7 years ago

this failing index is a LUCENE index; querying another SBTREE UNIQUE index of the same class works fine.

andrii0lomakin commented 7 years ago

@rdelangh could you create a separate issue for the Lucene index? About this saturation: it can be identified by taking a thread dump during such a saturation. Could you take one?

andrii0lomakin commented 7 years ago

@rdelangh about this saturation: I suppose it is already fixed in a new version, but let us check it.

wolf4ood commented 7 years ago

@rdelangh about this

select from INDEX:idx_cdr_eno_20170122_1

it's not supported by the Lucene index. You can use it with a filter or a count:

select from INDEX:idx_cdr_eno_20170122_1 where key ..
select count(*) from INDEX:idx_cdr_eno_20170122_1

Since it's an inverted index, the index iterator has not been implemented.

andrii0lomakin commented 7 years ago

@maggiolo00 I think we should provide a meaningful exception instead of an NPE. Could you handle it as a separate issue?

wolf4ood commented 7 years ago

@laa done

https://github.com/orientechnologies/orientdb/issues/7156

rdelangh commented 7 years ago

The program which is loading data into dbase "mobile" suffers from these timeouts again.

So I attempted a stack trace:

# /usr/lib/jvm/jdk1.8.0_92/bin/jstack 29671 > /tmp/29671-jstack.txt
29671: Unable to open socket file: target process not responding or HotSpot VM not loaded
The -F option can be used when the target process is not responding
# /usr/lib/jvm/jdk1.8.0_92/bin/jstack -F 29671 > /tmp/29671-jstack.txt

File attached: 29671-jstack.txt

rdelangh commented 7 years ago

That data loading into dbase "mobile" is now blocking entirely, while another data loading into dbase "cdrarch" is happily proceeding, so I tried another stack trace:

# /usr/lib/jvm/jdk1.8.0_92/bin/jstack -F 29671 > /tmp/29671-b-jstack.txt
java.lang.NullPointerException
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:88)
        at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
        at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
        at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
        at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
        at sun.tools.jstack.JStack.main(JStack.java:106)

-> fairly stuck ?

rdelangh commented 7 years ago

meanwhile, the system is not out-of-memory:

# free -m
              total        used        free      shared  buff/cache   available
Mem:         128809       71604       54121          25        3082       55714
Swap:        130943           0      130943
# vmstat 4
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2      0 55387960  86404 3089596    0    0   908 18942 5330 10927  1  1 94  4  0
 1  3      0 55434460  86404 3089500    0    0   856 15413 4692 7787  1  1 94  4  0
 1  2      0 55444076  86404 3067696    0    0   902 13750 6073 7681  1  1 94  4  0
 0  2      0 55442068  86404 3068956    0    0  1199  6135 3593 4776  1  0 95  4  0

# ps -aefl 
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S orientdb 29671     1 61  80   0 - 38739983 futex_ 13:04 pts/4 01:36:31 java -server -Xms1G -Xmx8G -Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=512g -Djava.awt.headless=true -Dfile.encoding=UTF8 -Drhino.opt.level=9 -Dstorage.openFiles.limit=1024 -Denvironment.dumpCfgAtStartup=true -Dquery.parallelAuto=true -Dquery.parallelMinimumRecords=50000 -Dstorage.wal.syncOnPageFlush=false -Dstorage.diskCache.bufferSize=54000 -Dmemory.chunk.size=33554432 -Djava.util.logging.config.file=/opt/orientdb/orientdb-community-2.2.15/config/orientdb-server-log.properties -Dorientdb.config.file=/opt/orientdb/orientdb-community-2.2.15/config/orientdb-server-config.xml -Dorientdb.www.path=/opt/orientdb/orientdb-community-2.2.15/www -Dorientdb.build.number=2.2.x@rebf7b80c08613e871633edeb39c5c985e26b176c; 2017-01-19 07:39:00+0000 -cp /opt/orientdb/orientdb-community-2.2.15/lib/orientdb-server-2.2.15.jar:/opt/orientdb/orientdb-community-2.2.15/lib/*:/opt/orientdb/orientdb-community-2.2.15/plugins/* com.orientechnologies.orient.server.OServerMain

Quite a few open files:

# lsof -p 29671| grep /opt |wc -l
4315

andrii0lomakin commented 7 years ago

@rdelangh could you print and send me the list of open files? Also, according to the stack trace which you sent me, you hit the size limit; could you tell me what your WAL size is now? Are you able to take a thread dump now?

rdelangh commented 7 years ago

@laa I have set "dumpCfgAtStartup = true", so at startup I get the following info about "wal":

Not sure if this is the information about the WAL that you would like to know; if you need other information, then please add the instructions that I must use to obtain it.

Thread dump: likely impossible to take, because the ODB server process has a huge memory footprint (see above, approx. 35GB)

rdelangh commented 7 years ago

And the open-files list is attached: odb-29671-lsof.txt.gz

andrii0lomakin commented 7 years ago

@rdelangh thank you for the list of open files. A thread dump is not a heap dump. A thread dump is a list of stack traces, each of which points to the position of execution of the program in a single thread. You have already taken a thread dump with a stack. I already looked at your thread dump: the first time, OrientDB paused because you reached the limit on WAL size and free space, which is why I asked about the WAL limits. I suppose that you use the default settings, in which the WAL max size is only 4GB; once it reaches this size, a synchronous flush of data is performed, and as a result you see pauses in execution. We have already changed the flush mechanics to avoid those pauses. It is fixed in the https://github.com/orientechnologies/orientdb/tree/ycsb_freeze branch, but you will need to wait for 3.0 to load-test it. I can create a build of the ycsb_freeze branch for you if you wish to check it. Meanwhile, to prevent pauses during load, you may increase the max size of the WAL (the storage.wal.maxSize parameter defines the size of the WAL in megabytes) to a reasonable value, say 20-50 GB, depending on your disk size. Pauses aside, how did load performance change after you increased the disk cache size?

rdelangh commented 7 years ago

@laa The config-dump at startup lists these WAL settings (see above), which I have not changed in the "server.sh" script; I notice indeed this line:

storage.wal.maxSize = 4096

andrii0lomakin commented 7 years ago

@rdelangh I updated the comment above. What is the current status of the load? Do you still have pauses or a frozen client?

rdelangh commented 7 years ago

@laa I am only keeping the job-stream running which sends data to dbase "cdrarch", with no concurrency (only one program is sending data); I have temporarily disabled the parallel job-streams that send data to dbase "mobile"

-> I will wait till that first (most important) stream has recovered from its backlog; then I will make new attempts with that (blocking) second stream with parallel jobs

rdelangh commented 7 years ago

@laa thx for the follow-up!

andrii0lomakin commented 7 years ago

@rdelangh I suppose a good value for you is about 16GB, or a bit more, say 20GB. What is your disk size? What is your DB size?

rdelangh commented 7 years ago

@laa

andrii0lomakin commented 7 years ago

@rdelangh so the WAL is placed on a file system. In that case I suggest you set the WAL size to 50 GB, which corresponds to "storage.wal.maxSize = 51200"; it will substantially decrease pauses during data load.
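(A minimal sketch of how that could look on the server's java command line, combining it with the disk-cache value already used earlier in this thread; all other flags are elided and the values are illustrative:)

# sketch: 50 GB WAL cap plus the enlarged disk cache discussed earlier
java -server -Xms1G -Xmx8G \
     -Dstorage.diskCache.bufferSize=48000 \
     -Dstorage.wal.maxSize=51200 \
     ... com.orientechnologies.orient.server.OServerMain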