Closed rdelangh closed 8 years ago
@rdelangh could you provide full log which you see during database open ?
@laa This issue is very much similar to what's described in #6426 ...
Full server logs do not provide any information :
2016-11-07 10:08:49:819 INFO Loading configuration from: /opt/orientdb/orientdb-community-2.2.13-SNAPSHOT/config/orientdb-server-config.xml... [OServerConfigurationLoaderXml]
2016-11-07 10:08:50:042 INFO OrientDB Server v2.2.13-SNAPSHOT (build UNKNOWN@rc28a52ffe68135c4f259e9030b75a8e8d41df8b3; 2016-10-31 22:29:54+0000) is starting up... [OServer]
2016-11-07 10:08:50:049 INFO Databases directory: /opt/orientdb/databases [OServer]
2016-11-07 10:08:50:381 INFO Listening binary connections on 0.0.0.0:2424 (protocol v.36, socket=default) [OServerNetworkListener]
2016-11-07 10:08:50:387 INFO Listening http connections on 0.0.0.0:2480 (protocol v.10, socket=default) [OServerNetworkListener]
2016-11-07 10:08:50:404 INFO Installing dynamic plugin 'studio-2.2.zip'... [OServerPluginManager]
2016-11-07 10:08:50:408 INFO ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2016-11-07 10:08:50:409 INFO OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2016-11-07 10:08:50:410 INFO OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2016-11-07 10:08:50:433 INFO Installed GREMLIN language v.2.6.0 - graph.pool.max=50 [OGraphServerHandler]
2016-11-07 10:08:50:434 INFO [OVariableParser.resolveVariables] Error on resolving property: distributed [orientechnologies]
2016-11-07 10:08:50:437 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2016-11-07 10:08:50:439 INFO OrientDB Studio available at http://10.49.69.190:2480/studio/index.html [OServer]
2016-11-07 10:08:50:440 INFO OrientDB Server is active v2.2.13-SNAPSHOT (build UNKNOWN@rc28a52ffe68135c4f259e9030b75a8e8d41df8b3; 2016-10-31 22:29:54+0000). [OServer]
These are only the initial startup messages; when I then try to open the database from a "console.sh" session (see above, OP) then no more server messages are logged.
That other issue #6426 discusses in much extent the server parameter settings because there the OP could not startup the server process. I stress that I do not have any problem starting up the server process : my server parameters are proven ok, the server has been running for several months on previous OrientDB versions .
@laa server params:
ORIENTDB_OPTS_MEMORY="-Xms512m -Xmx10000m"
JAVA_OPTS_SCRIPT="-Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=512g -Djava.awt.headless=true -Dfile.encoding=UTF8 -Drhino.opt.level=9"
ORIENTDB_SETTINGS="-Dstorage.diskCache.bufferSize=32000 -Dmemory.chunk.size=33554432
@laa Another of my databases ("navi", 32GB size) can be opened without problems, so this smells like a bad status flag or so in our 'big' database "cdrarch" (700GB size) :
orientdb> use remote:localhost/navi admin admin
Disconnecting from the database [null]...OK
Connecting to database [remote:localhost/navi] with user 'admin'...OK
orientdb {db=navi}>
@rdelangh is it possible for you to debug server and set a breakpoint in OStorageException and then send me full stack trace of exception and may be some parameters which I will ask for you? Also is it possible to send this database to me, so I will try to open it and reproduce the issue ? Is it possible to reproduce this issue on several environments or only on your server?
@laa
@rdelangh Well that is not good. Then we need to change the code and log all information which I need. By "several envinronments" I mean whether the same problem happens on server and your PC for example.
@laa I can install some Java programs on my server, but you will have to tell me which ones are needed to do this debugging. Eg I have "visualvm", but I don't see any way in this utility to set breakpoints in a running Java program.
Environments: no place on my PC to get this 700GB dbase copied...
similar, hence related issue(s): #6426 , #6853
@rdelangh could you open database using distribution https://drive.google.com/file/d/0B2oZq2xVp841UFU5OUtKOUh4bFU/view?usp=sharing and send me content of the server log and server console
@laa Did so, but nothing extra appeared in its server logs which was not yet in the server logs of the version I used before (see below). Also, please be carefull when the platform on which you build this distribution: your text files (such as "bin/server.sh" and "bin/console.sh", and all others) are clearly built on a Windows platform, because their lines end on CRLF instead of (UNIX convention) LF only !
Client (console.sh) output when succesfull connecting to medium sized (30GB) database "navi" :
$ sh bin/console.sh
OrientDB console v.2.2.13-SNAPSHOT (build UNKNOWN@rc28a52ffe68135c4f259e9030b75a8e8d41df8b3; 2016-10-31 22:29:54+0000) www.orientdb.com
Type 'help' to display all the supported commands.
Installing extensions for GREMLIN language v.2.6.0
orientdb> use remote:localhost/navi admin admin
Connecting to database [remote:localhost/navi] with user 'admin'...OK
orientdb {db=navi}>
and the same but failing to open the big (700GB) database "cdrarch":
$ sh bin/console.sh
OrientDB console v.2.2.13-SNAPSHOT (build UNKNOWN@rc28a52ffe68135c4f259e9030b75a8e8d41df8b3; 2016-10-31 22:29:54+0000) www.orientdb.com
Type 'help' to display all the supported commands.
Installing extensions for GREMLIN language v.2.6.0
orientdb> use remote:localhost/cdrarch admin $hiddenPwd
Connecting to database [remote:localhost/cdrarch] with user 'admin'...
Error: com.orientechnologies.orient.core.exception.OStorageException: Can not close file with id 0 because it is still in use
DB name="cdrarch"
DB name="cdrarch"
orientdb>
The server logs:
$ sh -x bin/server.sh
...
+ ORIENTDB_OPTS_MEMORY=-Xms512m -Xmx10000m
+ [ -z ]
+ JAVA_OPTS_SCRIPT=-Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=512g -Djava.awt.headless=true - Dfile.encoding=UTF8 -Drhino.opt.level=9
+ [ -z ]
+ ORIENTDB_SETTINGS=-Dstorage.diskCache.bufferSize=32000
+ echo 2103
+ exec java -server -Xms512m -Xmx10000m -Djna.nosys=true -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=512g -Djav a.awt.headless=true -Dfile.encoding=UTF8 -Drhino.opt.level=9 -Dstorage.diskCache.bufferSize=32000 -Djava.util.logging.confi g.file=/opt/orientdb/orientdb-community-2.2.13-SNAPSHOT/config/orientdb-server-log.properties -Dorientdb.config.file=/opt/o rientdb/orientdb-community-2.2.13-SNAPSHOT/config/orientdb-server-config.xml -Dorientdb.www.path=/opt/orientdb/orientdb-com munity-2.2.13-SNAPSHOT/www -Dorientdb.build.number=2.2.x@rc37a58480dc60adb51f3bf5da4070299d5e59bf9; 2016-11-07 11:42:14+000 0 -cp /opt/orientdb/orientdb-community-2.2.13-SNAPSHOT/lib/orientdb-server-2.2.13-SNAPSHOT.jar:/opt/orientdb/orientdb-commu nity-2.2.13-SNAPSHOT/lib/*:/opt/orientdb/orientdb-community-2.2.13-SNAPSHOT/plugins/* com.orientechnologies.orient.server.O ServerMain
2016-11-07 13:19:31:285 INFO Loading configuration from: /opt/orientdb/orientdb-community-2.2.13-SNAPSHOT/config/orientdb- server-config.xml... [OServerConfigurationLoaderXml]
2016-11-07 13:19:31:521 INFO OrientDB Server v2.2.13-SNAPSHOT (build 2.2.x@rc37a58480dc60adb51f3bf5da4070299d5e59bf9; 2016 -11-07 11:42:14+0000) is starting up... [OServer]
2016-11-07 13:19:31:528 INFO Databases directory: /opt/orientdb/databases [OServer]
2016-11-07 13:19:31:904 INFO Listening binary connections on 0.0.0.0:2424 (protocol v.36, socket=default) [OServerNetworkL istener]
2016-11-07 13:19:31:907 INFO Listening http connections on 0.0.0.0:2480 (protocol v.10, socket=default) [OServerNetworkLis tener]
2016-11-07 13:19:31:912 WARNI Found ORIENTDB_ROOT_PASSWORD variable, using this value as root's password [OServer]
2016-11-07 13:19:32:296 INFO Installing dynamic plugin 'studio-2.2.zip'... [OServerPluginManager]
2016-11-07 13:19:32:303 INFO ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2016-11-07 13:19:32:305 INFO OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2016-11-07 13:19:32:306 INFO OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2016-11-07 13:19:32:334 INFO Installed GREMLIN language v.2.6.0 - graph.pool.max=50 [OGraphServerHandler]
2016-11-07 13:19:32:334 INFO [OVariableParser.resolveVariables] Error on resolving property: distributed [orientechnologie s]
2016-11-07 13:19:32:337 WARNI Authenticated clients can execute any kind of code into the server by using the following all owed languages: [sql] [OServerSideScriptInterpreter]
2016-11-07 13:19:32:340 INFO OrientDB Studio available at http://10.49.69.190:2480/studio/index.html [OServer]
2016-11-07 13:19:32:340 INFO OrientDB Server is active v2.2.13-SNAPSHOT (build 2.2.x@rc37a58480dc60adb51f3bf5da4070299d5e5 9bf9; 2016-11-07 11:42:14+0000). [OServer]
@laa , @lvca hi Andrey, any suggestion what can be next to identify the reason of this exception? Is there anything I can do at my end to provide you with more info ? Can you identify which piece of code generates that output line for the "console.sh" :
Can not close file with id 0 because it is still in use
and in which conditions?
@laa
when I tried the first time to open this "cdrarch" database, we got no additional server log messages.
When I tried a second time again to open this same database, we got these server log messages:
2016-11-07 15:59:21:205 WARNI {db=cdrarch} Database is open by another process [OPaginatedStorageDirtyFlag]
2016-11-07 15:59:27:606 WARNI {db=cdrarch} MBean with name com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations:id=749408700,name="cdrarch",type=OAtomicOperationsMangerMXBean has already registered. Probably your system was not shutdown correctly or you have several running applications which use OrientDB engine inside [OAtomicOperationsManager]
@rdelangh that was my fault. Could you replace distribution by this one https://drive.google.com/file/d/0B2oZq2xVp841UFU5OUtKOUh4bFU/view?usp=sharing and try again. But please restart the OS.
@laa 1) still DOS-styles text files (such as server.sh) with CRLF at line-ends ... 2) still same problem : console:
orientdb> use remote:localhost/cdrarch admin $hiddenPwd
Disconnecting from the database [null]...OK
Connecting to database [remote:localhost/cdrarch] with user 'admin'...
Error: com.orientechnologies.orient.core.exception.OStorageException: Can not close file with id 0 because it is still in use
DB name="cdrarch"
DB name="cdrarch"
server logs (message appears only after 2nd attempt to open dbase):
2016-11-08 13:54:58:328 INFO Loading configuration from: /opt/orientdb/orientdb-community-2.2.13-SNAPSHOT/config/orientdb-server-config.xml... [OServerConfigurationLoaderXml]
2016-11-08 13:54:58:565 INFO OrientDB Server v2.2.13-SNAPSHOT (build 2.2.x@rc37a58480dc60adb51f3bf5da4070299d5e59bf9; 2016-11-07 11:42:14+0000) is starting up... [OServer]
2016-11-08 13:54:58:571 INFO Databases directory: /opt/orientdb/databases [OServer]
2016-11-08 13:54:58:984 INFO Listening http connections on 0.0.0.0:2480 (protocol v.10, socket=default) [OServerNetworkListener]
2016-11-08 13:54:59:001 INFO Installing dynamic plugin 'studio-2.2.zip'... [OServerPluginManager]
2016-11-08 13:54:59:008 INFO ODefaultPasswordAuthenticator is active [ODefaultPasswordAuthenticator]
2016-11-08 13:54:59:009 INFO OServerConfigAuthenticator is active [OServerConfigAuthenticator]
2016-11-08 13:54:59:011 INFO OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2016-11-08 13:54:59:032 INFO Installed GREMLIN language v.2.6.0 - graph.pool.max=50 [OGraphServerHandler]
2016-11-08 13:54:59:033 INFO [OVariableParser.resolveVariables] Error on resolving property: distributed [orientechnologies]
2016-11-08 13:54:59:035 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2016-11-08 13:54:59:037 INFO OrientDB Studio available at http://10.49.69.190:2480/studio/index.html [OServer]
2016-11-08 13:54:59:038 INFO OrientDB Server is active v2.2.13-SNAPSHOT (build 2.2.x@rc37a58480dc60adb51f3bf5da4070299d5e59bf9; 2016-11-07 11:42:14+0000). [OServer]
2016-11-08 13:55:56:017 WARNI {db=cdrarch} Database is open by another process [OPaginatedStorageDirtyFlag]
2016-11-08 13:55:56:083 WARNI {db=cdrarch} MBean with name com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations:id=749856979,name="cdrarch",type=OAtomicOperationsMangerMXBean has already registered. Probably your system was not shutdown correctly or you have several running applications which use OrientDB engine inside [OAtomicOperationsManager]
@laa, in #6426 the OP mentions that his identical problem disappeared when he adjusted some memory options in "server.sh" ; can you please confirm that this is relevant to make the problem disappear?
I had already my current settings in the versions up till 2.2.12 where the server was happily running ; after some 2.2.13 versions, this problem started to appear. Can this be a hint to the cause of the issue?
@laa, @lvca any feedback please ? Can't be so difficult to find in your Java code where this message is printed, and under which conditions. -> what's happening ?
@rdelangh I was looking for a message "File is acquired from." I will provide next distribution which will write something like "patched version" during server start so you will be sure that distribution is correct. Also I will change code a bit to fix possible error.
@laa by the way, about the way of working in OrientDB: it is without any doubt a good practice to include some kind of locking mechanism in the programs, to avoid that two instances of the same program are trying to access/manipulate the same database files. So far about that.
What apparently should be enhanced, is the extra check after the lock-check : if a lock is found, it might also be a left-over from a previous instance of the program which failed to remove its lock when it terminated (gracefully or not). So a lock that is found, must be crossed-checked to see that the program which set the lock, is really still active. If not, then the lock is a left-over that can and should be ignored.
@rdelangh it is already done. I do not think that your issue is related to the fact that there are "leftovers" from the previous process.
@laa ok, good. Thx for your feedback. Any estimation when we could receive a corrected distribution that fixes this issue ?
@rdelangh you can download it right now https://drive.google.com/file/d/0B2oZq2xVp841UFU5OUtKOUh4bFU/view?usp=sharing . Please note that during server start you should see message "Patched version (#1) for rdelangh ..." .
@rdelangh any updates ? If you provide update quicker we will finish this issue sooner
@laa hey Andrey, I have been providing many and prompt updates so far ... Anyway thanks for your swift reaction.
I am trying to collect this new distribution, and install it on our servers in the next 30 mins or so; will instantly provide you feedback of the outcome.
@laa
1) the text files are still in DOS style (trailing CRLF characters), not UNIX style (LF only) !
2) got that version and installed it, startup proves this is your latest distribution for my environment:
2016-11-09 15:39:52:238 SEVER Patched version (#1) for rdelangh ... [OServer]
...
2016-11-09 15:39:53:614 INFO OrientDB Server is active v2.2.13-SNAPSHOT (build 2.2.x@rea9cd9af130882bf2836428b2d3a5bd4acde8da4; 2016-11-09 10:46:53+0000). [OServer]
3) try again to open our big database "cdrarch" with "console.sh" :
console.sh
OrientDB console v.2.2.13-SNAPSHOT (build UNKNOWN@rc28a52ffe68135c4f259e9030b75a8e8d41df8b3; 2016-10-31 22:29:54+0000) www.orientdb.com
Type 'help' to display all the supported commands.
Installing extensions for GREMLIN language v.2.6.0
orientdb> use remote:localhost/cdrarch admin $hiddenPwd
Connecting to database [remote:localhost/cdrarch] with user 'admin'...
Error: com.orientechnologies.orient.core.exception.OStorageException: Cannot open local storage '/opt/orientdb/databases/cdrarch' with mode=rw
DB name="cdrarch"
DB name="cdrarch"
Error: com.orientechnologies.orient.core.exception.OStorageException: File with name files_loaded1.pcl does not exist in storage cdrarch
DB name="cdrarch"
-> so, this issue where the database was considered to be open yet by another process is resolved, it seems -> we fall back now again to the issue #6854, where a class "files_loaded1" could no longer be accessed, but is still present in the schema of the database. That's what you get the above error from, apparently. -> how to fix that "files_loaded1" class status, and get the dbase fully active again ?
Hi @rdelangh but is file with name files_loaded1.pcl
really presented in storage ?
no it is not, cfr #6854
@rdelangh do you use distributed or single server instalation ?
@laa single server
@rdelangh and of curse you did not remove any files from server directory, did you ? How that happens any way, you updated system to 2.2.13 version and saddenly all stops to work, is it correct ?
@laa
@rdelangh I see, to fix this issue please create empty file of 1024 bytes size and put into database directory. I will think meanwhile how we may detect reason of such strange disaapearence of file on disk.
@laa 1) did create such file "files_loaded1.pcl" of 1024 bytes 2) started the server again 3) tried again to connect from client "console.sh" to the dbase "cdrarch", but got another error now:
% bin/console.sh
OrientDB console v.2.2.13-SNAPSHOT (build UNKNOWN@rc28a52ffe68135c4f259e9030b75a8e8d41df8b3; 2016-10-31 22:29:54+0000) www.orientdb.com
Type 'help' to display all the supported commands.
Installing extensions for GREMLIN language v.2.6.0
orientdb> use remote:localhost/cdrarch admin $hiddenPwd
Connecting to database [remote:localhost/cdrarch] with user 'admin'...
Error: com.orientechnologies.orient.core.exception.OStorageException: Cannot open local storage '/opt/orientdb/databases/cdrarch' with mode=rw
DB name="cdrarch"
DB name="cdrarch"
Error: java.lang.IllegalStateException: Item with key 744075995673264129 already exists
@rdelangh I think better to do following:
@rdelangh do you have any stack trace in server log ?
@laa 1) in dbase "GratefulDeadConcerts" :
create cluster files_loaded1
2) stopped the server engine 3) found the files for this cluster:
% ls -l databases/GratefulDeadConcerts/files_loaded1.*
-rw-rw-r-- 1 orientdb orientdb 66560 Nov 10 11:26 databases/GratefulDeadConcerts/files_loaded1.pcl
-rw-rw-r-- 1 orientdb orientdb 1024 Nov 10 11:26 databases/GratefulDeadConcerts/files_loaded1.cpm
4) copied them into "databases/cdrarch" , and started the server engine again 5) from client "console.sh", connected to "cdrarch" , but similar issue:
orientdb> use remote:localhost/cdrarch admin $hiddenPwd
Disconnecting from the database [null]...OK
Connecting to database [remote:localhost/cdrarch] with user 'admin'...
Error: com.orientechnologies.orient.core.exception.OStorageException: Cannot open local storage '/opt/orientdb/databases/cdrarch' with mode=rw
DB name="cdrarch"
DB name="cdrarch"
Error: java.lang.IllegalStateException: Item with key 1859729167109586945 already exists
@rdelangh what about the log files ? they are in logs directory on sever side
@laa no log messages from server after its startup messages:
...
2016-11-10 11:31:25:919 INFO OSystemUserAuthenticator is active [OSystemUserAuthenticator]
2016-11-10 11:31:25:947 INFO Installed GREMLIN language v.2.6.0 - graph.pool.max=50 [OGraphServerHandler]
2016-11-10 11:31:25:948 INFO [OVariableParser.resolveVariables] Error on resolving property: distributed [orientechnologies]
2016-11-10 11:31:25:951 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2016-11-10 11:31:25:954 INFO OrientDB Studio available at http://10.49.69.190:2480/studio/index.html [OServer]
2016-11-10 11:31:25:954 INFO OrientDB Server is active v2.2.13-SNAPSHOT (build 2.2.x@rea9cd9af130882bf2836428b2d3a5bd4acde8da4; 2016-11-09 10:46:53+0000). [OServer]
@tglman could you help us identify why we do not see exceptions send trough remote connection in a server log ? It is really hard to identify cause of exception in such case.
@laa, @tglman For what it is worth: I found where this log message is printed from the Java code: in "common/collection/closabledictionary/OClosableLinkedContainer.java" , which contains the
package com.orientechnologies.common.collection.closabledictionary;
the public method "add( )" is defined, from where the message gets printed:
public void add(K key, V item) {
if (!item.isOpen())
throw new IllegalArgumentException("All passed in items should be in open state");
final OClosableEntry<K, V> closableEntry = new OClosableEntry<K, V>(item);
final OClosableEntry<K, V> oldEntry = data.putIfAbsent(key, closableEntry);
if (oldEntry != null) {
throw new IllegalStateException("Item with key " + key + " already exists");
}
logAdd(closableEntry);
}
@rdelangh we found too but we can not understand why it happens
@laa , @tglman server logs, and much more verbosity in these logs would help a lot !
Maybe best to switch now further investigations and comments to issue #6854 ?
I think the better continue at the same context.
@laa That's ok. -> could you please make a distribution that produces (much) more logging while it is opening a database, especially around that Java method which spawns the error message to the client console?
@laa Andrey, what is the plan to get this sorted out ? Are you trying to reproduce ? Or will you make some special distribution bundle that will spawn server log messages around the offending piece of Java code? Or any other strategy ?
@rdelangh I will create a distribution which prints "normal" stack trace. So we will know how it happened. I will provide distribution tomorrow morning.
@rdelangh could you try this distribution. But in console script could you add property -DdebugMode=true
just after all other properties but before the class name?
@laa which distribution, please ?
@laa 1) can you please give me a URL where this 'special verbose' distribution can be downloaded? 2) a little context about this faulty class "files_loaded1" : it is a class which was renamed from original name "files_loaded"; after that rename, I created a new class "files_loaded"; the renamed class "files_loaded1" started giving problems ( #6854) after the server process was stopped and a new version was started
OrientDB Version, operating system, or hardware.
Operating System
Expected behavior and actual behavior
After running with 2.2.13, shutting the database server left it in a 'un-openable' state :
launching the server again (any snapshot version 2.2.13-*) and trying to open the dbase from console session, fails with
Have rebooted the operating system, still same problem.
-> server is down !