Closed sajid2045 closed 9 years ago
We have picked this up and are working on it. We will provide feedback soon.
Hi @sajid2045,
I checked the dumps and I don't see any memory leak. Memory usage is high, but most of it could be garbage collected, and most of it was allocated by the profiler and the debug logging.
Did you get any OutOfMemoryError?
In any case, high memory usage is normal after a load test, especially with profiling and logging enabled.
Hi,
Requests that normally take 30 ms were taking 6000 ms. All I did was run 10 threads for 1 hour, about 18,000+ requests in total. The database had only 10,000 nodes, so it is not expected to behave that way at all. Even worse, the response time stayed at 4000+ ms even after I stopped the load test. I had to restart the DB, and only then did it go back to 30 ms. So the memory was definitely not being collected.
This is really unacceptable; as you can see, we are very unlikely to be able to restart a production DB.
-Sajid.
Hi @sajid2045,
I'll try to run the test to reproduce the delay. According to the dump, the strongly retained memory is 81 MB, which should not be the cause of your delay, but I will check it.
I can see this behavior in production. When I run the import script with 100,000 records, OrientDB gradually consumes more memory and the insert command gets slower and slower.
Hi,
2.1.1 is out; it would be great if you could try with that version as well.
Checking the latest code, I saw that you now always use a no-tx database. This is OK, but it's not recommended for creating edges between vertices. Is there any specific reason for that?
I can see this is consistent behavior. I can run my load test for 2 hours and OrientDB will go to 100% memory and sometimes will not even respond. The server stays in the same state even after I stop the load test, and I have to restart it.
Also, after running overnight, I got this exception from the client. The interesting point is that /usr/local/graphdb/default/databases/subscription-service/ is located on the server, and I am definitely using 'remote' to connect from the client:
[2015-09-01 09:51:37,359] [get test] [ERROR] [org.mule.exception.AbstractExceptionListener:319] [logException] [serviceName:SubscriptionService]=>
Message : Failed to invoke au.com.foxsports.subscription.service.SubscriptionServiceImpl@6439a027. Message payload is of type: String
Type : org.mule.api.MessagingException
Code : MULE_ERROR-29999
Payload : test
JavaDoc : http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/api/MessagingException.html
Exception stack is:
Root Exception stack trace:
com.orientechnologies.common.concur.lock.OLockException: File '/usr/local/graphdb/default/databases/subscription-service/database.ocf' is locked by another process, maybe the database is in use by another process. Use the remote mode with a OrientDB server to allow multiple access to the same database.
at com.orientechnologies.orient.core.storage.fs.OFileClassic.lock(OFileClassic.java:713)
at com.orientechnologies.orient.core.storage.fs.OFileClassic.openChannel(OFileClassic.java:770)
at com.orientechnologies.orient.core.storage.fs.OFileClassic.open(OFileClassic.java:552)
at com.orientechnologies.orient.core.storage.impl.local.OSingleFileSegment.open(OSingleFileSegment.java:51)
at com.orientechnologies.orient.core.storage.impl.local.OStorageConfigurationSegment.load(OStorageConfigurationSegment.java:64)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:187)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:249)
at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:724)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.openDatabase(ONetworkProtocolBinary.java:780)
at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.executeRequest(ONetworkProtocolBinary.java:289)
at com.orientechnologies.orient.server.network.protocol.binary.OBinaryNetworkProtocolAbstract.execute(OBinaryNetworkProtocolAbstract.java:223)
at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)
Also, the client does not recover from a database reset; I had to restart the clients too!
Caused by: com.orientechnologies.common.io.OIOException: Error on connecting to fsasydgrhdb01.foxsports.com.au:2424/subscription-service
at com.orientechnologies.orient.client.remote.ORemoteConnectionManager.createNetworkConnection(ORemoteConnectionManager.java:246) ~[orientdb-client-2.1.0.jar:2.1.0]
at com.orientechnologies.orient.client.remote.ORemoteConnectionManager$1.createNewResource(ORemoteConnectionManager.java:80) ~[orientdb-client-2.1.0.jar:2.1.0]
at com.orientechnologies.orient.client.remote.ORemoteConnectionManager$1.createNewResource(ORemoteConnectionManager.java:77) ~[orientdb-client-2.1.0.jar:2.1.0]
at com.orientechnologies.common.concur.resource.OResourcePool.getResource(OResourcePool.java:94) ~[orientdb-core-2.1.0.jar:2.1.0]
at com.orientechnologies.orient.client.remote.ORemoteConnectionManager.acquire(ORemoteConnectionManager.java:101) ~[orientdb-client-2.1.0.jar:2.1.0]
at com.orientechnologies.orient.client.remote.OStorageRemote.getAvailableNetwork(OStorageRemote.java:2103) ~[orientdb-client-2.1.0.jar:2.1.0]
... 178 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.7.0_60]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) ~[?:1.7.0_60]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) ~[?:1.7.0_60]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) ~[?:1.7.0_60]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.7.0_60]
at java.net.Socket.connect(Socket.java:579) ~[?:1.7.0_60]
at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.
@tglman Hi,
"The 2.1.1 is out, will be cool if you can try also with that version.
Checking the last code i saw that now you use every time no tx database, this is ok, but it's not suggested for creating edges between vertex, is it there any specific reason for that?"
We have Topic <----- Subscriber, and we are adding/removing subscribers from multiple threads. Using NoTx seems to avoid incrementing the version of Topic, which avoids the concurrent modification error. If I bring back Tx, it increments the version and we see many concurrent modification exceptions in load tests.
Hi all,
Regarding performance, I'm running your tests. I ran SubscriptionServiceImplTest#testTopicSubscribers with the number of subscribers set to 100000, and then extracted these two queries from your code:
select in('subscribe').deviceId,in('ignore').out('user-device').deviceId from (select from Root where name = "cricket")
and
select in('subscribe').deviceId,in('subscribe').out('user-device').deviceId from (select from Root where name = "cricket")
I ran them against the server while the test was running, and the response time was around 0.05 sec. Do you have any other query to test? Do I have to run any other population test to reproduce the problem?
For the error "File '/usr/local/graphdb/default/databases/subscription-service/database.ocf' is locked by another process, maybe the database is in use by another process.":
double-check that the server process is fully terminated before starting another server.
Using NoTx for graphs is OK for batch operations like imports, but it's not recommended in a live application. The reason is that edge creation is a multi-record operation that needs a transaction to guarantee consistency. When a concurrent modification exception occurs, your code should retry the operation.
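The retry advice above can be sketched roughly as follows. This is a minimal illustration, not code from the project under discussion: `ConflictException` is a stand-in for OrientDB's concurrent modification exception so the sketch compiles without the OrientDB jars, and `withRetry`, the attempt limit, and the backoff delay are hypothetical choices.

```java
import java.util.concurrent.Callable;

public class RetryOnConflict {
    // Stand-in (assumption) for the database's concurrent modification exception.
    public static class ConflictException extends RuntimeException {
        public ConflictException(String message) {
            super(message);
        }
    }

    // Runs txBody, re-running it when a conflict is reported, up to maxAttempts times.
    // txBody is expected to begin, do its work, and commit on each call.
    public static <T> T withRetry(int maxAttempts, Callable<T> txBody) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return txBody.call();
            } catch (ConflictException e) {
                last = e;
                // Short linear backoff so competing threads are less likely to collide again.
                Thread.sleep(10L * attempt);
            }
        }
        // All attempts hit a conflict: surface the last one to the caller.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated transaction that conflicts twice before succeeding.
        String result = withRetry(5, () -> {
            if (++calls[0] < 3) {
                throw new ConflictException("version mismatch");
            }
            return "committed";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "committed after 3 attempts"
    }
}
```

In a real application the body passed to `withRetry` would reload the affected vertices before recreating the edge, since the versions read during the failed attempt are stale.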
One important point: today the enterprise monitor shows the amount of heap allocated on the machine compared to the maximum allocatable heap (the one set with -Xmx), not the amount of heap actually in use. So after a load test the JVM has allocated 100% of the possible heap and the monitor shows that, but that heap may not actually be in use. We are working on showing the actual amount of used heap in the next release.
It's more important to find the slowdown after the load test, though.
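The difference between allocated heap and used heap described above can be observed from inside any JVM with the standard `Runtime` API. The following is a small sketch; the class name `HeapReport` and the `snapshot` helper are made up for illustration and are not part of OrientDB or its monitor.

```java
public class HeapReport {
    // Returns {used, allocated, max} in bytes for the current JVM.
    public static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // upper bound, typically the -Xmx value
        long allocated = rt.totalMemory(); // heap currently reserved by the JVM
        long used = allocated - rt.freeMemory(); // portion of the reserved heap holding objects
        return new long[] {used, allocated, max};
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        long mb = 1024L * 1024L;
        System.out.println("used      = " + s[0] / mb + " MB");
        System.out.println("allocated = " + s[1] / mb + " MB");
        System.out.println("max       = " + s[2] / mb + " MB");
    }
}
```

A monitor that reports allocated/max will read close to 100% after a heavy load even when used/max is low, because the JVM does not necessarily hand reserved heap back to the operating system. The used figure also includes garbage that has not been collected yet, so it is only an upper bound on live data.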
@sajid2045 In general, we have been observing very similar issues with nodes freezing and restarts at both the client and the remote server. This is useful, thanks.
On the other hand, we've used a different way of dealing with edge addition changing the vertex document version. There is a way to configure the conflict strategy:
ALTER DATABASE CONFLICTSTRATEGY content
We've followed this since the 1.7.* snapshots and in later versions, and it has been working fairly OK.
You can read more about it here: http://orientdb.com/docs/last/SQL-Alter-Database.html @tglman Please correct me if this is no longer applicable.
I haven't seen this on 2.1.2 so far. Closing it.
Please find the latest code & heapdumps here:
https://dl.dropboxusercontent.com/u/5968302/orient-errors.zip
After running load tests for a long time, the OrientDB server seems to reach 100% memory. I added the heap dump etc. for you to check.