orientechnologies / orientdb


Memory Issues with 32 Bit java #7586

Closed EricSchreiner closed 7 years ago

EricSchreiner commented 7 years ago

OrientDB Version: 2.2.23

Java Version: 1.8.0_131 32Bit

OS: Windows 10

Hi @lvca ,

When I use your recommended settings with a 32-bit (1.8.0_131) runtime, I get the out-of-memory error immediately (with -XX:MaxDirectMemorySize=128m it just comes later).

Here are the relevant settings:

    VER @ 10:42:50.358 java.runtime.version: 1.8.0_131-b11
    VER @ 10:42:50.358 java.version: 1.8.0_131
    VER @ 10:42:50.358 java.vm.version: 25.131-b11
    VER @ 10:42:50.358 java.vm.vendor: Oracle Corporation
    VER @ 10:42:50.358 java.vm.name: Java HotSpot(TM) Client VM
    VER @ 10:42:50.358 java.specification.version: 1.8
    VER @ 10:42:50.358 java.vm.specification.version: 1.8
    VER @ 10:42:50.359 os.name: Windows 10
    VER @ 10:42:50.359 os.version: 10.0
    VER @ 10:42:50.359 os.arch: x86
    MSG @ 10:42:50.359 java.runtime totalMemory=16mb maxMemory=1037mb freeMemory=11mb processors=8
    MSG @ 10:42:50.361 java.runtime.argument: -Xmx1024m
    MSG @ 10:42:50.361 java.runtime.argument: -XX:MaxDirectMemorySize=1G
    MSG @ 10:42:50.361 java.runtime.argument: -Dpicapport.home=C:\ProgramData\Contecon
    MSG @ 10:42:50.361 java.runtime.argument: -DTRACE=DEBUG

Here is the error during database start:

    MSG @ 10:42:52.372 PicApportDBService.createDatabaseDirectory: C:\Users\Eric\.picapport\db
    MSG @ 10:42:52.373 PicApportDBService.startDatabase:plocal:C:/Users/Eric/.picapport/db/db.2.2.23
    EXCEP@ ============================================================
    EXCEP@ Exception at: 2017-07-26 10:42:52
    EXCEP@ Msg:
    EXCEP@ null
    EXCEP@ ------------------------------------------------------------
    EXCEP@ java.lang.OutOfMemoryError
    EXCEP@ at sun.misc.Unsafe.allocateMemory(Native Method)
    EXCEP@ at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
    EXCEP@ at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
    EXCEP@ at com.orientechnologies.common.directmemory.OByteBufferPool.allocateBuffer(OByteBufferPool.java:328)
    EXCEP@ at com.orientechnologies.common.directmemory.OByteBufferPool.acquireDirect(OByteBufferPool.java:279)
    EXCEP@ at com.orientechnologies.orient.core.storage.cache.local.OWOWCache.load(OWOWCache.java:769)
    EXCEP@ at com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.updateCache(O2QCache.java:1107)
    EXCEP@ at com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.doLoad(O2QCache.java:346)
    EXCEP@ at com.orientechnologies.orient.core.storage.cache.local.twoq.O2QCache.allocateNewPage(O2QCache.java:397)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperation.commitChanges(OAtomicOperation.java:434)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.endAtomicOperation(OAtomicOperationsManager.java:468)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.endAtomicOperation(OAtomicOperationsManager.java:412)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.endAtomicOperation(ODurableComponent.java:116)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.paginated.OPaginatedCluster.create(OPaginatedCluster.java:195)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.addClusterInternal(OAbstractPaginatedStorage.java:4136)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.doAddCluster(OAbstractPaginatedStorage.java:4117)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.create(OAbstractPaginatedStorage.java:459)
    EXCEP@ at com.orientechnologies.orient.core.storage.impl.local.paginated.OLocalPaginatedStorage.create(OLocalPaginatedStorage.java:127)
    EXCEP@ at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.create(ODatabaseDocumentTx.java:438)
    EXCEP@ at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.create(ODatabaseDocumentTx.java:398)
    EXCEP@ at de.contecon.picapport.db.PicApportDBService.createDBSchema(Unknown Source)
    EXCEP@ at de.contecon.picapport.db.PicApportDBService.startDatabase(Unknown Source)
    EXCEP@ at de.contecon.picapport.db.PicApportDBService.startDatabase(Unknown Source)
    EXCEP@ at de.contecon.picapport.PicApport.startDatabase(Unknown Source)
    EXCEP@ at de.contecon.picapport.PicApport.init(Unknown Source)
    EXCEP@ at de.contecon.picapport.PicApport.main(Unknown Source)
    EXCEP@ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    EXCEP@ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    EXCEP@ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    EXCEP@ at java.lang.reflect.Method.invoke(Method.java:498)
    EXCEP@ at com.sun.javafx.application.LauncherImpl.launchApplicationWithArgs(LauncherImpl.java:389)
    EXCEP@ at com.sun.javafx.application.LauncherImpl.launchApplication(LauncherImpl.java:328)
    EXCEP@ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    EXCEP@ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    EXCEP@ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    EXCEP@ at java.lang.reflect.Method.invoke(Method.java:498)
    EXCEP@ at sun.launcher.LauncherHelper$FXHelper.main(LauncherHelper.java:767)

andrii0lomakin commented 7 years ago

Hi @EricSchreiner what is your disk cache size?

EricSchreiner commented 7 years ago

Hi @laa these are the parameters we use. Everything else is default


    Map<String, Object> defaultsMap = new HashMap<String, Object>();
    defaultsMap.put("storage.keepOpen", false);     // Tells to the engine to not close the storage when a database is closed. Storages will be closed when the process will shutdown

    defaultsMap.put("tx.useLog", true);             // Transactions use log file to store temporary data to being rolled back in case of crash    
    defaultsMap.put("tx.log.synch", true);          // Executes a synch against the file-system for each log entry. This slows down transactions but guarantee transaction reliability on non-reliable drives
    defaultsMap.put("tx.commit.synch", true);       // Synchronizes the storage after transaction commit (see Disable the disk synch)

    defaultsMap.put("cache.level1.enabled", false); 
    defaultsMap.put("cache.level1.size", 0);
    // ES removed Feb 2015, no longer needed since ODB 2.0.0: defaultsMap.put("cache.level2.enabled", false);
    // ES removed Feb 2015, no longer needed since ODB 2.0.0: defaultsMap.put("cache.level2.size", 0);

    defaultsMap.put("nonTX.recordUpdate.synch", true);               // Executes a synch against the file-system at every record operation. This slows down records updates but guarantee reliability on unreliable drives
    defaultsMap.put("index.auto.rebuildAfterNotSoftClose", true);    // Auto rebuild all automatic indexes after upon database open when wasn't closed properly
    defaultsMap.put("mvrbtree.lazyUpdates", 1);                      // -1=Auto, 0=always lazy until explicit lazySave() is called by application, 1=No lazy, commit at each change. >1=Commit at every X changes

    OGlobalConfiguration.setConfiguration(defaultsMap); 

andrii0lomakin commented 7 years ago

@EricSchreiner I suppose this means that you have a 4GB disk cache, which is beyond the capabilities of a 32-bit JVM. Also, I strongly recommend against disabling the first-level cache.

About your settings:

defaultsMap.put("cache.level1.enabled", false); 
defaultsMap.put("cache.level1.size", 0);

it may cause a lot of strange exceptions in your application.

 defaultsMap.put("mvrbtree.lazyUpdates", 1); 

mvrbtree was removed from the distribution a long time ago, so this parameter is not needed.

defaultsMap.put("nonTX.recordUpdate.synch", true);
defaultsMap.put("tx.commit.synch", true);

are a legacy of the 1.x implementation of transactions and are not used any more, so you can remove them too.

defaultsMap.put("tx.useLog", true);

is always true and can not be changed, even if you directly set it to false. So this parameter can be removed too.

defaultsMap.put("storage.keepOpen", false);     // Tells to the engine to not close the storage when a database is closed. Storages will be closed when the process will shutdown

is no longer valid; this parameter is always true and cannot be changed. The same goes for defaultsMap.put("tx.log.synch", true): it is always true and cannot be changed, so you can remove it from the map.

Back to your main issue. I suggest you set com.orientechnologies.orient.core.config.OGlobalConfiguration#DISK_CACHE_SIZE to 800 (meaning 800 MB) and keep -XX:MaxDirectMemorySize=1G. I also suggest you set the DISK_CACHE_SIZE parameter directly, not through the OGlobalConfiguration.setConfiguration(defaultsMap) call.
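A minimal sketch of one way to set the disk cache size outside the defaults map, assuming OrientDB picks up the `storage.diskCache.bufferSize` system property (the key used with `-D` on the command line) when its configuration is first loaded. The class and method names are hypothetical and not part of OrientDB; with the OrientDB jar on the classpath, `OGlobalConfiguration.DISK_CACHE_SIZE.setValue(800)` is presumably the direct call meant here.

```java
// Hypothetical helper, not part of OrientDB or PicApport.
final class DiskCacheSetting {
    // Property key for the disk cache size; the value is in megabytes.
    static final String DISK_CACHE_KEY = "storage.diskCache.bufferSize";

    // Must run before any OrientDB class is loaded (assumption: the
    // configuration reads system properties once, at class-load time).
    static void setDiskCacheSizeMb(int megabytes) {
        if (megabytes <= 0) {
            throw new IllegalArgumentException("disk cache size must be positive");
        }
        System.setProperty(DISK_CACHE_KEY, Integer.toString(megabytes));
    }
}
```

Calling `DiskCacheSetting.setDiskCacheSizeMb(800)` at the very start of `main` would then correspond to passing `-Dstorage.diskCache.bufferSize=800` on the command line.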

P.S. BTW, what is your expected DB size? Based on the tests you have already performed, do you have some expectations? By DB size, I mean size on disk in GBs or MBs.

EricSchreiner commented 7 years ago

Hi @laa thanks for your reply. Does your answer mean that I should remove all settings with the exception of defaultsMap.put("index.auto.rebuildAfterNotSoftClose", true);?

You wrote I should reduce DISK_CACHE_SIZE to 800m. Is this related to XX:MaxDirectMemorySize? If yes how? Can I set DISK_CACHE_SIZE and XX:MaxDirectMemorySize to 128mb?

For your understanding: We have thousands of users running PicApport without XX:MaxDirectMemorySize set. Lots of them are using a Raspberry Pi with just one gig of physical memory. So in the past we recommended setting -Xmx512m for 32-bit installations and the Raspberry Pi, which works fine with several thousand photos (we tested with 6000). What I would like to achieve is that this still works with our new version with Orient 2.2.xx, because I expect a lot of our users will not read our release notes. (We have also created an .exe file with a Windows installer for completely inexperienced users, whom I cannot ask to set any parameter.) And again, I do not care about speed in these low-memory situations; it should just work.

To answer your questions: my test database contains about 50,000 photos (metadata and thumbnails). The total size of the database directory is 880 MB. dbconfig.txt

I also have a test system with one million Photos (for this we have a 64 Bit engine but I have not tried it yet with V2.2.xx)

andrii0lomakin commented 7 years ago

@EricSchreiner I see. I suppose I can help you run the database without -XX:MaxDirectMemorySize set, but we are busy right now. I will get back to this issue next Tuesday.

EricSchreiner commented 7 years ago

OK

taburet commented 7 years ago

Hi @EricSchreiner,

There are basically only three options that affect/limit the memory usage of OrientDB:

  1. -Xmx limits the heap size, as we all know. Usually, if not provided, it's auto-configured by JVM to some reasonable default. May be configured only from the JVM args, OrientDB can't control it.

  2. -XX:MaxDirectMemorySize limits the amount of the off-heap "direct" memory JVM may allocate. Usually, if not provided, it's auto-configured by JVM to the value of -Xmx. May be configured only from the JVM args, OrientDB can't control it.

  3. -Dstorage.diskCache.bufferSize, aka OGlobalConfiguration#DISK_CACHE_SIZE, limits the disk cache size of OrientDB. Auto-configured by OrientDB to the value of Xmx if XX:MaxDirectMemorySize is not provided, otherwise it's configured to max(machine_memory_size - Xmx - 2GB, 256MB) and upper-limited to the value of XX:MaxDirectMemorySize. Minimum supported value is 64MB. Note that this is not a hard limit: if the disk cache is full and none of its memory can be freed, so-called small overflow buffers will be allocated. Setting the disk cache size to extremely low values while performing huge queries will not help, especially in the case of update/insert queries.
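The auto-configuration rule in item 3 can be sketched as follows. This follows only the wording of the comment and is not taken from the OrientDB source; the class and method names are hypothetical.

```java
// Hypothetical illustration of the rule as described in this thread.
final class DiskCacheAutoConfig {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // maxDirectBytes is null when -XX:MaxDirectMemorySize is not provided.
    static long autoConfiguredCacheBytes(long machineMemoryBytes, long xmxBytes, Long maxDirectBytes) {
        if (maxDirectBytes == null) {
            // No explicit direct-memory limit: the cache size follows Xmx.
            return xmxBytes;
        }
        // max(machine_memory_size - Xmx - 2GB, 256MB) ...
        long candidate = Math.max(machineMemoryBytes - xmxBytes - 2 * GB, 256 * MB);
        // ... upper-limited by -XX:MaxDirectMemorySize.
        return Math.min(candidate, maxDirectBytes);
    }
}
```

With the settings from the original report (16 GB machine, -Xmx1024m, -XX:MaxDirectMemorySize=1G) this rule yields a 1 GB disk cache next to a 1 GB heap, which is a lot to fit into a 32-bit address space.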

The disk cache allocates memory from the JVM's off-heap "direct" memory. So, to avoid OOMs, the inequality DISK_CACHE_SIZE <= XX:MaxDirectMemorySize must always hold, and Xmx + DISK_CACHE_SIZE + memory_reserved_by_os_and_other_processes <= machine_memory_size must also hold.
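The two inequalities can be checked with a few lines of arithmetic. This is a minimal sketch; the class name and the 256 MB reserved for the OS in the usage note are assumptions for illustration, not figures from OrientDB.

```java
// Hypothetical helper that checks the two inequalities stated above.
// All values are in megabytes.
final class MemoryBudget {
    static boolean fits(long xmxMb, long diskCacheMb, long maxDirectMb,
                        long reservedForOsMb, long machineMb) {
        // DISK_CACHE_SIZE <= XX:MaxDirectMemorySize
        boolean cacheWithinDirect = diskCacheMb <= maxDirectMb;
        // Xmx + DISK_CACHE_SIZE + reserved <= machine_memory_size
        boolean totalWithinMachine = xmxMb + diskCacheMb + reservedForOsMb <= machineMb;
        return cacheWithinDirect && totalWithinMachine;
    }
}
```

For a 1 GB machine with -Xmx512m, a 512 MB cache does not fit (512 + 512 + 256 > 1024), while a 128 MB cache does (512 + 128 + 256 = 896).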

Regarding your test box with less than 2GB of RAM mentioned in the emails: try setting Xmx to 512MB and removing all other options. -XX:MaxDirectMemorySize will be auto-configured to 512MB by the JVM, and DISK_CACHE_SIZE will be auto-configured to 512MB by OrientDB. The total memory consumption of OrientDB should be around 1GB, which should leave enough RAM for the OS and other processes. But it's still better to have -XX:MaxDirectMemorySize and DISK_CACHE_SIZE set to explicit values according to the aforementioned inequalities.

In the case of a 1GB Raspberry Pi box with Xmx already configured to 512MB and neither -XX:MaxDirectMemorySize nor DISK_CACHE_SIZE set, this means that OrientDB may eat up to 1GB of RAM. That is too much for the box. I may tune the DISK_CACHE_SIZE auto-configuration procedure to adjust for low-memory conditions, but there will still be a problem if Xmx is set so high that there is no RAM left for the disk cache. What is the typical Xmx of your Raspberry Pi users?

EricSchreiner commented 7 years ago

Hi @taburet, thanks for your answer. I'll check and come back to you. In the meantime: is it possible that max(machine_memory_size - Xmx - 2GB, 256MB) does not work if we execute OrientDB in a 32-bit VM on a computer that has more than 16 gig of RAM? What would machine_memory_size be in a 32-bit environment on a PC with 16 gig of RAM? Are you using: machine_memory_size = os.getTotalPhysicalMemorySize();

taburet commented 7 years ago

Is it possible that max(machine_memory_size - Xmx - 2GB, 256MB) does not work if we execute OrientDB in a 32-Bit VM on a Computer that has more than 16gig of RAM?

It should work, AFAIU, but in a wrong way :) Why would it behave differently specifically at 16GB?

What would machine_memory_size be in a 32-Bit Environment with a PC with 16 Gig of Ram?

Seems like it will be 16GB and that may be a problem. Will check this.

Are you using: machine_memory_size = os.getTotalPhysicalMemorySize();

Yes, exactly.
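A minimal sketch of why the physical memory size alone is misleading on a 32-bit JVM. It assumes a HotSpot-style JVM that exposes `com.sun.management.OperatingSystemMXBean` and the `sun.arch.data.model` system property; the class name and the idea of clamping to an address-space cap are hypothetical and do not describe what OrientDB does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical probe, not OrientDB code.
final class MachineMemoryProbe {
    // Total physical RAM as reported by the OS, or -1 if unavailable.
    static long totalPhysicalMemoryBytes() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getTotalPhysicalMemorySize();
        }
        return -1L;
    }

    // HotSpot sets sun.arch.data.model to "32" or "64"; other JVMs may not.
    static boolean is32BitJvm() {
        return "32".equals(System.getProperty("sun.arch.data.model"));
    }

    // A 32-bit process cannot address all of a 16 GB machine, so the
    // figure used for sizing is clamped to a caller-supplied cap.
    static long usableMemoryBytes(long physicalBytes, boolean is32Bit, long addressSpaceCapBytes) {
        return is32Bit ? Math.min(physicalBytes, addressSpaceCapBytes) : physicalBytes;
    }
}
```

On a 16 GB PC the probe reports the full 16 GB even inside a 32-bit JVM, so a formula based on it alone can ask for more direct memory than the process can ever map.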

taburet commented 7 years ago

@EricSchreiner did you see messages like "32 bit JVM is detected. Lowering disk cache size from X to Y" in the logs?

EricSchreiner commented 7 years ago

Hi @taburet no we don't see messages like 32 Bit JVM detected. I've attached a logfile that contains a configuration dump PicApport-32Bit.txt

EricSchreiner commented 7 years ago

@taburet one more thing about why using 32-Bit on a machine with 16 Gig of Ram. Well this is our Test-environment. Also my laptop I use for testing has 32 gig of RAM and I also need to test installations I have received from users .

taburet commented 7 years ago

@EricSchreiner yes, I understand your needs. The strange thing is that, according to the provided log file, there is no auto-configuration done on the OrientDB side at all, but it must be done, since the disk cache is not configured. I will investigate more on this.

andrii0lomakin commented 7 years ago

Hi @EricSchreiner could you try this build https://drive.google.com/file/d/0B2oZq2xVp841eklKTmVLMW1kMTQ/view?usp=sharing

andrii0lomakin commented 7 years ago

Hi @EricSchreiner please do not set MaxDirectMemory but only heap size, so we will test whether your requirements are satisfied.

EricSchreiner commented 7 years ago

Hi @laa, Hi @taburet

Still not working. I've removed the MaxDirectMemory Parameter. Please see logfile below

PicApport-odb-2-2-26.txt

andrii0lomakin commented 7 years ago

@EricSchreiner According to the stack trace you sent

at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
EXCEP@ at com.orientechnologies.common.directmemory.OByteBufferPool.allocateBuffer(OByteBufferPool.java:335)

the exception happens at line 335 of the byte buffer pool. But in the file after the changes I made, this line corresponds to the code https://github.com/orientechnologies/orientdb/blob/fed5276ae99462665abe7b2ffed00cedd904a58b/core/src/main/java/com/orientechnologies/common/directmemory/OByteBufferPool.java#L335

  if (clear) {

which obviously cannot cause an OOM in the byte buffer. This means that you used an outdated version.

How did you get the 2.2.26 distribution? Did you download it from the link I posted?

EricSchreiner commented 7 years ago

Hi @laa I used the link you provided: https://drive.google.com/file/d/0B2oZq2xVp841eklKTmVLMW1kMTQ/view?usp=sharing as you can see in the logfile it should be the correct one....

DEBUG@ 11:11:23.231 PicApportDBService.setDbConfig: ----- start dump db-configuration ----- OrientDB 2.2.26-SNAPSHOT (build e48ae34ce1827858f78f9f4ddfe30fd289050478) configuration dump

andrii0lomakin commented 7 years ago

Hi @EricSchreiner that is my fault then, could you try this build https://drive.google.com/file/d/0B2oZq2xVp841THlxbVhhemxMMGM/view?usp=sharing . Could you also set the log level to info or lower, so we will see all the information printed in the log by ODB?

EricSchreiner commented 7 years ago

Hi @laa still not working (see attached logfile). The build number is different from the previous one... OrientDB 2.2.26-SNAPSHOT (build 1083c79e63810dbafc9fee07f24654b22a5b7e65) I've also set -Dlog.console.level=INFO but it seems not to work???? PicApport-odb2.2.26.txt

andrii0lomakin commented 7 years ago

@EricSchreiner could you add the following parameter to the command line -Djava.util.logging.config.file=<path to file> this file should be similar to the following https://github.com/orientechnologies/orientdb/blob/2.2.x/server/config/orientdb-server-log.properties and send me the log output?
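For an embedded setup, the effect of pointing `-Djava.util.logging.config.file` at a properties file can also be approximated in code with the standard `java.util.logging` API. This is a minimal sketch with a hypothetical class name; the three properties are generic JUL settings and are not copied from the linked orientdb-server-log.properties file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;

// Hypothetical helper that loads a small JUL configuration from a string.
final class JulConfigSketch {
    // level is a java.util.logging level name such as "INFO" or "FINE".
    static void configure(String level) {
        String props =
              "handlers = java.util.logging.ConsoleHandler\n"
            + ".level = " + level + "\n"
            + "java.util.logging.ConsoleHandler.level = " + level + "\n";
        try {
            // Same format as the file passed via -Djava.util.logging.config.file.
            LogManager.getLogManager().readConfiguration(
                new ByteArrayInputStream(props.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `JulConfigSketch.configure("INFO")` before the database is opened sets the root logger and the console handler to INFO.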

EricSchreiner commented 7 years ago

Hi @laa Find attached the logfile created: orient-server.log.txt

andrii0lomakin commented 7 years ago

@EricSchreiner could you try new build https://drive.google.com/file/d/0B2oZq2xVp841THlxbVhhemxMMGM/view?usp=sharing please send me a log output as you did during the previous run.

EricSchreiner commented 7 years ago

HI @laa see attached Logfile orient-server.log.txt

The exceptions there should already be fixed in 2.2.25?!? Please see https://github.com/orientechnologies/orientdb/issues/7585#issuecomment-318350805

andrii0lomakin commented 7 years ago

@EricSchreiner I used the latest version of the source code when I provided this build, which means that the exception is not fixed, I suppose. But that issue was created about a different exception and my code does not touch this part; probably, once the OOM issue was fixed, the exception started to reproduce. Could you modify your test to run without the Lucene index? Or just drop it for a while? It will make queries slower, but it will allow us to check for the absence of the OOM.

EricSchreiner commented 7 years ago

HI @laa sorry but getting rid of Lucene is almost impossible. Anyway I'm almost 100% sure that the issue with Lucene was fixed. (I've tested it myself)

andrii0lomakin commented 7 years ago

@EricSchreiner as I wrote I made a distribution from latest source code. @luigidellaquila @robfrank could you look my commits in code and confirm that they do not affect Lucene functionality and it also means that we, unfortunately, have to reopen the issue.

andrii0lomakin commented 7 years ago

At the moment, I see that this issue is blocked by the Lucene exception; unfortunately, we cannot make progress on it until that is fixed. Once we resolve this problem we may continue.

orientdb-builder commented 7 years ago

About the Lucene exception, can you try with the latest 2.2.26-SNAPSHOT?

https://oss.sonatype.org/content/repositories/snapshots/com/orientechnologies/orientdb-community/2.2.26-SNAPSHOT/orientdb-community-2.2.26-20170809.164115-12.tar.gz

andrii0lomakin commented 7 years ago

@orientdb-builder I included fix which I provided a few minutes ago in the main branch. Could we re-run build and get the latest snapshot from source code as for now?

EricSchreiner commented 7 years ago

@laa Now the memory error is back again..... orient-server.log.txt

andrii0lomakin commented 7 years ago

@EricSchreiner in the comment above I asked @orientdb-builder to run the build again to include the latest changes from the source code. Once the new snapshot is available, he or I will provide a link for you. The build you tried is the one I provided before my latest build. I suppose a build from the latest source code will be provided today or tomorrow.

andrii0lomakin commented 7 years ago

@EricSchreiner latest snapshot is generated https://oss.sonatype.org/content/repositories/snapshots/com/orientechnologies/orientdb-community/2.2.26-SNAPSHOT/orientdb-community-2.2.26-20170810.153209-13.tar.gz could you try it and send the log as usual :-)

EricSchreiner commented 7 years ago

HI @laa

see logfile.... orient-server.log.3.txt

andrii0lomakin commented 7 years ago

Hi @EricSchreiner @robfrank, as I can see, the Lucene exception was reproduced, the same as in issue #7585 which @EricSchreiner referenced. I will mark this issue as blocked until issue #7585 is resolved. Actually, I suppose that the OOM is fixed and that this is what allows the Lucene issue to reproduce, but we need to be 100% sure.

EricSchreiner commented 7 years ago

Hi @laa , @robfrank is it possible that I need another orientdb-spatial-2.2.23-dist.jar? If yes where will I get the orientdb-spatial-2.2.26-dist.jar?

andrii0lomakin commented 7 years ago

@EricSchreiner that is very likely could you try this one https://oss.sonatype.org/content/repositories/snapshots/com/orientechnologies/orientdb-spatial/2.2.26-SNAPSHOT/orientdb-spatial-2.2.26-20170810.160653-15.jar ?

robfrank commented 7 years ago

@EricSchreiner the problem referenced in #7585 is solved from 2.2.25. I supposed you updated to latest 2.2.25. So please take the latest snapshot of spatial as well.

EricSchreiner commented 7 years ago

@laa now the log with orientdb-spatial-2.2.26-20170810.160653-15.jar orient-server.log.2.txt

andrii0lomakin commented 7 years ago

So the Lucene issue still persists. Will wait for the fix.

EricSchreiner commented 7 years ago

Hi @laa, hi @robfrank any news on this?

EricSchreiner commented 7 years ago

Hi @laa, hi @robfrank, I've tested 2.2.26 GA. looks much better :-) I'll continue testing tomorrow Logfile: orient-server.log.txt Config: PicApport-2.2.26.txt

andrii0lomakin commented 7 years ago

@EricSchreiner ok so probably it was just an issue with a mix of libraries of different versions, I am waiting for your final conclusion.

andrii0lomakin commented 7 years ago

Hi @EricSchreiner any update on this?

EricSchreiner commented 7 years ago

Hi @laa seems to be OK so far. I've attached two logfiles from the same database started with 32Bit and 64Bit. The only thing I see is, that sometimes it takes a very long time to shutdown the database. This seem to be new. orient-server-64bit.log.0.txt orient-server-32bit.log.0.txt

andrii0lomakin commented 7 years ago

@EricSchreiner cool. What do you mean by it taking a very long time to shut down? Does it happen on both instances or on 32-bit only?

andrii0lomakin commented 7 years ago

@EricSchreiner I will close this issue because it seems to be fixed. But please open a new issue if you think something is wrong with the shutdown; maybe it is a bug, maybe not, let's see. If you are able to create a profiler snapshot, that would be cool; if not, we will provide instructions for a very good and free one, though of course without a handy GUI.

andrii0lomakin commented 7 years ago

@santo-it for release notes: "On 32-bit systems, because of the high level of memory fragmentation, ODB cannot allocate memory in big chunks, so it always allocates memory with page-size granularity. This decreases performance but avoids throwing an OOM when allocating direct memory."
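The idea in the release note can be illustrated with plain `java.nio`. This is a sketch of page-granular direct allocation in general, not the OrientDB byte buffer pool; the class name is hypothetical and the 64 KB page size in the usage note is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: many small direct buffers instead of one big chunk.
final class PageGranularAllocator {
    // Allocates pageCount separate direct buffers of pageSizeBytes each.
    // Each request needs only a small contiguous region of address space,
    // which is easier to satisfy when a 32-bit address space is fragmented.
    static List<ByteBuffer> allocatePages(int pageCount, int pageSizeBytes) {
        List<ByteBuffer> pages = new ArrayList<>(pageCount);
        for (int i = 0; i < pageCount; i++) {
            pages.add(ByteBuffer.allocateDirect(pageSizeBytes));
        }
        return pages;
    }
}
```

`PageGranularAllocator.allocatePages(4, 64 * 1024)` returns four independent 64 KB direct buffers, whereas `ByteBuffer.allocateDirect(4 * 64 * 1024)` would need one contiguous 256 KB region. The trade-off named in the release note is the extra per-allocation overhead.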

EricSchreiner commented 7 years ago

@laa thank you for your support......