orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0

Auto close of storage when not used anymore #3055

Closed: lvca closed this issue 8 years ago

lvca commented 10 years ago

Right now the storage remains open even when it is used only through remote connections. The ideal would be to close a storage that is no longer used after a while. I propose my idea:

This would also resolve the flushing of the WAL when the storage is no longer used.

andrii0lomakin commented 9 years ago

I will provide it as an optional feature, because in 90% of cases we would add CPU overhead that does not bring value. The gains of this approach are also not completely clear to me, but there is a clear pitfall: degradation of the storage's multi-core scalability.

lvca commented 9 years ago

@laa We already discussed the use case. The problem is that there is no full checkpoint on the WAL yet, so dirty pages could stay in RAM for a long time (if they are used in reads). This is a way to reduce the chance that we lose data if the JVM is killed. 90% of users prefer a database that is not corrupted over avoiding a slowdown caused by this (when and if it happens).

andrii0lomakin commented 9 years ago

We discussed this, and I wrote you that these claims are not true.

  1. We have had a full checkpoint for a long time.
  2. Dirty pages are always flushed in the background.
  3. Dirty pages do not participate in reads, because of item 2.

I described all of this on the wiki: https://github.com/orientechnologies/orientdb/wiki/plocal-storage-disk-cache#write-cache ("On periodical background flush ..." and so on).
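
(As a reader's aid: a minimal Java sketch of the kind of periodic background flush that wiki section describes. `DirtyPage` and the flush logic are hypothetical stand-ins, not OrientDB's actual write cache.)

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: a background task drains dirty pages to disk
// on a fixed schedule, so dirty pages never wait for a storage close (or
// for reads) in order to be persisted.
class BackgroundFlusher {
    private final Queue<DirtyPage> dirtyPages = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start() {
        // Flush pending dirty pages every 500 ms, independent of reads.
        scheduler.scheduleWithFixedDelay(this::flushBatch, 500, 500, TimeUnit.MILLISECONDS);
    }

    void markDirty(DirtyPage page) {
        dirtyPages.add(page);
    }

    private void flushBatch() {
        DirtyPage page;
        while ((page = dirtyPages.poll()) != null) {
            // Here the real cache would write the page to its file and sync it.
        }
    }

    static class DirtyPage { /* file id, page index, buffer ... */ }
}
```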

andrii0lomakin commented 9 years ago

@lvca How is storage auto-close related to durability? We guarantee durability now, without auto-close.

andrii0lomakin commented 9 years ago

What does "use a counter on OStorage impl that is incremented at every usage, method addUser()" from your description mean? As far as I can see, with the current storage lifecycle the storage will never be closed. Why implement this issue if the fuzzy checkpoint fully resolves the problem of rebuilding after a crash?
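
(To make the disagreement concrete: a minimal Java sketch of the counter-based lifecycle the quoted proposal appears to describe. The `addUser()`/`removeUser()` pair and the idle-close check are assumptions drawn from the quote, not an existing OStorage API; the objection above is precisely that, with the current lifecycle, the counter may never reach a closable state.)

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: a usage counter on a storage, incremented on
// every use. A background task could close the storage once the counter
// is zero and the storage has been idle long enough. If callers never
// release their usage, the storage is never closed; that is the open
// question raised in this thread.
class CountedStorage {
    private final AtomicInteger users = new AtomicInteger();
    private volatile long lastUsedNanos = System.nanoTime();

    void addUser() {                  // called at every usage, per the proposal
        users.incrementAndGet();
        lastUsedNanos = System.nanoTime();
    }

    void removeUser() {               // hypothetical counterpart
        users.decrementAndGet();
        lastUsedNanos = System.nanoTime();
    }

    boolean canAutoClose(long idleTimeoutNanos) {
        return users.get() == 0
                && System.nanoTime() - lastUsedNanos > idleTimeoutNanos;
    }
}
```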

lvca commented 9 years ago

This issue exists because the WAL fuzzy checkpoint was too hard to implement in 2.0. Is that still the case?

andrii0lomakin commented 9 years ago

I do not think we should add new features just before a release. Could you also answer my previous question?

andrii0lomakin commented 9 years ago

About the fuzzy checkpoint: nothing has changed since the previous estimations.

tglman commented 9 years ago

I am not sure the close can compromise durability; actually, we should try to guarantee proper durability even if the close is missed.

Anyway, not doing the close raises some questions: do we free the memory in the caches and in the OStorage instance without a close? Do we close the files used by the storage if we don't do a close? Do we terminate remote connections if we don't do a close?

If the answer is no, this can cause a few important issues for some use cases. For example, if I have a server containing a few thousand databases used in alternation, the following can happen:

  1. I easily run out of memory, because the memory used by the storage is never freed until shutdown (I call this a memory leak).
  2. If I have 1000 databases in the server with an average of 35 classes each, that means at least two files per class, so I'll reach a point where I run out of file descriptors (on a common Linux machine the limit is around 65,000, and 35 * 2 * 1000 = 70,000). In that situation the database is completely stalled and not accessible (a socket is a file descriptor too), and the only way to recover is to shut down the server (by killing it, because it will not accept connections).
  3. Same story for a client that accesses a high number of servers.

The point is also that, if we never do the close at all, it is just a matter of time before the server becomes completely blocked.

You may argue that this is not a common use case, but I feel it is more common than we believe. Also, if you play a bit with the numbers, you can get more common cases (100 classes and 200 indexes = ~600 files, plus some 100 for the WAL; after that you need just 100 databases ...).
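
(A quick way to observe the failure mode described above: on Linux, a HotSpot/OpenJDK process can report its own descriptor usage through the com.sun.management extension. A minimal sketch; the class name is arbitrary.)

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Prints the JVM's open file descriptor count against the OS limit.
// With ~2 files per class, 1000 databases of 35 classes each already
// need ~70,000 descriptors, above a typical 65,000 ulimit.
public class FdUsage {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.printf("open fds: %d / max %d%n",
                    unix.getOpenFileDescriptorCount(),
                    unix.getMaxFileDescriptorCount());
        }
    }
}
```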

lvca commented 9 years ago

We have another problem here related to the DiskCache: there is not one per Orient instance, but one per database. This means they can grow in terms of RAM used, and each DiskCache has its own threads. So with 100 open databases we have a big waste of RAM and CPU.

andrii0lomakin commented 9 years ago
  1. Did you read my message saying that this proposal, if I understand it correctly, would not work?
  2. Why, instead of fixing issue https://github.com/orientechnologies/orientdb/issues/1939, are we discussing a workaround for it?

@tglman I will not argue, because I proposed issue https://github.com/orientechnologies/orientdb/issues/1939 myself )). I will add to that issue a feature that closes files that have had no pages in the cache for a while.

andrii0lomakin commented 9 years ago

@tglman Actually, I would prefer to create a separate issue about file close, because making the cache JVM-wide has an obvious solution: it is a matter of file identification and internal API refactoring. But for file close we need to add lock-free tricks so as not to harm system scalability.

andrii0lomakin commented 9 years ago

@tglman It seems very simple to do, so I am linking issue https://github.com/orientechnologies/orientdb/issues/3111 to https://github.com/orientechnologies/orientdb/issues/1939.

andrii0lomakin commented 9 years ago

Putting it into 2.1, to merge it with the other issues listed here.

andrii0lomakin commented 9 years ago

This will be fixed in issue #3111.

andrii0lomakin commented 9 years ago

I think it would be better to create an LRU list of storages with a defined limit and auto-close the least recently used storage.

P.S. Eventually it would be cool to migrate to the LIRS algorithm.
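
(A minimal Java sketch of that LRU idea, using LinkedHashMap in access order. `Storage` here is a hypothetical stand-in for OStorage, and `close()` for whatever shutdown the real storage needs.)

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for OStorage.
interface Storage {
    void close();
}

// Keeps at most `limit` storages open. Accessing a storage moves it to
// the most-recently-used position; inserting one past the limit
// auto-closes and evicts the least recently used entry.
class StorageLruCache extends LinkedHashMap<String, Storage> {
    private final int limit;

    StorageLruCache(int limit) {
        super(16, 0.75f, true); // accessOrder = true: LRU iteration order
        this.limit = limit;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Storage> eldest) {
        if (size() > limit) {
            eldest.getValue().close(); // auto-close the least used storage
            return true;               // evict it from the map
        }
        return false;
    }
}
```

Compared to plain LRU, the LIRS algorithm mentioned in the P.S. tracks reuse distance as well as recency, which makes it more resistant to one-off scans evicting hot entries, at the cost of more complex bookkeeping.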

lvca commented 9 years ago

:+1:

rohitdev commented 9 years ago

If we are not going to shut down storage, then in multi-tenant scenarios where a vendor chooses ODB as the repository, it might become a big issue!

andrii0lomakin commented 9 years ago

Sorry, could you elaborate on why it would be a big issue?

rohitdev commented 9 years ago

In an isolated-database approach for multi-tenant applications, when an embedded DB is used, the storage is never shut down unless the JVM shuts down or the server reboots. I'm facing this issue right now: I've written a system service with multiple embedded databases, and over time the profiler shows a lot of memory used and a lot of file handles open, even with few operations. That led my application to hit the ulimit on Linux, throwing a "too many open files" error for the process.

andrii0lomakin commented 9 years ago

I do not understand. Could you describe the use case and list the problems that may arise in it?

rohitdev commented 9 years ago

Due to the nature of the product I work on, I cannot give you exact details, but I will describe a general scenario.

A piece of software collects raw data from different entities using various protocols. This raw data is then processed and kept in an embedded DB, one per entity, for the presentation layer (the data needs to be loaded on demand only, hence the choice of an instance per entity, which also helps for security reasons). The software runs as a system service in a production environment where the server may not reboot for months, so in a month we would accumulate several database instances for a single entity. With thousands of entities in production, and since the database doesn't close its storage/files when a database is no longer in use, we would eventually have thousands of file handles open, hampering OS operations.

A friend of mine who runs a SaaS product wanted to use ODB as the backend for his application. His SaaS model is an isolated database per customer. He was looking to replace MySQL with a NoSQL solution; the amount of data is not large, but he has stronger security needs and wants to reduce network latency, so he chose an embedded DB. Again, the application server runs for months, and the same issue arises: file handles remain open for the lifetime of the app server. So he is now hesitant to adopt ODB as a replacement.
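
(For illustration, a sketch of the embedded, database-per-tenant pattern both reports describe, using the 2.x-era ODatabaseDocumentTx API; the path and credentials are made up. Note that `db.close()` releases the database instance, while, per this thread, the underlying storage and its file handles stayed open until the fix mentioned below.)

```java
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;

// One embedded database per tenant. Each close() returns the database
// instance, but before the file-class refactoring the underlying storage
// kept its file descriptors open, so a long-running service accumulated
// handles until it hit the OS ulimit.
public class TenantStore {
    public void touchTenant(String tenantId) {
        ODatabaseDocumentTx db =
                new ODatabaseDocumentTx("plocal:/var/data/tenants/" + tenantId);
        if (db.exists()) {
            db.open("admin", "admin");
        } else {
            db.create();
        }
        try {
            // ... read/write tenant data ...
        } finally {
            db.close(); // the storage (and its open files) survived this close
        }
    }
}
```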

andrii0lomakin commented 8 years ago

The problem is gone after the refactoring of the file class.