Better support for table level locking

simovesterinen81 commented 3 years ago

Hi,

We have been using the table-level locking branch for a long time now. Our application has multiple databases on one Postgres instance, and they all use IMCS heavily.

Every now and then something strange happens on the IMCS side: the system crashes, either killed by the out-of-memory (OOM) killer or with a segmentation fault. These crashes always seem to be related to multiple users running get, load, or delete operations on different tables at the same time. Would it be possible to implement better support for table-level locking that would prevent these crashes?

knizhnik commented 3 years ago

Sorry, but to prevent the crashes it is first necessary to understand their cause. Are they caused by OOM or by some bug in the software?

Table-level locking prevents concurrent modification of the table, but it doesn't (and should not) prevent concurrent execution of read-only queries. So if the problem is caused by memory exhaustion, which in turn results from concurrent execution of some heavy queries, then locking cannot solve it.
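To make the distinction concrete, here is a minimal Python sketch of the readers-writer semantics described above, assuming "get" queries act as readers and "load"/"delete" act as writers. The class `TableLock` is hypothetical and is not taken from IMCS, which implements its locking in C inside Postgres; the sketch only shows why readers still run side by side and therefore still add up in memory.

```python
import threading


class TableLock:
    """Hypothetical per-table readers-writer lock (illustration only).

    Any number of readers (e.g. get queries) may hold the lock at once;
    a writer (load/delete) waits until no reader or writer is active.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, never for other readers.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                # Wake a waiting writer once the last reader leaves.
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # Writers need exclusive access: no writer and no readers.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

This simple version gives readers priority, so a steady stream of get queries can delay a load or delete indefinitely; a production lock would usually add writer preference. Note also that nothing here limits how much memory the concurrent readers use.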

Can you somehow provide me with core files or stack traces of the crashed backends?

simovesterinen81 commented 3 years ago

Both reasons:

OOM sometimes happens when there are multiple users and simultaneous delete, load, and query operations. The get query somehow ends up in an infinite loop, which takes a long time, accumulates memory, and finally triggers the OOM killer. The query result should be very small, but we can't even kill the process from the Postgres side.

The segmentation fault happens occasionally. It is very difficult to reproduce, but it seems that deleting data and starting queries at the same time sometimes causes it. When this occurs, the get query also stops responding, and then the server crashes.

These are the reasons why I suspect that the problem is related to deleting/adding rows in IMCS while queries are running at the same time.

knizhnik commented 3 years ago

Sorry, but without a core file I can't do anything. Can you configure Postgres to dump cores?

simovesterinen81 commented 3 years ago

No, it's a production environment, and we are not able to reproduce this in the development environment.

Maybe I could implement the needed locking using advisory locks.
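A minimal sketch of that idea, assuming every client agrees to take a shared advisory lock before an IMCS get and an exclusive one before a load or delete on the same table. `pg_advisory_xact_lock(bigint)` and `pg_advisory_xact_lock_shared(bigint)` are standard PostgreSQL functions; the helper names `table_lock_key` and `lock_statement` and the SHA-256 key mapping are hypothetical choices for illustration.

```python
import hashlib
from typing import Tuple


def table_lock_key(table_name: str) -> int:
    """Hypothetical helper: map a table name to a stable signed 64-bit key.

    Any mapping works as long as every session uses the same one.
    """
    digest = hashlib.sha256(table_name.encode("utf-8")).digest()
    # First 8 bytes as a signed big-endian integer, so it fits a Postgres bigint.
    return int.from_bytes(digest[:8], "big", signed=True)


def lock_statement(table_name: str, write: bool) -> Tuple[str, Tuple[int]]:
    """Hypothetical helper: return (sql, params) for a transaction-scoped lock.

    Writers (load/delete) take the exclusive lock; readers (get) take the
    shared lock, so gets can overlap each other but never a load/delete.
    """
    func = "pg_advisory_xact_lock" if write else "pg_advisory_xact_lock_shared"
    return f"SELECT {func}(%s)", (table_lock_key(table_name),)
```

With a driver such as psycopg2, one would call `cur.execute(*lock_statement("sales", write=True))` at the start of the transaction that runs the load or delete; the `_xact_` variants release the lock automatically at commit or rollback. This only works if every client follows the same convention, since advisory locks are not enforced by IMCS itself.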

simovesterinen81 commented 3 years ago

Could there be better exception handling on the IMCS side, so that when the OOM killer strikes it does not restart the Postgres server? This is a problem for us because we then need to load all the data back into IMCS, and during that time our users can't use IMCS.

knizhnik commented 3 years ago

It is possible to dump cores in a production environment as well. Certainly, debugging code built with optimization is much more difficult than code built without optimization and with debugging symbols. But at least it is possible to see the stack trace and use a disassembler to examine the code fragment where the crash happened.
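Whether a crashed backend leaves a core file depends on the core-size resource limit it inherits from the postmaster. The Python sketch below only illustrates that mechanism on Linux for the current process; the function name `enable_core_dumps` is hypothetical. For Postgres itself, `pg_ctl start -c` (`--core-files`) asks `pg_ctl` to lift the soft core-file limit before starting the server, and where the file is written is controlled by the kernel's `kernel.core_pattern` setting.

```python
import resource


def enable_core_dumps():
    """Hypothetical helper: raise the soft core-file size limit to the hard limit.

    An unprivileged process may always raise its soft limit up to the hard
    limit; child processes created afterwards inherit the new limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    # Return the (soft, hard) pair now in effect.
    return resource.getrlimit(resource.RLIMIT_CORE)
```

If the hard limit is itself 0, nothing can be dumped until an administrator raises it (for example in the service manager's configuration).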

Sorry, but I have to say once again that it is not possible to do anything without first understanding the cause of the problem. It is not clear to me now whether the OOM happens because of IMCS shared memory or because of the backend's private memory used for query execution (for example, for hash tables).

In general, it is not possible to protect a program from the OOM killer. You can play with the memory overcommit settings or add swap. In the first case, the application will get a malloc failure before the OOM killer steps in. In the second case, you will get swapping instead of failure (but the performance impact of thrashing may be even more dramatic).
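For the first option, Linux strict overcommit accounting (`vm.overcommit_memory = 2`) makes allocations fail once committed memory (`Committed_AS` in `/proc/meminfo`) would exceed `CommitLimit`. When `vm.overcommit_kbytes` is 0, the kernel derives that limit as (RAM minus huge pages) × `vm.overcommit_ratio` / 100 + swap. The Python sketch below approximates that arithmetic in kB (the kernel works in pages) and parses `/proc/meminfo`-style text; both function names are hypothetical.

```python
def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int = 50,
                    hugetlb_kb: int = 0) -> int:
    """Approximate CommitLimit under vm.overcommit_memory=2.

    Assumes vm.overcommit_kbytes is 0, so the ratio-based formula applies:
        CommitLimit = (RAM - huge pages) * overcommit_ratio / 100 + swap
    The default vm.overcommit_ratio is 50.
    """
    return (ram_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_kb


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style lines ("Key:   1234 kB") into {key: int}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values
```

For example, a 16 GiB machine without swap and with the default ratio can commit only about 8 GiB, so the ratio usually needs raising (or swap adding) before strict mode is practical for a large shared-memory store such as IMCS.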

As far as I know, there is no way to protect a process from the OOM killer. And once a process is killed, the Postgres postmaster has no other way to handle the crash except a complete instance restart, because shared memory can be left in an inconsistent state after such a crash.

knizhnik/imcs, issue #67