zodb / relstorage

A backend for ZODB that stores pickles in a relational database.

Avoiding UnableToLockRowsToReadCurrentError exceptions #496

Closed: tflorac closed this issue 1 year ago

tflorac commented 1 year ago

Hi, I'm currently building a Pyramid application that uses RelStorage with a PostgreSQL 11 backend. The application is deployed in production behind Apache using mod_wsgi in a multi-process, multi-threaded configuration (currently 4 processes with 4 threads each). The application is quite classic: it is a REST API backend for document management, storing many blobs and using a BTrees-based Hypatia catalog to index metadata. I don't have any issue when the application is used in "normal" mode (mostly read-only requests), but we have written a client application that drives the API in batch mode, using several threads to speed up the loading of a huge number of documents, and under this heavy load we start getting many PostgreSQL exceptions like UnableToLockRowsToReadCurrentError, with this kind of error:

UnableToLockRowsToReadCurrentError: Acquiring a lock during commit failed: ERROR: lock_not_available
HINT: readCurrent
CONTEXT: PL/pgSQL function lock_objects_and_detect_conflicts(bigint[],bigint[]), line 78 at RAISE
Transaction failed; no lock info available
Stored Objects <TPCTemporaryStorage at 0x7f2a3deb59b0 count=103 bytes=52972>
=============================================================
| OID | Length | Previous TID |
=============================================================
12035 | 33 | 283462907479747618
12039 | 33 | 283462907479747618
12043 | 33 | 283462907479747618
12047 | 33 | 283462907479747618
12055 | 33 | 283462907479747618
12059 | 33 | 283462907479747618
12063 | 33 | 283462907479747618
12067 | 33 | 283462907479747618
12071 | 33 | 283462907479747618
12075 | 33 | 283462907479747618
12079 | 33 | 283462907479747618
12083 | 33 | 283462907479747618
12087 | 33 | 283462907479747618
12103 | 33 | 283462907479747618
23105 | 535 | 283462907265075268
75451 | 292 | 283462740776619588
348904 | 4062 | 283462887336300390
420920 | 1140 | 283462658851401250
2200456 | 1389 | 283462791514430532
2585961 | 51 | 283462658851401250
3461784 | 32 | 283462907479747618
4630477 | 839 | 283462887020750080
4735162 | 1406 | 283462890257268206
4735172 | 467 | 283462890257268206
5228561 | 570 | 283462904747133798
5492422 | 651 | 283462902384925627
5585135 | 862 | 283462894668508791
5585186 | 450 | 283462904747133798
5617235 | 405 | 283462904747133798
5617236 | 405 | 283462904747133798
5646556 | 1902 | 283462904747133798
5646557 | 405 | 283462904747133798
5646558 | 405 | 283462904747133798
5646559 | 405 | 283462904747133798
5646560 | 430 | 283462904747133798
5646577 | 405 | 283462904747133798
5646578 | 380 | 283462904747133798
5646579 | 303 | 283462904747133798
5646580 | 405 | 283462904747133798
5646581 | 1117 | 283462904747133798
5646582 | 380 | 283462904747133798
5646583 | 1143 | 283462904747133798
5646584 | 1419 | 283462904747133798
5646585 | 430 | 283462904747133798
5646586 | 1419 | 283462904747133798
5646587 | 430 | 283462904747133798
5646588 | 430 | 283462904747133798
5646589 | 430 | 283462904747133798
5646595 | 691 | 283462904747133798
5646598 | 405 | 283462904747133798
5646599 | 405 | 283462904747133798
5646600 | 405 | 283462904747133798
5646601 | 405 | 283462904747133798
5646602 | 405 | 283462904747133798
5646604 | 1484 | 283462904747133798
5646605 | 380 | 283462904747133798
5646606 | 1003 | 283462904747133798
5646608 | 4958 | 283462904747133798
5646609 | 380 | 283462904747133798
5646610 | 885 | 283462904747133798
5646612 | 885 | 283462904747133798
5646614 | 380 | 283462904747133798
5646615 | 1143 | 283462904747133798
5646616 | 380 | 283462904747133798
5646620 | 761 | 283462904747133798
5646623 | 885 | 283462904747133798
5675257 | 1747 | 283462907479747618
5675265 | 1007 | 283462907479747618
5677924 | 985 | 0
5677925 | 453 | 0
5677926 | 381 | 0
5677927 | 25 | 0
5677928 | 270 | 0
5677929 | 519 | 0
5677930 | 367 | 0
5677931 | 291 | 0
5677932 | 394 | 0
5677933 | 205 | 0
5677934 | 141 | 0
5677935 | 276 | 0
5677936 | 305 | 0
5677969 | 541 | 0
5677970 | 417 | 0
5677971 | 157 | 0
5677972 | 205 | 0
5677973 | 32 | 0
5677974 | 83 | 0
5677975 | 87 | 0
5677976 | 64 | 0
5677977 | 109 | 0
5677978 | 157 | 0
5677979 | 205 | 0
5677980 | 51 | 0
5677981 | 51 | 0
5677982 | 62 | 0
5677983 | 55 | 0
5677984 | 248 | 0
5677985 | 205 | 0
5677986 | 51 | 0
5677987 | 41 | 0
5677988 | 41 | 0
5677989 | 41 | 0
5677990 | 207 | 0
readCurrent {oid: tid}
{12102: 283462906433788928, 1809112: 283462901393942664, 12034: 283462903017768089,
 12328: 283462903017768089, 12054: 283462903017768089, 12046: 283462903017768089,
 15384: 283462906433788928, 12045: 283462904031238912, 12042: 283462903017768089,
 12038: 283462903017768089, 12037: 283462905621198796, 12076: 283462903017768089,
 12056: 283462906433788928, 12104: 283462906329864977, 4449472: 283462907123285521,
 12100: 283462904994138453, 12033: 283462907123285521, 12086: 283462906529955549,
 12070: 283462903017768089, 1273889: 283462903711210018, 12082: 283462906529955549,
 1816029: 283462906732787814, 3461785: 283462901937834496, 12068: 283462903017768089,
 1274003: 283462903711210018, 12590: 283462906433788928, 12060: 283462906433788928,
 12080: 283462903017768089, 15366: 283462906433788928, 12096: 283462904994138453,
 12072: 283462903017768089, 1274002: 283462904874447445, 12062: 283462903017768089,
 12327: 283406267451644313, 12066: 283462903017768089, 12092: 283462904994138453,
 12324: 283462906433788928, 28: 283462906529955549, 12052: 283462904994138453,
 12101: 283462308284574498, 12048: 283462903017768089, 5360520: 283462907123285521,
 100: 283462857210119236, 12078: 283462903017768089, 1807525: 283462894668508791,
 12044: 283462903017768089, 12074: 283462903017768089, 21845: 283462906433788928,
 12040: 283462903017768089, 12036: 283462899309228817, 12058: 283462903017768089,
 27: 283462722692519628, 12064: 283462903017768089}
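For context, the batch client that triggers this is essentially doing something like the following (a hypothetical sketch only; the endpoint, payload and worker count are illustrative, not the real project code):

```python
# Hypothetical sketch of the batch loader described above: several client
# threads POSTing documents to the REST API at the same time, so the server
# ends up committing many concurrent ZODB transactions that all touch the
# shared catalog BTrees. The URL, payload and worker count are made up.
from concurrent.futures import ThreadPoolExecutor
import requests

API_URL = "https://example.invalid/api/documents"  # placeholder endpoint

def upload(path):
    with open(path, "rb") as f:
        response = requests.post(API_URL, files={"document": f}, timeout=60)
    response.raise_for_status()
    return path

def bulk_load(paths, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload, paths))
```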

I also receive some VoteReadConflictError and UnableToLockRowsDeadlockError exceptions.

Is this a "normal" behaviour? Are there any RelStorage tuning parameters that I should set to limit these errors?

Best regards, Thierry

jamadden commented 1 year ago

Those are all variations on (subclasses of) ZODB.POSException.ConflictError, itself a type of transaction.interfaces.TransientError. They indicate that ZODB's optimistic concurrency assumptions do not hold true: multiple transactions are trying to modify the same object, or objects that they depend upon. Thus, it is normal for applications that are writing concurrently to shared objects.
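To make the relationship concrete, a quick check along these lines should confirm it (a sketch; the relstorage import path is from memory and may differ between RelStorage versions):

```python
# Sketch only: verifying that RelStorage's lock errors are retriable
# conflict errors. The relstorage import path below is an assumption and
# may vary between RelStorage versions.
from ZODB.POSException import ConflictError
from transaction.interfaces import TransientError
from relstorage.adapters.interfaces import UnableToLockRowsToReadCurrentError

print(issubclass(UnableToLockRowsToReadCurrentError, ConflictError))  # expected: True
print(issubclass(ConflictError, TransientError))                      # expected: True
```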

The only thing you can do about that (aside from either limiting your concurrency or designing your application not to write to shared objects) is to implement transaction retries. The higher the concurrency and the more objects are shared, the more transaction retries it will take. The transaction package has some limited support for that; other options include pyramid_tm, repoze.tm2, and nti.transactions. Last I looked, pyramid_tm and repoze.tm2 are specific to Pyramid and WSGI, respectively, while the mechanisms in transaction and nti.transactions can be used outside of those environments.
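As a rough illustration of the transaction package's built-in support, a retry loop can be written with the transaction manager's attempts() helper (a sketch; `db` and `update_documents` are placeholders for your database and write logic):

```python
# Sketch of a retry loop using the transaction manager's attempts() helper:
# each attempt commits on success, and a TransientError (such as a
# RelStorage conflict/lock error) raised during commit makes the loop try
# again, up to the given number of attempts. `db` is an opened ZODB.DB and
# update_documents() stands in for the application's write logic.
import transaction

def run_with_retries(db, update_documents, attempts=3):
    conn = db.open()  # joins the default (thread-local) transaction manager
    try:
        for attempt in transaction.manager.attempts(attempts):
            with attempt:
                update_documents(conn.root())
    finally:
        conn.close()
```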

tflorac commented 1 year ago

Hi Jason, I already use the pyramid_tm package, with its default of 3 retries on conflict errors. As the conflicting shared objects are probably part of the catalog, I just reduced the client application's concurrency level... That's not a perfect solution, but in my use case it's not a big problem: document integrity is the main concern, and the batch load is a long but one-shot process! Best regards, Thierry
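For reference, a sketch of how the retry count can be raised on the Pyramid side. Treat the setting names as assumptions to check against the installed versions: recent pyramid_tm delegates retries to pyramid_retry and its `retry.attempts` setting, while older pyramid_tm releases read `tm.attempts`.

```python
# Sketch: raising the conflict-retry count in a Pyramid application.
# Assumes pyramid_tm together with pyramid_retry (newer pyramid_tm no
# longer retries by itself); with older pyramid_tm the equivalent setting
# was 'tm.attempts'. The value 5 is only an example, not a recommendation.
from pyramid.config import Configurator

def main(global_config, **settings):
    settings.setdefault('retry.attempts', '5')
    config = Configurator(settings=settings)
    config.include('pyramid_tm')
    config.include('pyramid_retry')
    # ... the rest of the application configuration ...
    return config.make_wsgi_app()
```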