reanahub / reana-client

REANA command-line client
http://reana-client.readthedocs.io/
MIT License

"reana-client ls" shows intermittent DB issues #358

Closed: lukasheinrich closed this issue 4 years ago

lukasheinrich commented 4 years ago

Sometimes I get this:

(reana) lheinric@lxplus718:~/riri/reana-demo-atlas-recast% reana-client ls  
Something went wrong while retrieving file list for workflow workflow.13:
(psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

[SQL: SELECT user_.created AS user__created, user_.updated AS user__updated, user_.id_ AS user__id_, user_.access_token AS user__access_token, user_.email AS user__email, user_.full_name AS user__full_name, user_.username AS user__username 
FROM user_ 
WHERE user_.id_ = %(id_1)s 
 LIMIT %(param_1)s]
[parameters: {'id_1': UUID('fdc3788a-3020-4ef5-bad1-8c45821e9130'), 'param_1': 1}]
(Background on this error at: http://sqlalche.me/e/e3q8)
(reana) lheinric@lxplus718:~/riri/reana-demo-atlas-recast% reana-client ls
NAME                                         SIZE    LAST-MODIFIED      
_yadage/yadage_snapshot_workflow.json        16733   2020-01-17T14:52:27
eventselection/submitDir/submitted           0       2020-01-17T14:51:52
eventselection/submitDir/hist-sample.root    10340   2020-01-17T14:51:52
eventselection/submitDir/location            126     2020-01-17T14:51:46
eventselection/submitDir/driver.root         1378    2020-01-17T14:51:46
eventselection/submitDir/input/sample.root   2124    2020-01-17T14:51:46
eventselection/submitDir/hist/sample.root    2066    2020-01-17T14:51:46

These are two reana-client ls calls made shortly one after the other. It seems there are some intermittent connectivity issues.

tiborsimko commented 4 years ago

Tested ~10 concurrent BSM workflows, saw peaks of ~100 concurrent K8s pods and ~30 concurrent DB connections. All went OK.

Going to split the DEV and QA DB instances, but it seems we are fine in terms of the number of connections.

Hence the probable direction is to improve how DB exceptions are caught and to retry the failed operation.
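The catch-and-retry approach could look roughly like the minimal sketch below. It is an illustration only: `retry_on`, `FakeOperationalError` and `list_files` are hypothetical names, not part of reana-client or the REANA server code. In a real deployment one would presumably catch `sqlalchemy.exc.OperationalError` (SQLAlchemy's wrapper around the `psycopg2.OperationalError` shown in the traceback) and, assuming a SQLAlchemy session is in use, roll that session back in `on_retry` before trying again.

```python
import functools
import time


def retry_on(exceptions, attempts=3, base_delay=0.5, on_retry=None):
    """Hypothetical helper: retry the decorated function on `exceptions`.

    Waits base_delay * 2**n seconds between attempts (exponential backoff)
    and re-raises the last error once all attempts have failed. `on_retry`,
    if given, is called with the caught exception before each retry
    (e.g. to roll back a database session).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    # Out of attempts: surface the original error to the caller.
                    if attempt == attempts - 1:
                        raise
                    if on_retry is not None:
                        on_retry(exc)
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


class FakeOperationalError(Exception):
    """Stand-in for sqlalchemy.exc.OperationalError in this sketch."""


calls = {"n": 0}


@retry_on(FakeOperationalError, attempts=3, base_delay=0.01)
def list_files():
    # Fails on the first call only, mimicking the first `reana-client ls` above.
    calls["n"] += 1
    if calls["n"] == 1:
        raise FakeOperationalError("server closed the connection unexpectedly")
    return ["_yadage/yadage_snapshot_workflow.json"]


print(list_files())  # → ['_yadage/yadage_snapshot_workflow.json']
print(calls["n"])    # → 2
```

The backoff avoids hammering a DB that is briefly unavailable. Blind retries are only safe for idempotent operations such as the read-only user lookup in the traceback; writes would need more care.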

tiborsimko commented 4 years ago

The DEV and QA DB instances are now split.

tiborsimko commented 4 years ago

Now that the workshop is over, we can debug the underlying issue as a priority.