prominence-eosc / imc

PROMINENCE infrastructure provisioner
Apache License 2.0

Scale tests #6

Closed. alahiff closed this issue 5 years ago.

alahiff commented 5 years ago

Carry out some basic tests running many jobs across multiple clouds (this hasn't been done yet since the switch to the REST API).

alahiff commented 5 years ago

Tried a workflow involving a job with 40 instances, which all failed quickly (accidentally). All infrastructure was deployed and then deleted successfully. Repeated with 60 quickly failing instances, and this was also fine.

Also ran another workflow with 60 longer-running jobs (~1 hour), with no problems.

alahiff commented 5 years ago

Repeat the tests once https://github.com/prominence-eosc/imc/issues/19 has been implemented.

alahiff commented 5 years ago

First tests with the PostgreSQL backend, with a pool size of 8:

alahiff commented 5 years ago

No DB errors with 10 jobs, but one job failed with:

Exception deploying infrastructure: "string indices must be integers, not str"
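In Python, this message is raised when a string is indexed with a string key, which typically happens when code expects a parsed dict but is handed a raw JSON string. The root cause in IMC is not identified here; the following is only a minimal, hypothetical reproduction (the payload, the `infra_id` key and both helper functions are illustrative, not IMC code):

```python
import json

# Hypothetical REST response body that is still a raw JSON string
response_body = '{"infra_id": "abc123"}'

def get_infra_id_buggy(body):
    # Indexing a str with a str key raises TypeError
    # ("string indices must be integers", exact wording varies by Python version)
    return body["infra_id"]

def get_infra_id(body):
    # Parse the JSON first if we were given a string rather than a dict
    data = json.loads(body) if isinstance(body, str) else body
    return data["infra_id"]

try:
    get_infra_id_buggy(response_body)
except TypeError as err:
    print("TypeError:", err)

print(get_infra_id(response_body))  # abc123
```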

No DB errors with 20 jobs, but one job (17705) seemed to be stuck in the running state, which is unlikely to be related to IMC.

With 40 jobs, the first 30 deployed successfully (the router has an idle limit of 30). All condor_startds appeared to join successfully, but the jobs stayed in the ready state for a long time, and some startds then began dying on their own. Issue tracked here: https://github.com/prominence-eosc/prominence/issues/26. No DB issues.
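An idle limit caps how many routed jobs can be waiting (idle) at once, so with a limit of 30 only 30 of the 40 jobs are admitted until some leave the idle state. The sketch below is a simplified model of that behaviour only; `route_jobs` is a hypothetical helper and does not use the actual HTCondor JobRouter configuration or API:

```python
from collections import deque

def route_jobs(pending, idle_jobs, max_idle):
    """Move jobs from the pending queue to the idle set until max_idle is reached.

    Simplified model of an idle limit: returns the jobs routed in this pass;
    any remaining jobs stay in `pending`.
    """
    routed = []
    while pending and len(idle_jobs) < max_idle:
        job = pending.popleft()
        idle_jobs.add(job)
        routed.append(job)
    return routed

pending = deque(range(40))  # 40 submitted jobs
idle = set()
first_pass = route_jobs(pending, idle, max_idle=30)
print(len(first_pass), len(pending))  # 30 10

# Once some jobs start running (leave the idle state), more can be routed
for job in first_pass[:5]:
    idle.discard(job)
second_pass = route_jobs(pending, idle, max_idle=30)
print(len(second_pass), len(pending))  # 5 5
```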

alahiff commented 5 years ago

Increased pool size to 16 (from 8):

alahiff commented 5 years ago

With 160 jobs, there were many errors like this:

CRITICAL [imc] Deployment error, this is a bug: expected a string or other character buffer object

Also one of these:

CRITICAL [imc] Exception deploying infrastructure: "global name 'token' is not defined"

This error probably comes from the "# Final check if we should delete the infrastructure" section, where token is passed as an argument to utilities.create_im_auth but is never defined.
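This failure mode is easy to reproduce: a function passes a name that is never assigned in its scope. Python 2 reports "global name 'token' is not defined" and Python 3 reports "name 'token' is not defined". All functions below are hypothetical stand-ins, not the real utilities.create_im_auth or IMC code; they only illustrate the bug and the general shape of a fix (obtain the token before using it):

```python
def create_im_auth(cloud, token):
    # Hypothetical stand-in: build an auth string for the given cloud
    return "cloud=%s token=%s" % (cloud, token)

def final_check_buggy(cloud):
    # `token` is never assigned in this scope, so this raises NameError
    return create_im_auth(cloud, token)

def get_token(cloud):
    # Hypothetical token lookup
    return "tok-" + cloud

def final_check_fixed(cloud):
    # Obtain the token before it is used
    token = get_token(cloud)
    return create_im_auth(cloud, token)

try:
    final_check_buggy("cloud1")
except NameError as err:
    print("NameError:", err)

print(final_check_fixed("cloud1"))  # cloud=cloud1 token=tok-cloud1
```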

Also encountered this: https://github.com/prominence-eosc/imc/issues/21

There were also some leftover infrastructures in IM; I deleted around 50-70 of them, in various states.

alahiff commented 5 years ago

The token issue is fixed in https://github.com/prominence-eosc/imc/commit/3a1a8e80b3791ec79a81cc8d598ae51d81d43f52

Other improvements to deletion handling are also included in this commit.

alahiff commented 5 years ago

Also trying:

alahiff commented 5 years ago

The deployment exception seems to be fixed with https://github.com/prominence-eosc/imc/commit/a7ef686b913bc60e1f26f6f0f5f4c71ac9530504

alahiff commented 5 years ago

320 jobs with a router max idle of 60 and a pool size of 24 ran with no problems.
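Assuming the pool size bounds how many deployments (or database connections) are handled concurrently, which the thread does not state explicitly, its effect can be modelled with a fixed-size worker pool: however many jobs arrive, at most `pool_size` are processed at once. This is a hypothetical sketch using Python's `concurrent.futures`; `deploy` and `run_deployments` are placeholders, not IMC functions:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_deployments(n_jobs, pool_size):
    """Run n_jobs placeholder deployments on a pool of pool_size workers.

    Returns the peak number of deployments observed running at once.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def deploy(job_id):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for real deployment work
        with lock:
            active -= 1
        return job_id

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(deploy, range(n_jobs)))
    if len(results) != n_jobs:
        raise RuntimeError("some deployments did not complete")
    return peak

print(run_deployments(320, pool_size=24) <= 24)  # True
```

Raising the pool size (8, then 16, then 24 in the runs above) increases how much work can proceed in parallel, at the cost of more simultaneous load on the backend.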