UnitedMarsupials-zz opened 6 years ago
You can use `secret` for partitioning nodes, so that clients with a secret can only use nodes with a matching secret. For example, start QA nodes with `--secret=qa` and QA clients with `secret=qa`. Then QA nodes can't be used by other clients.
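A minimal sketch of the partitioning described above, using the `--secret` node option and the `secret` client keyword from the maintainer's example (the computation name `compute` is a placeholder):

```shell
# Start each QA node with the shared secret:
dispynode.py --secret=qa

# QA clients then pass the same secret, e.g. in Python:
#   import dispy
#   cluster = dispy.JobCluster(compute, secret='qa')
#
# Nodes started with a different secret (or none) will not serve these clients.
```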
Alternately, you can sub-class `NodeAllocate` to customize node allocation by overriding the `allocate` method. This can be used to allocate only desired nodes (e.g., by returning 0 for any node not wanted).
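A sketch of the whitelist idea just described. The filtering logic is shown as a standalone function so it can run without dispy installed; the commented wiring at the bottom assumes dispy's documented `NodeAllocate.allocate` contract (return the number of CPUs to use on a discovered node, or 0 to reject it), and the node addresses are hypothetical:

```python
# Hypothetical QA node addresses; any node not listed here is rejected.
ALLOWED = {'10.78.16.10', '10.78.16.11'}

def allocate_cpus(ip_addr, cpus, allowed=ALLOWED):
    """Mirror of an allocate() override: return cpus for whitelisted
    nodes, 0 for everything else (0 means the node is not used)."""
    return cpus if ip_addr in allowed else 0

# In a real client this would be wired up roughly as (untested sketch):
#
#   import dispy
#
#   class OnlyListedNodes(dispy.NodeAllocate):
#       def allocate(self, cluster, ip_addr, name, cpus, *args, **kwargs):
#           return allocate_cpus(ip_addr, cpus)
#
#   cluster = dispy.JobCluster(
#       compute, nodes=[OnlyListedNodes(ip) for ip in ALLOWED])
```

The stray Dev node from the log below (10.78.16.162) would get 0 CPUs and never receive jobs.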
Yes, this would work for us -- unless someone brings up a node with a misconfigured "secret". But I would've thought that disabling the client's promiscuity -- simply not listening to announcements -- would be a trivial flag to add...
If it is trivial, submit a patch.
In our environment, the jobs rely heavily on many different aspects of the local configuration, including versions of the (non-Python) software, which dispy cannot -- and is not asked to -- manage.
Unfortunately, because Dev, QA, and Production machines are sometimes on the same network and can hear each other's broadcasts, we've had "crosspollination" -- a server from Dev, for example, being "discovered" and automatically added to a QA-cluster.
At best, this causes unwelcome errors:
```
2018-11-06 14:48:46 dispy - Ignoring pulse message from 10.78.16.162
2018-11-06 14:58:47 dispy - Ignoring pulse message from 10.78.16.162
2018-11-06 15:08:47 dispy - Ignoring pulse message from 10.78.16.162
2018-11-06 15:18:47 dispy - Ignoring pulse message from 10.78.16.162
2018-11-06 15:28:47 dispy - Ignoring pulse message from 10.78.16.162
2018-11-06 15:38:47 dispy - Ignoring pulse message from 10.78.16.162
2018-11-06 15:48:47 dispy - Ignoring pulse message from 10.78.16.162
2018-11-06 15:48:47 dispy - Discovered 10.78.16.162:51348 (r00cb6n0c) with 16 cpus
2018-11-06 15:48:49 dispy - Running job 51310424 on 10.78.16.162
2018-11-06 15:48:49 dispy - Running job 100000160 / 51310424 on 10.78.16.162 (busy: 1 / 1)
2018-11-06 15:49:01 dispy - Received reply for job 100000160 / 51310424 from 10.78.16.162
2018-11-06 15:49:01 dispy - Job 100000160 on 10.78.16.162: Traceback (most recent call last):
2018-11-06 15:49:01 dispy - Closing node 10.78.16.162 for processTask / 1541533183926
2018-11-06 15:49:01 dispy - Running job 51311960 on 10.78.16.162
2018-11-06 15:49:01 dispy - Failed to run 51311960 on 10.78.16.162: bytearray(b'NAK (invalid computation 1541533183926)')
2018-11-06 15:49:01 dispy - Failed to run job 51311960 on 10.78.16.162 for computation processTask
```
At worst, it could introduce subtle inaccuracies, because of the differences in configuration.
Though discovery is a great feature in general, it should be possible to disable it, relying only on the list of nodes explicitly given to the `JobCluster` constructor.
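The requested behavior can be illustrated with a small standalone helper (names here are hypothetical, not dispy API): announced nodes are honored only if they appear in the explicit list the client was given.

```python
def filter_discovered(explicit_nodes, discovered):
    """Keep only announced nodes that were explicitly listed; any node
    that merely broadcasts its presence (e.g. a stray Dev machine on the
    same network) is ignored."""
    allowed = set(explicit_nodes)
    return [ip for ip in discovered if ip in allowed]
```

With `explicit_nodes=['10.78.16.10']`, a broadcast from `10.78.16.162` would simply be dropped instead of adding the node to the cluster.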