Open noopurintel opened 1 week ago
Hi @noopurintel, this is a good catch. Your intuition seems right that if disable_client_auth is set to true, the aggregator should not be attempting to certify the collaborator.
It looks like the server's attempt to certify the collaborator is conditioned only on whether tls is set to true (see).
We should probably add an additional condition on disable_client_auth to skip trying to retrieve x509_common_name (and other certification-related verification from the collaborator).
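A minimal sketch of that extra condition, for illustration only. The helper name peer_common_name and its arguments are hypothetical and are not the actual code in aggregator_server.py. It assumes the auth context behaves like a mapping from string keys to lists of bytes, and that the x509_common_name entry is absent when the client presents no certificate, which would explain the reported KeyError.

```python
from typing import Mapping, Optional, Sequence


def peer_common_name(
    auth_context: Mapping[str, Sequence[bytes]],
    tls: bool,
    disable_client_auth: bool,
) -> Optional[str]:
    """Return the client's certificate common name, or None when no check applies.

    Hypothetical helper: the names and signature are assumptions made for
    this sketch, not part of the real aggregator server.
    """
    # Only expect a client certificate when TLS is on AND client auth is enabled.
    if not tls or disable_client_auth:
        return None
    # With client auth enabled, the common name is expected to be present.
    return auth_context["x509_common_name"][0].decode("utf-8")


# No client certificate was presented, so the common-name key is missing.
print(peer_common_name({}, tls=True, disable_client_auth=True))

# Client certificate presented and client auth enabled.
print(peer_common_name({"x509_common_name": [b"collaborator1"]},
                       tls=True, disable_client_auth=False))
```

With a check like this, the caller can skip the common-name comparison whenever the helper returns None, instead of indexing into the auth context unconditionally.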
Added the documentation label. Agreed that it would be good to have detailed docs about the types of connections and authentication settings, and how they affect communication between the participants.
Describe the bug
Using the single-machine, bare-metal approach of the Task Runner API, the federation run fails if the disable_client_auth flag is set to true in the plan.yaml file (with tls unchanged, i.e. set to true). The system throws KeyError: 'x509_common_name' from aggregator_server.py (screenshot attached below).
To Reproduce
Steps to reproduce the behavior:
fx workspace create --template torch_cnn_mnist --prefix my_workspace
Move inside the workspace folder.
In plan.yaml, change disable_client_auth from false to true, keeping all other values intact.
fx plan initialize
fx workspace certify
fx aggregator generate-cert-request
fx aggregator certify -s
fx collaborator create -n collaborator1 -d 1
fx collaborator generate-cert-request -n collaborator1
fx collaborator create -n collaborator2 -d 2
fx collaborator generate-cert-request -n collaborator2
fx collaborator certify --request-pkg col_collaborator1_to_agg_cert_request.zip -s
fx collaborator certify --request-pkg col_collaborator2_to_agg_cert_request.zip -s
fx collaborator certify --import col_collaborator1_to_agg_cert_request.zip
fx collaborator start -n collaborator1
fx collaborator certify --import col_collaborator2_to_agg_cert_request.zip
fx collaborator start -n collaborator2
fx aggregator start
Expected behavior
The federation should run smoothly.
Also, we believe that the step where the aggregator certifies the collaborators should be skipped, since mTLS is off (i.e. client auth is disabled). Is this the right expectation?
It would be helpful if the exact steps to be followed in this scenario are well documented.
Screenshots
Aggregator screen
Collaborator screen
Machine:
disable_client_auth set to false)