securefederatedai / openfl

An Open Framework for Federated Learning.
https://openfl.readthedocs.io/en/latest/index.html
Apache License 2.0
728 stars 204 forks source link

Federation run fails when client auth is disabled #1123

Open noopurintel opened 1 week ago

noopurintel commented 1 week ago

Describe the bug Using single machine and bare metal approach of Task Runner API, federation run is failing if the disable_client_auth flag is set to true in plan.yaml file (with tls unchanged/set as true).

System throws KeyError: 'x509_common_name' from aggregator_server.py (screenshot attached below)

To Reproduce Steps to reproduce the behavior:

  1. Git clone openfl repository, create and activate virtual environment.
  2. Install all the pre-requisites.
  3. Model owner - Create workspace fx workspace create --template torch_cnn_mnist --prefix my_workspace and move inside the folder.
  4. Model owner - Modify plan.yaml file changing disable_client_auth from false to true keeping all other values intact.
  5. Model owner - Initialise the plan fx plan initialize
  6. Model owner - Certify the workspace fx workspace certify
  7. Aggregator - Generate certs fx aggregator generate-cert-request
  8. Aggregator - Certify the certs fx aggregator certify -s
  9. Collaborator1 - Create collaborator1 fx collaborator create -n collaborator1 -d 1
  10. Collaborator1 - Create CSR for collaborator1 fx collaborator generate-cert-request -n collaborator1
  11. Collaborator2 - Create collaborator2 fx collaborator create -n collaborator2 -d 2
  12. Collaborator2 - Create CSR for collaborator2 fx collaborator generate-cert-request -n collaborator2
  13. Aggregator - Sign col1 cert request fx collaborator certify --request-pkg col_collaborator1_to_agg_cert_request.zip -s
  14. Aggregator - Sign col2 cert request fx collaborator certify --request-pkg col_collaborator2_to_agg_cert_request.zip -s
  15. Collaborator1 - Import signed package for col1 fx collaborator certify --import col_collaborator1_to_agg_cert_request.zip
  16. Collaborator1 - Start collaborator1 fx collaborator start -n collaborator1
  17. Collaborator2 - Import signed package for col2 fx collaborator certify --import col_collaborator2_to_agg_cert_request.zip
  18. Collaborator2 - Start collaborator2 fx collaborator start -n collaborator2
  19. Aggregator - Start aggregator fx aggregator start

Expected behavior Federation should run smoothly.

Also, we believe that the step where aggregator certifies collaborators should be skipped as the MTLS is off (i.e. client auth is disabled). Is this the right expectation?

It would be helpful if the exact steps to be followed in this scenario are well documented.

Screenshots

Aggregator screen

aggregator_screen

Collaborator screen

collaborator_screen

Machine:

kta-intel commented 1 week ago

Hi @noopurintel This is a good catch. Your intuition seems right that if disable_client_auth is set to true, the aggregator should not be attempting to certify the collaborator.

It looks like the server's attempt to certify the collaborator is conditioned only on if tls is set to true (see)

We should probably add an additional condition for disable_client_auth to skip trying to retrieve x509_common_name (and other certification related verification from the collaborator).

kta-intel commented 1 week ago

Added documentation label. Agreed that it would be good to have detailed docs about types of connections and authentication settings and how they affect communication between the participants