networkx / nx-parallel

A networkx backend that uses joblib to run graph algorithms in parallel.
BSD 3-Clause "New" or "Revised" License

Synchronizing NetworkX and Joblib configurations in nx-parallel #76

Open Schefflera-Arboricola opened 3 months ago

Schefflera-Arboricola commented 3 months ago

Background:

PR https://github.com/networkx/nx-parallel/pull/75 makes nx-parallel compatible with the configuration system that NetworkX offers to its backends, and documents how to configure nx-parallel using joblib’s parallel_config.

PR https://github.com/networkx/nx-parallel/pull/68 was an attempt to create a unified configuration system in nx-parallel by wrapping the joblib.Parallel() call inside each nx-parallel algorithm in a with joblib.parallel_config(configs) context, where configs is extracted from nx.config.backends.parallel. This made NetworkX’s config closely mirror joblib’s, giving the appearance of synchronization between the two systems. However, in the last meeting, Dan mentioned that this approach complicates things.

Requirement:

We need a robust layer in nx-parallel that updates NetworkX’s config as soon as any of joblib’s config parameters is updated, and vice versa. The goal is that when a user updates a configuration in either system, this layer immediately updates the other one so that the two stay consistent.

Challenges:

  1. Order of Updates:

    • As an nx-parallel developer, how can I determine which config system (NetworkX or joblib) the user has updated and in what order?
    • For example, consider the following:

      # Case 1: joblib is set first, NetworkX second
      joblib.parallel_config(n_jobs=5)
      nx.config.backends.parallel.n_jobs = 8

      # Case 2: NetworkX is set first, joblib second
      nx.config.backends.parallel.n_jobs = 8
      joblib.parallel_config(n_jobs=5)

    • Extracting the config stored in NetworkX and in joblib’s parallel_config is simple. In both cases, we get n_jobs = 8 from NetworkX and n_jobs = 5 from joblib. However, the expected behavior differs: in the first case, n_jobs should end up as 8, and in the second case, as 5. How can I detect the order of updates so that the correct configuration is applied?
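One possible direction, as a minimal sketch only: if every update could be routed through a layer that nx-parallel controls, each update could be stamped with an increasing sequence number, and the most recent stamp would decide which value wins. The UpdateLog class and its record and resolve methods are hypothetical names, not part of NetworkX or joblib, and the sketch assumes that updates can be intercepted at all, which is exactly the open question.

```python
import itertools


class UpdateLog:
    """Hypothetical last-writer-wins log; not a NetworkX or joblib API."""

    def __init__(self):
        self._counter = itertools.count()
        self._entries = {}  # (source, key) -> (sequence number, value)

    def record(self, source, key, value):
        # Stamp each update with a strictly increasing sequence number.
        self._entries[(source, key)] = (next(self._counter), value)

    def resolve(self, key):
        # Among all sources that set `key`, pick the most recent update.
        candidates = [
            (seq, source, value)
            for (source, k), (seq, value) in self._entries.items()
            if k == key
        ]
        if not candidates:
            raise KeyError(key)
        _, source, value = max(candidates, key=lambda c: c[0])
        return source, value


# Case 1: joblib is updated first, NetworkX second.
log = UpdateLog()
log.record("joblib", "n_jobs", 5)
log.record("networkx", "n_jobs", 8)
case1 = log.resolve("n_jobs")

# Case 2: NetworkX is updated first, joblib second.
log = UpdateLog()
log.record("networkx", "n_jobs", 8)
log.record("joblib", "n_jobs", 5)
case2 = log.resolve("n_jobs")
```

Here case1 resolves to the NetworkX value and case2 to the joblib value, matching the expected behavior in the two cases. Without such stamps, the two stored values alone do not carry enough information to recover the order.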
  2. Keeping Context Managers in Sync:

    • Is there a way in Python to determine the context manager we are currently in and the order in which multiple context managers are applied?
    • Consider the following:

      nx.config.backends.parallel.n_jobs = 6
      joblib.parallel_config(verbose=50)
      G = nx.complete_graph(5)
      nx.square_clustering(G, backend="parallel")  # n_jobs = 6, verbose = 50

      with nx.config.backends.parallel(n_jobs=8):
          nx.square_clustering(G, backend="parallel")  # n_jobs = 8, verbose = 50
          with joblib.parallel_config(verbose=10):
              nx.square_clustering(G, backend="parallel")  # n_jobs = 8, verbose = 10
    • How can we ensure that the configurations within nested context managers are correctly synchronized and applied in the expected order?
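As far as I know, Python has no built-in way to list the with blocks that are currently active, so a layer would have to track them itself. The sketch below shows one way that tracking could work, using a single stack stored in a contextvars.ContextVar that both kinds of context push onto. The names override, effective_config, and active_sources are hypothetical, and the sketch assumes both libraries' contexts could be made to go through such a shared stack.

```python
import contextlib
import contextvars

_DEFAULTS = {"n_jobs": None, "verbose": 0}

# One shared stack of (source, params) entries, outermost first.
_stack = contextvars.ContextVar("config_stack", default=())


@contextlib.contextmanager
def override(source, **params):
    # Push an entry on entry and restore the previous stack on exit.
    token = _stack.set(_stack.get() + ((source, params),))
    try:
        yield
    finally:
        _stack.reset(token)


def effective_config():
    # Apply entries from outermost to innermost so inner contexts win.
    merged = dict(_DEFAULTS)
    for _source, params in _stack.get():
        merged.update(params)
    return merged


def active_sources():
    # The nesting order of the currently active contexts.
    return [source for source, _params in _stack.get()]


with override("networkx", n_jobs=6), override("joblib", verbose=50):
    outer = effective_config()
    with override("networkx", n_jobs=8):
        middle = effective_config()
        with override("joblib", verbose=10):
            inner = effective_config()
            sources = active_sources()
    after = effective_config()
```

The three snapshots outer, middle, and inner correspond to the three square_clustering calls in the example, and after shows that leaving the nested contexts restores the outer values.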
  3. Updating Configurations Without Modifying External Libraries:

    • Is it even possible to update both NetworkX’s and joblib’s configurations to keep them in sync without requiring any code changes in either library?
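One direction that might need no changes to the libraries, sketched under assumptions: a config object owned by the backend could intercept attribute assignment and forward each change to the other side. ForwardingConfig and its on_change callback are hypothetical names, and the mirror dict is only a stand-in for the other library's settings. The sketch covers one direction only; I am not aware of a joblib hook that would report changes going the other way.

```python
class ForwardingConfig:
    """Hypothetical backend-owned config that forwards every assignment."""

    def __init__(self, on_change, **initial):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_on_change", on_change)
        object.__setattr__(self, "_values", dict(initial))

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name not in self._values:
            raise AttributeError(f"unknown config option: {name}")
        self._values[name] = value
        # Forward the change so the other side can be updated.
        self._on_change(name, value)


mirror = {}  # hypothetical stand-in for the other library's settings
cfg = ForwardingConfig(on_change=mirror.__setitem__, n_jobs=None, verbose=0)
cfg.n_jobs = 8
```

After the assignment, both cfg.n_jobs and mirror["n_jobs"] hold 8, and the change was forwarded at the moment it happened rather than discovered later.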

Thank you :)