WIP: Adding config to nx-parallel

Schefflera-Arboricola commented 5 months ago

This PR adds the ability to modify the default joblib's parallel configs by integrating nx-parallel with networkx's config manager.

relevant config PRs in networkx: https://github.com/networkx/networkx/pull/7363 , https://github.com/networkx/networkx/pull/7225 ref. Config.md file in this PR for usage. ref. GSoC'24 Blog 6 for more context.

Thank you :)

Schefflera-Arboricola commented 4 months ago

For proper integration, I will need to change a few lines of code in all the algorithms. So far, I've modified square_clustering to work well with NetworkX's config, which included renaming cpu_count to get_n_jobs. However, I've reverted get_n_jobs back to cpu_count because the tests were failing, as other algorithms still use cpu_count. I'll proceed with updating the other algorithms once I receive the first review. I think changing the name across all algorithms right now will over-crowd this PR and will make it challenging to review. LMK if that seems good to you @dschult .

Thank you :)

dschult commented 3 months ago

After reading the blog about adding config I am more in favor of avoiding the use of the networkx config system for controlling the nx-parallel use of joblib. Said another way, I think it would be better to keep the user in the joblib config system and support no joblib related parameters in nx.config.

Duplicating the joblib parameters doesn't make sense to me. And maintaining two (or creating a third) config system seems to ask for confusion and duplication. Can we just rely on users using the joblib config system?

Schefflera-Arboricola commented 3 months ago

After reading the blog about adding config I am more in favor of avoiding the use of the networkx config system for controlling the nx-parallel use of joblib. Said another way, I think it would be better to keep the user in the joblib config system and support no joblib related parameters in nx.config.

Duplicating the joblib parameters doesn't make sense to me. And maintaining two (or creating a third) config system seems to ask for confusion and duplication. Can we just rely on users using the joblib config system?

nx-parallel is a networkx backend that uses joblib(right now), not a joblib backend for graphs algorithms. So, it makes sense that we recommend our users to use the networkx's config context manager over the joblib's. Because, if a networkx user wants to use nx-parallel with some other backend(s)(like, with nx-cugraphs -- like backend_priority=[cugraph, parallel]) then they would have to set the nx-parallel's configs in the nx.config because this user would not want to run their code with a joblib's context manager when running with nx-cugraphs' implementation(because it will probably fail). And doing what you are suggesting will deprive nx-parallel users of all those features and functionalities that lets(or will let) them use multiple networkx backends together and this would probably exclude nx-parallel from the networkx backend ecosystem.

And we only have one config system i.e. NetworkX's config, which can also be used as a context manager. And joblib's context manager will not work as expected with the config system implemented in this PR(as outlined in the blog you mentioned above).

The only maintenance needed will be to update the ParallelConfig class, if joblib ever changes the args of the parallel_config class, which I think is very unlikely to happen very frequently. Also if it does happen, joblib has an appropriately long deprecation period.

Also, the main concern of mine(also mentioned in the blog) with the current implementation in this PR, is that the fetching of the global configs in every function call is redundant but it is not very expensive. It's like converting ParallelGraph instance into a nx.Graph instance in every nx-parallel algorithm.

Thank you for your comments. I hope I addressed them all.

dschult commented 3 months ago

nx-parallel is a networkx backend that uses joblib(right now), not a joblib backend for graphs algorithms. So, it makes sense that we recommend our users to use the networkx's config context manager over the joblib's. Because, if a networkx user wants to use nx-parallel with some other backend(s)(like, with nx-cugraphs -- like backend_priority=[cugraph, parallel]) then they would have to set the nx-parallel's configs in the nx.config because this user would not want to run their code with a joblib's context manager when running with nx-cugraphs' implementation(because it will probably fail). And doing what you are suggesting will deprive nx-parallel users of all those features and functionalities that lets(or will let) them use multiple networkx backends together and this would probably exclude nx-parallel from the networkx backend ecosystem.

Can you help me understand this better? Are we expecting every networkx backend to support running in an environment with every other networkx backend's configs? Said another way, why do we need to worry about users having two nx backends in their priority order when making nx-parallel look for joblib config values? It seems like the nx-parallel backend will look for joblib config settings and the nx-cugraph backend will look for cugraph config settings. How would these break each other?

Schefflera-Arboricola commented 3 months ago

Can you help me understand this better? Are we expecting every networkx backend to support running in an environment with every other networkx backend's configs? Said another way, why do we need to worry about users having two nx backends in their priority order when making nx-parallel look for joblib config values? It seems like the nx-parallel backend will look for joblib config settings and the nx-cugraph backend will look for cugraph config settings. How would these break each other?

It looks like I should expand more on what I wrote in blog about how NetworkX's config works and how nx-parallel integrates with the NetworkX configuration system and why it's essential to have its configs in nx.config instead of relying solely on joblib's context manager.

The unified configuration management system within NetworkX (nx.config) ensures consistency and ease of use when dealing with different backends. Each backend, such as nx-parallel and nx-cugraph, will have its configurations managed within this unified system.

If each backend looked for its configurations separately (e.g., nx-parallel using joblib's context manager and nx-cugraph using its own system), it could lead to conflicts and difficulties in managing multiple backends. By having all backend configurations within nx.config, we ensure that they can coexist and be prioritized without breaking each other. Also this is easier for the user also to specify all the configs for all the backends at one place.

Addressing your questions

Expectations of Backend Configurations:

We do not expect every NetworkX backend to support every other backend’s configurations. Instead, we expect them to adhere to the unified configuration system provided by nx.config. By not making nx-parallel look for joblib's context manager, we maintain a consistent configuration environment within NetworkX. And I think compatibility with networkx is more important than joblib because as quoted above :

nx-parallel is a networkx backend that uses joblib(right now), not a joblib backend for graphs algorithms.
Importance of nx.config for nx-parallel:

Joblib configurations are part of nx-parallel configurations. To use nx-parallel effectively with other backends, these configurations need to be included within nx.config. Example:
```
NetworkXConfig(backend_priority=[cugraph, parallel], backends=Config(cugraph=<configs_for_cugraphs>, parallel=ParallelConfig(backend='loky', n_jobs=None, verbose=0, temp_folder=None, max_nbytes='1M', mmap_mode='r', prefer=None, require=None, inner_max_num_threads=None, backend_params={})), cache_converted_graphs=True)
```
This example shows how nx.config can manage multiple backends, including nx-parallel, ensuring that all configurations are in one place and can be managed consistently.

So, to summarise, keeping all backend configs within nx.config ensures consistency and ease of management, prevents conflicts that could arise from using multiple backends with separate configuration systems. By having nx-parallel configurations within nx.config, we ensure that it can work seamlessly with other NetworkX backends. And that's why we use nx.config over joblib's context manager. Also joblib's context manager cannot fix global configs like nx.config.backends.parallel.n_jobs = 8, it can only set configs in a context.

If you need further clarification or have additional questions, please let me know.

dschult commented 3 months ago

Thanks for that description of the advantages of using nx.config.backends over relying on other configuration systems. I still have questions/concerns and I'll try to describe them here.

if each backend relied on separate configuration systems, how might that lead to conflicts? Similarly, how would that lead to difficulties when using multiple backends? You would set up the config for joblib however you like. Then set up the config for cugraphs however you like, then set up the networkx backend_priority in whatever order you prefer. I don't see how conflicts could happen. And I don't see how it makes it simpler to put all the config options for all the backends in a single line of code. That seems more complicated to me -- not simpler.
When we make all backends use the networkx config system, (as far as I can tell) the only consistency that is increased is consistency in the function names used to set/change parameters. People still have to learn the separate config systems to know what config options there are and what they do. So, they learn joblib and those parameters, then they learn cugraph and those parameters, and then they have to learn a third system for setting all those values. Wouldn't it be easier if they just learn the config system for joblib and then cugraph and then the only config option that matters for nx.config.backends is the backend_priority. (and with the nx config number of parameters being small and easily explored it should take less time to learn the nx.config system than if it contains all the parameters for all the backends).
In what way does the nx-parallel config system work seemlessly with the other backends when it is housed in nx-parallel? Isn't it independent of the other backend parameter configs? Wouldn't setting joblib parameters for nx-parallel work seemlessly with nx-cugraph? I guess this comes down to: what are the conflicts we are trying to avoid?
If I understand correctly, joblib's context manager is only one way to set the config for joblib. Any of the existing methods for setting joblib parameters should work if we let users manage the joblib config parameters themselves. But maybe I'm missing something here. Also, setting n_jobs=8 in nx.config is not a global config setting either right? It doesn't impact joblib processes run outside of nx-parallel. Do you mean "global within nx-parallel"?

Thanks for this discussion as I think I'm learning about config systems generally as part of it. :)

Schefflera-Arboricola commented 3 months ago

if each backend relied ..... complicated to me -- not simpler.

Sure, backends can have a different configs system if they want. But, in case of nx-parallel, if we only use joblib's config system, there are certain issues, like as mentioned above, a user won't be able to define global configs as joblib only offers context manager parallel_config to specify/update the configs, because for joblib these are not config options but parameters. So, then if the user only uses joblib’s context manager and puts all their code in a joblib.parallel_config context then while running the nx-cugraph implementation the joblib’s context manager will not be ignored and joblib will be used, which is not what a user would expect.

And one does not need to “put all the config options for all the backends in a single line of code”. These are the default configs:

NetworkXConfig(
    backend_priority=[cugraph, parallel],
    backends=Config(
        cugraph=<configs_for_cugraphs>,
        parallel=ParallelConfig(
            backend='loky',
            n_jobs=None,
            verbose=0,
            temp_folder=None,
            max_nbytes='1M',
            mmap_mode='r',
            prefer=None,
            require=None,
            inner_max_num_threads=None,
            backend_params={}
        )
    ),
    cache_converted_graphs=True
)

And one can modify the ones they want like this:

nxp_configs = nx.config.backends.parallel
nxp_configs.backend = “threading”
nxp_configs.n_jobs = 7

nxc_configs = nx.config.backends.cugraph
…

I hope this looks less complicated to you.

When we make all .... all the backends).

As mentioned in the Config.md, one can use the nxp_configs(from above) just like they use joblib.parallel_config context manger, so a user does not need to learn a third config system. Also, someone using a networkx backend is expected to understand networkx's config, I think, and then they would learn config systems specific to the backend(which right now should be very easy for the nx-parallel backend). And I prefer it this way, because nxp_configs can be used as a global variable as well as a context.

In what way does... we are trying to avoid?

I’m not sure if I understand this one properly... but most of the code for managing different backends and their configs is in the main networkx repo not in nx-parallel. And the conflict is that to set the joblib’s configs we need to use the joblib’s context manager which won’t be ignored in nx-cugraphs implementation and also we cannot set global configs in joblib.

If I ....in nx-parallel"?

Yes, afaik joblib's context manager is the only way to set the config for joblib.Parallel. No, right now a user cannot use joblib’s context to set joblib’s parameters in nx-parallel. They can but it won’t work as expected. nx-parallel configs and joblib’s configs are independent but same. Also, in joblib n_jobs is a parameter not a config, so when I say ”globally“ or “global config” I mean it in nx-parallel not joblib.

Thank you :)

Schefflera-Arboricola commented 3 months ago

Made default n_jobs=-1 instead of None because I think this will be beneficial for the new users, as it would allow them to run their algorithms on all CPU cores(with the default backend="loky") without needing to understand or configure the parallel options in detail. LMK what you think. Thanks!

Schefflera-Arboricola commented 3 months ago

closing this PR. PR https://github.com/networkx/nx-parallel/pull/75 documents configuring using joblib and integrates networkx's config in nx-parallel. This PR attempted to create a unified config system but it's not perfect. Refer Issue https://github.com/networkx/nx-parallel/issues/76 for more. Thanks!

networkx / nx-parallel

WIP: Adding config to nx-parallel #68

Addressing your questions