Closed Schefflera-Arboricola closed 9 months ago
Perhaps we should raise an exception for an invalid `n_jobs`. My thinking is that if they specified an input that doesn't make sense, they probably meant something else.
I initially thought of raising an exception, but in joblib when you pass `n_jobs > os.cpu_count()` there are no warnings or errors (I don't know exactly how joblib handles this case), so I thought that maybe that is the norm.
- Having `n_jobs=-2` means 1 CPU is left untouched. That off-by-one value might cause confusion. Maybe `-1` should mean all-but-one, which would match indexing, where `a[:-1]` returns all but the last element.
- The case `n_jobs == 0` is currently not valid. What do you think about making that mean use-all-CPUs? With this and the previous comment, the docstring would say "if `n_jobs <= 0`, then `n_cpus + n_jobs` CPUs are used."
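The mapping proposed in the bullets above could be sketched like this (a hypothetical helper for illustration, not existing nx-parallel or joblib behaviour):

```python
import os

def proposed_n_workers(n_jobs, n_cpus=None):
    """Sketch of the proposal above (hypothetical): for n_jobs <= 0,
    use n_cpus + n_jobs workers, so 0 means all CPUs and -1 means
    all but one, matching a[:-1]-style indexing."""
    n_cpus = n_cpus or os.cpu_count() or 1
    if n_jobs <= 0:
        # clamp to at least one worker for very negative inputs
        return max(1, n_cpus + n_jobs)
    return n_jobs
```

For example, on an 8-CPU machine this gives 8 workers for `n_jobs=0` and 7 for `n_jobs=-1`.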
Here also I was trying to follow the `n_jobs` semantics from joblib. In sklearn the default is also `-1`, and it also means using all the CPUs. But I know it makes more sense to make 0 (i.e. all the cores) the default. In this issue (https://github.com/joblib/joblib/issues/1504) they have mentioned they cannot make such a change because it's a big project and it'll impact a lot of people. But we can make 0 the default if we want. :)
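For comparison, joblib's documented convention for negative values is `n_cpus + 1 + n_jobs` workers, so `-1` means all CPUs and `-2` means all but one. A minimal stdlib sketch of that documented rule (a hypothetical helper, not joblib's actual code):

```python
import os

def joblib_style_n_workers(n_jobs, n_cpus=None):
    """Sketch of joblib's documented convention (hypothetical helper):
    for n_jobs < 0, use n_cpus + 1 + n_jobs workers, so -1 means
    all CPUs and -2 means all but one."""
    n_cpus = n_cpus or os.cpu_count() or 1
    if n_jobs < 0:
        # clamp to at least one worker for very negative inputs
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs
```

On an 8-CPU machine this gives 8 workers for `n_jobs=-1` and 7 for `n_jobs=-2`, which is the off-by-one difference discussed above.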
You removed some examples from the doc_strings. I'm fine with that, but I would like to write down the criteria for which parts of the NX function doc_string we will keep for the nx-parallel doc_string. What criteria did you use? (This will be flexible for a while until we figure out what is best, but it's probably good to start discussing it.)
I didn't remove a lot (only examples related to networkx rather than nx-parallel, and the "See Also" section; I'll add nx-parallel-specific examples soon), as I wanted to discuss it with you and @MridulS first. I don't think we should remove the entire Parameters section, but we can make it shorter (by removing extra info about parameters, like `weight` in `all_pairs_bellman_ford_path`), and also remove things like the formula explanations (like in betweenness centrality), because those will never change, I think.
Please let me know your thoughts on this.
Thanks!
I made a separate PR for adding `n_jobs`, because if we want `n_jobs` to work like it does in joblib then we will have to add more of `joblib.Parallel`'s kwargs, since what `n_jobs` means depends on the backend we are using in joblib ("loky", "multiprocessing", "threading", etc.).
Also, I have updated the docs and removed a lot of them. Please let me know if you think I have removed too much somewhere.
Thanks!
Also, this failed workflow test seems unrelated to the changes I've made.
I had a question and a suggestion related to the documentation issue:
Question: Is it even possible to automatically transfer the docstrings of networkx functions to nx-parallel, such that we write the docs related to the parallel implementation and nxp examples, and all the other docs are automatically imported from the same function in the main networkx repo? Are there any tools that allow this kind of docstring transfer?
Suggestion: Another way I could come up with to solve the documentation issue was to first display all the main things (like the function header, a description of the parallel implementation, a one-line summary of what the function does/returns, and nxp examples) and then, below that, have a drop-down for everything else (like parameters, notes, references, etc.). This would probably be more user-friendly, but there would be redundancy on the developer side. Please let me know your thoughts on this.
I understand your comment about making the `n_jobs` changes in a separate PR. We will have to think about it more.
I don't think you have removed too much -- but we do need to figure out what we will use the doc_strings for before we know how much to remove or keep.
Python allows the assignment and reassignment of doc_strings on functions within a single Python session. The doc_string is held as a string in the `__doc__` attribute of the function. But that doesn't make it easy to do. It would be quite a task just to create a system that went through the NX functions, found which are also in nx-parallel, and then tried to match the two.
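A minimal sketch of that kind of `__doc__` reassignment (illustrative function names, not the real NX/nx-parallel matching logic):

```python
def nx_func(G):
    """Upstream NetworkX-style docstring (illustrative).

    Parameters
    ----------
    G : graph
    """

def nxp_func(G, n_jobs=-1):
    """The parallel computation is implemented by dividing the
    nodes into chunks (illustrative)."""

# __doc__ is an ordinary string attribute, so the upstream docstring
# can be spliced into the parallel variant at import time:
nxp_func.__doc__ = nx_func.__doc__ + "\n" + nxp_func.__doc__
```

The hard part is not the reassignment itself but reliably pairing each nx-parallel function with its NX counterpart.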
We should figure out who we are aiming for with the doc_strings. Are we wanting them in the code, where people reading the code will see them? Are we wanting them at the Python interpreter level, so people in IPython can type `func?` to see the doc_string? Or are we wanting this to be a website of documentation for nx-parallel that people look up with their browser?
I think an early step is to figure out how to add a box with a few lines to the NX documentation, like cugraph does here: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html
In the recent commit, I've made it so that the docstring of each function is extracted, and from that the `Parallel Computation`, `Additional Parameters` and `Examples` sections are pulled out and stored in a dictionary (with the function name as the key). At the end, this dictionary is returned by the `get_funcs_info` function. I've also made it so that the URL to the function header appears on the networkx page (as "Learn more"; see PR #7219 for more).
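A rough sketch of that kind of section extraction (hypothetical helper and regex, not the actual `get_funcs_info` implementation):

```python
import inspect
import re

SECTIONS = ("Parallel Computation", "Additional Parameters", "Examples")

def get_funcs_info_sketch(funcs):
    """Hypothetical sketch: pull the named numpydoc-style sections out
    of each function's docstring, keyed by the function name."""
    info = {}
    for func in funcs:
        doc = inspect.getdoc(func) or ""  # getdoc() dedents the docstring
        entry = {}
        for name in SECTIONS:
            # match "Heading\n-----\n<body>" up to the next underlined
            # heading or the end of the docstring
            m = re.search(
                rf"{name}\n-+\n(.*?)(?=\n\w[\w ]*\n-+\n|\Z)", doc, re.S
            )
            entry[name] = m.group(1).strip() if m else None
        info[func.__name__] = entry
    return info
```

Missing sections come back as `None`, which is relevant to the key-existence issue discussed later in this thread.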
I've made this PR independent from PR #7219 and updated the first comment, indicating all the changes made.
Thanks!
Alright, let's try this out with nx main docs.
Hmm, this doesn't seem to work, I can't see nx-parallel show up in the docs.
I'll debug this later this week.
> Hmm, this doesn't seem to work, I can't see nx-parallel show up in the docs.
The problem might be from this line, `for param in sorted(extra_parameters):` (an error like `TypeError: 'NoneType' object is not callable`), because both graphblas and nx-cugraph don't include the `extra_parameters` and `extra_docstring` keys in a function's dictionary if they don't have any content. But in nx-parallel the default value for all keys of a function is `None`, and while building the backend docs only the existence of the key is checked, not whether its value is `None`.

So, to solve this, we can check whether the given values for the keys (`extra_parameters` and `extra_docstring`) are `None` (like in the renaming PR, where I am also checking whether any of the keys have `None` values), or we can modify the `get_info` function to remove all key-value pairs where the value is `None`.
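The second option (dropping `None`-valued keys before `get_info` returns) could look roughly like this (a sketch under the dictionary shape shown below, not the actual nx-parallel code):

```python
def strip_none_values(info):
    """Sketch: drop any key whose value is None from each function's
    dictionary, so only keys with real content remain."""
    info["functions"] = {
        name: {k: v for k, v in d.items() if v is not None}
        for name, d in info.get("functions", {}).items()
    }
    return info

# e.g. {"extra_docstring": "...", "extra_parameters": None}
# becomes {"extra_docstring": "..."}
```

With this, the docs build would see only keys that actually have content, matching the graphblas and nx-cugraph behaviour.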
Running this script shows what `get_info` returns:

```python
import nx_parallel as nxp

l = nxp.get_info()
for i in l:
    if i != "functions":
        print(i)
        print(l[i])
        print()
    else:
        print(i)
        for j in l[i]:
            print(j)
            print(l[i][j])
            print()
```
Output:

```
backend_name
parallel
project
nx-parallel
package
nx_parallel
url
https://github.com/networkx/nx-parallel
short_summary
Parallel backend for NetworkX algorithms
functions
betweenness_centrality
{'extra_docstring': 'The parallel computation is implemented by dividing the\nnodes into chunks and computing betweenness centrality for each chunk concurrently.', 'extra_parameters': None}
local_efficiency
{'extra_docstring': 'The parallel computation is implemented by dividing the\nnodes into chunks and then computing and adding global efficiencies of all node\nin all chunks, in parallel, and then adding all these sums and dividing by the\ntotal number of nodes at the end.', 'extra_parameters': None}
number_of_isolates
{'extra_docstring': 'The parallel computation is implemented by dividing the list\nof isolated nodes into chunks and then finding the length of each chunk in parallel\nand then adding all the lengths at the end.', 'extra_parameters': None}
is_reachable
{'extra_docstring': 'The function parallelizes the calculation of two\nneighborhoods of vertices in `G` and checks closure conditions for each\nneighborhood subset in parallel.', 'extra_parameters': None}
tournament_is_strongly_connected
{'extra_docstring': 'The parallel computation is implemented by dividing the\nnodes into chunks and then checking whether each node is reachable from each\nother node in parallel.', 'extra_parameters': None}
closeness_vitality
{'extra_docstring': 'The parallel computation is implemented only when the node\nis not specified. The closeness vitality for each node is computed concurrently.', 'extra_parameters': None}
all_pairs_bellman_ford_path
{'extra_docstring': 'The parallel computation is implemented by computing the\nshortest paths for each node concurrently.', 'extra_parameters': None}
```
> […] So, to solve this we can check if the given values for the keys (`extra_parameters` and `extra_docstring`) are `None` (like in the renaming PR, where I am also checking whether all the keys have `None` values), or we can modify the `get_info` function to remove all the key-value pairs where the value is `None`.
I don't think this is the issue, because after re-running the tests under https://github.com/networkx/networkx/pull/7219, the nx-parallel note was still not appearing in the documentation artifact.
> Hmm, this doesn't seem to work, I can't see nx-parallel show up in the docs.
https://github.com/networkx/nx-parallel/pull/55 resolves this
[UPDATED]

- Updated the `__init__.py` files and tried to make the code consistent in a few places.
- Added the `get_info()` function in `utils.backend` (to show nx-parallel docs on the main networkx docs webpage). [Added `get_info()` in `utils.backend` instead of `nx_parallel/__init__.py` because extracting the docstrings of all the algorithms requires importing all algos in nx_parallel, and doing that in `__init__.py` gives a circular import error.]
- Removed `python -m pytest --doctest-modules --pyargs nx_parallel` from the wf tests, because of an error due to there being no examples in the docstrings to test.
- Assumption (syntax): The docstring of functions should be as follows