microsoft / topologic

A python library for intelligently building networks and network embeddings, and for analyzing connected data.
https://topologic.readthedocs.io
MIT License
27 stars 8 forks source link

"Fixing" topologic documentation #26

Closed daxpryce closed 4 years ago

daxpryce commented 4 years ago

This is a work in progress and I'm honestly not convinced it's the right way to handle this in the first place.

topologic is kind of taking an interesting position as a 'library that includes the best python libraries for building, manipulating, and understanding graphs - along with some tlc and novel embedding approaches'. In some cases, there's just no need for us to reinvent the wheel - for instance, the spatial distance functions in scipy are fine, and there's no use in us writing our own. A python louvain implementation already exists and it's reasonably fast, so why build our own?

However, we have opinions on which functions and implementations are the best, and want to make it easy for those doing graph analysis to use these best implementations without having to do their own research. So we have basically two options:

In the former case, we need to update their documentation, so that their examples refer to the topologic namespace, or fix their documentation so that Sphinx won't throw a warning (which is a validation failure to our merge/release process).

The problem with this is that we're achieving it through string replacement on the doc strings, and if their documentation changes enough our replacements are going to make no sense (or introduce errors). This won't harm our published library or documentation, but it will be inconsistent at runtime for any downstream users who have done a fresh pip install of our library and maybe got a newer version of scipy or python-louvain that introduces new docstrings, arguments, etc, that break our patched doc string replacement routines and thus break help().

In the latter case, we're adding a bunch of code that ... doesn't really do anything on its own, besides define a method signature with clear documentation that we maintain and then call their library.

And perhaps the biggest problem both of these share is that we're implying we're doing something important when the reality is we're just curating the best functions for the common but vital tasks we use in our daily work with network analysis.

In conclusion, I am not convinced this is the right commit, but it's far enough along I want others to take a look.

daxpryce commented 4 years ago

Yeah. I don't feel comfortable doing these aliasing methods, nor monkeypatching the docstrings on runtime like this. I'm going to address these 3 functions differently in a different branch/PR