numpy / numpy.org

The NumPy home page
http://numpy.org/
BSD 3-Clause "New" or "Revised" License

ecosystem: graphs and networks? #320

Closed mdeff closed 3 years ago

mdeff commented 4 years ago

Linear algebra, hence NumPy, is a fundamental tool for analyzing graphs (with spectral graph theory) and data on graphs (with abstract harmonic analysis). I co-maintain PyGSP, a package that implements these concepts in NumPy (and SciPy for sparse operations). (In machine learning, these tools are used for the currently trending graph neural networks.)
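To make the spectral point concrete, here is a minimal NumPy-only sketch. It is not taken from PyGSP; the toy path graph and the signal values are assumptions chosen for illustration. It builds the combinatorial Laplacian and uses its eigenvectors as a graph Fourier basis.

```python
import numpy as np

# Adjacency matrix of an undirected path graph on 4 vertices: 0 - 1 - 2 - 3.
# (Toy graph, assumed for illustration.)
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Combinatorial graph Laplacian L = D - A, with D the diagonal degree matrix.
L = np.diag(A.sum(axis=1)) - A

# L is symmetric, so eigh applies. The eigenvalues play the role of graph
# frequencies and the eigenvectors form the graph Fourier basis.
eigenvalues, U = np.linalg.eigh(L)

# A signal on the graph: one value per vertex (illustrative values).
x = np.array([1.0, 2.0, 3.0, 4.0])

# Graph Fourier transform and its inverse.
x_hat = U.T @ x
x_back = U @ x_hat
```

Because the graph is connected, the smallest eigenvalue is zero (up to rounding), and the inverse transform recovers the original signal.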

The main package for network analysis is NetworkX. Taking a network science POV, the package is not "array-like" so I don't know if it should be under a hypothetical "graphs & networks" group too.

As I'm obviously biased, I defer to your judgment on whether this merits inclusion. I can also wait on #313 or prepare a PR. Last issue/PR for today, I promise. ;)

rgommers commented 4 years ago

Networks/graphs seems like a good topic to add. I think we have space for two more topics, to have a 2x8 grid. This could be one of them.

NetworkX is an obvious candidate. PyGSP does look like a sensible addition as well. I'm not very familiar with this field as a user, so other recommendations are very welcome.

mdeff commented 4 years ago

Great! If non-array tools can be considered for inclusion, I'll also suggest graph-tool. It has the same NetSci POV as NetworkX, but with a C++ backend built on Boost (NetworkX is pure Python).

My biased take about PyGSP:

mdeff commented 4 years ago

A classification alternative (in the direction of #321, but mostly food for thought) would be one big "data/signal analysis/processing" topic (better name to be found) that groups signal and image processing (data on Euclidean grids), computer graphics (data on 2D surfaces), numerical methods like FEM (data on 3D volumes), and GIS (data on the sphere). These are all concerned with the analysis (SP/Fourier/filters/wavelets/frames, statistics, ML, etc.) of structured (a line, plane, sphere, 2D surface, 3D volume, etc.) data (vertex positions or color, physical quantities, color intensity, demographics, etc.).

I fear, however, that this merges too much (e.g., CG and numerical methods are certainly more than that). It also puts the spotlight on structured data, not on the structure itself. (While CG and numerical methods are concerned with mesh quality and simplification, that's far from the level of interest the NetSci community has in the networks themselves. There's of course not much to care about in the Euclidean structure of classical SP.)

rossbar commented 4 years ago

I'd also like to add my +1 for adding a graph/network analysis tab.

> Taking a network science POV, the package is not "array-like" so I don't know if it should be under a hypothetical "graphs & networks" group too.

Note that NetworkX supports linear algebra-based approaches to graph problems, even if numpy/scipy are only soft dependencies for NetworkX.
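As a rough illustration of that linear-algebra route, the sketch below converts a NetworkX graph to a dense NumPy array with `nx.to_numpy_array` and computes the Laplacian spectrum by hand. The path graph is an assumption chosen for brevity, and this is only one of several ways to get from a graph to a matrix.

```python
import networkx as nx
import numpy as np

# Small example graph (assumed for illustration): a path 0 - 1 - 2 - 3.
G = nx.path_graph(4)

# Dense adjacency matrix; rows and columns follow the order of list(G.nodes).
A = nx.to_numpy_array(G)

# Combinatorial Laplacian L = D - A, built with plain NumPy.
L = np.diag(A.sum(axis=1)) - A

# Eigenvalues of the symmetric Laplacian, in ascending order.
spectrum = np.linalg.eigvalsh(L)
```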

jarrodmillman commented 4 years ago

We are also working to improve our NumPy support in NetworkX. But we will keep using dictionaries as well, since they are better for many graph algorithms that are not based on linear algebra.
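A minimal sketch of why dictionaries suit traversal-style algorithms: the graph below is a plain dict mapping each node to its set of neighbours (a simplified stand-in, not NetworkX's actual internal layout), and breadth-first search only ever touches the neighbours of the nodes it visits. The node names and the `bfs_distances` helper are hypothetical.

```python
from collections import deque

# Hypothetical toy graph stored as a dict mapping each node to its neighbours.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def bfs_distances(adj, source):
    """Return the hop count from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # Neighbour lookup is a single dict access; no matrix row is scanned.
        for neighbour in adj[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


distances = bfs_distances(adjacency, "a")
```

Nodes can be arbitrary hashable labels here, which is awkward to express with integer-indexed arrays.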

I also think that it would make sense to move "Interactive Computing" to its own tab. Interactive computing doesn't seem like it belongs under the "Scientific Domains" tab. It also seems deserving of its own section, perhaps including Spyder along with the other useful interactive environments.

That move would also enable you to add "Network Analysis", while retaining the 2x8 grid.

mdeff commented 4 years ago

NetworkX supports linear algebra-based approaches, but they are not its main focus. As @jarrodmillman wrote, dictionaries are better for the larger focus of NetworkX. Anyway, that's not an issue, as @rgommers agreed to include it if we get a "Graphs and Networks" group.

Agree that "Interactive Computing" would fit better under its own tab rather than as a group under "Scientific Domains". Maybe open a separate issue for that?

jarrodmillman commented 4 years ago

FYI, NetworkX recently made NumPy, SciPy, Matplotlib, and pandas default requirements (they were optional before): https://github.com/networkx/networkx/commit/5f2445c1b5ff4db2dd0f943e006df1a107e8f00b

Here is more information from the PR:

""" Given their central position in the scientific Python ecosystem and the fact that they provide self-contained, easy to install wheels, NumPy, SciPy, Matplotlib, and Pandas are now default dependencies. They don't have pre-built wheels for pypy or 3.9-dev, so they aren't installed by default on those instances via the environment marker:

platform_python_implementation!='PyPy' and python_version<'3.9'

Before we officially add support for Python 3.9, we will change python_version<'3.9' to python_version<'3.10'. If and when the core projects (i.e., NumPy, SciPy, Matplotlib, and Pandas) provide self-contained wheels for PyPy, we will remove platform_python_implementation!='PyPy'. """
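For readers unfamiliar with environment markers, the condition quoted above can be mirrored by hand with the standard library. This is only an illustrative sketch: installers evaluate the marker themselves, and the `marker_holds` helper is a hypothetical name, not part of NetworkX or pip.

```python
import platform
import sys


def marker_holds(implementation, version_info):
    """Hand-written equivalent of the marker
    platform_python_implementation!='PyPy' and python_version<'3.9'.
    """
    # Compare (major, minor) as a tuple so that 3.10 sorts after 3.9.
    return implementation != "PyPy" and tuple(version_info[:2]) < (3, 9)


# Would the default dependencies be installed on the running interpreter?
installs_defaults_here = marker_holds(
    platform.python_implementation(), sys.version_info
)
```

The tuple comparison avoids the pitfall of comparing version strings character by character, where '3.10' would sort before '3.9'.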

rgommers commented 3 years ago

Sorry for the huge delay. I added a policy/procedure on updating the Ecosystem tab in gh-313. tl;dr we're good to add Graphs and Networks as a category.