irtazahashmi / pytspl

A Python library for Topological Signal Processing and Learning.
MIT License

from list to incidence matrices #1

Closed: cookbook-ms closed this issue 6 months ago

cookbook-ms commented 7 months ago

Most of the time, as you know, a network is stored as lists of node pairs and/or node triples; we usually do not have the incidence matrices directly. So there should be a procedure that builds (ideally sparse) incidence matrices from the edge list.

For example, given the list [(1,2),(0,2),(1,0)],

there should be a script that produces the incidence matrices B1 and B2.
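To make the request concrete, here is a minimal sketch of the node-to-edge step for that example list. It is only an illustration: the name `build_b1` is hypothetical (not part of pytspl or toponetx), the sparse matrix is represented as a plain dict of nonzero entries, and the orientation convention (each edge points from its smaller to its larger endpoint, tail gets -1, head gets +1) is an assumption.

```python
def build_b1(edge_list):
    """Return (nodes, edges, entries) for the node-to-edge incidence matrix B1.

    entries maps (node_row, edge_col) -> -1 or +1; all other entries are zero.
    """
    # Assumed convention: orient every edge from its smaller to its larger
    # endpoint, drop self-loops and duplicates, and sort lexicographically.
    edges = sorted({(min(u, v), max(u, v)) for u, v in edge_list if u != v})
    nodes = sorted({n for e in edges for n in e})
    row_of_node = {n: i for i, n in enumerate(nodes)}

    entries = {}
    for col, (tail, head) in enumerate(edges):
        entries[(row_of_node[tail], col)] = -1  # edge leaves its tail
        entries[(row_of_node[head], col)] = 1   # edge enters its head
    return nodes, edges, entries


nodes, edges, b1 = build_b1([(1, 2), (0, 2), (1, 0)])
print(edges)  # [(0, 1), (0, 2), (1, 2)]
print(b1)
```

The keys and values of `entries` are already in coordinate (COO) form, so they could be handed to a sparse constructor such as `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(len(nodes), len(edges)))` if SciPy is available.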

I previously suggested the Julia library, together with toponetx; I think it is worth having a look there. Remember that we do not have B1 and B2 as raw data; the ones in the dataset were preprocessed.

irtazahashmi commented 7 months ago

I think this issue has been resolved. Should we close it?

cookbook-ms commented 7 months ago

I think I might have confused you. What we need is basically this: given a network edge list, for example the London dataset edges https://github.com/cookbook-ms/sc-graph-library/blob/main/data/london_street/LondonEdges.csv, we need to create $B_1$ and $B_2$. If I'm not mistaken, you can achieve this by using the SimplicialComplex from toponetx? I think their functions require all the simplices to be listed already; for example, in your playground you need to list all the edges and triangles.

What we should have is very similar, but we only have access to the edge list, and we then obtain the triangle list based on user-defined options, e.g.,

  1. if we have (a,b), (b,c) and (a,c), then there is a triangle $(a,b,c)$ (this is commonly used, but it can produce a very large number of triangles in dense networks)
  2. if we have (a,b), (b,c) and (a,c), and $d(a,b) < \epsilon$, $d(a,c) < \epsilon$, $d(b,c) < \epsilon$ (when a distance can be defined from the locations of $a,b,c$), then we have a triangle $(a,b,c)$
  3. ... The simplicial complex construction varies per type of data and per type of requirement. For the datasets we currently have, these two options should be used most often. (Note that for other datasets, such as point clouds, Rips and Alpha complexes are more commonly used, see https://gudhi.inria.fr/python/latest/, although the notions are very similar.)
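The first two options can be sketched roughly as follows. This is an illustration under stated assumptions, not the pytspl or toponetx implementation: `find_triangles`, `build_b2`, `coords` and `eps` are hypothetical names, node positions are assumed to be given as coordinate tuples, $d$ is taken to be the Euclidean distance, and triangles are oriented by sorted node order so that the boundary of $(a,b,c)$ is $(b,c) - (a,c) + (a,b)$.

```python
from math import dist  # Euclidean distance, Python 3.8+


def find_triangles(edge_list, coords=None, eps=None):
    """List triangles (a, b, c) with a < b < c whose three edges are all present.

    If eps is given, also require every pairwise distance to be below eps
    (option 2); coords must then map each node to a coordinate tuple.
    """
    edges = {(min(u, v), max(u, v)) for u, v in edge_list if u != v}
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    triangles = []
    for a, b in sorted(edges):
        # A common neighbour c > b closes the triangle; requiring c > b
        # ensures each triangle is reported exactly once.
        for c in sorted(neighbours[a] & neighbours[b]):
            if c <= b:
                continue
            if eps is not None:
                pairs = ((a, b), (a, c), (b, c))
                if any(dist(coords[p], coords[q]) >= eps for p, q in pairs):
                    continue
            triangles.append((a, b, c))
    return triangles


def build_b2(edge_list, triangles):
    """Return the nonzero entries of the edge-to-triangle incidence matrix B2
    as a dict mapping (edge_row, triangle_col) -> -1 or +1."""
    edges = sorted({(min(u, v), max(u, v)) for u, v in edge_list if u != v})
    row_of_edge = {e: i for i, e in enumerate(edges)}
    entries = {}
    for col, (a, b, c) in enumerate(triangles):
        entries[(row_of_edge[(a, b)], col)] = 1
        entries[(row_of_edge[(a, c)], col)] = -1
        entries[(row_of_edge[(b, c)], col)] = 1
    return entries


edge_list = [(1, 2), (0, 2), (1, 0), (2, 3)]
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (5.0, 5.0)}  # made-up positions
print(find_triangles(edge_list))                          # option 1
print(find_triangles(edge_list, coords=coords, eps=1.2))  # option 2
```

With these made-up coordinates, option 1 finds the single triangle `(0, 1, 2)`, while option 2 with `eps=1.2` rejects it because nodes 1 and 2 are about 1.41 apart.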

Note that we do not have $B_1$ and $B_2$. The data files I gave were preprocessed by me.

Instead, we have this kind of file, for example the Barcelona dataset https://github.com/000Justin000/ssl_edge/blob/master/data/TransportationNetworks/Barcelona/Barcelona_net.tntp

This should be the function we want to have: https://github.com/000Justin000/ssl_edge/blob/master/modules/NetworkOP.jl

irtazahashmi commented 7 months ago

The data can now be read from a network list (e.g. LondonEdges.csv). B1 and B2 can now be extracted, as well as the simplices. The triangles are extracted using the methods above, depending on user-defined options.