pathpy / pathpyG

GPU-accelerated Next-Generation Network Analytics and Graph Learning for Time Series Data on Complex Networks.
https://www.pathpy.net
GNU Affero General Public License v3.0

IndexError in model selection #234

Closed. VinsRR closed this issue 1 week ago.

VinsRR commented 1 week ago

The following code raises an IndexError:

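  import numpy as np
  from pathpyG import IndexMap, MultiOrderModel, PathData
  # Four three-node walks and one single-node walk; the short walk is added last
  paths_list = [
      ("d","b","c"),
      ("a","b","c"),
      ("a","b","e"),
      ("d","b","e"),
      ("a",)
      ]
  # Observation frequency of each walk, in the same order as paths_list
  frequencies = [
      1,
      20,
      1,
      20,
      1
      ]
  # Register every distinct node label in an index map
  mapping = IndexMap()
  mapping.add_ids(np.unique(np.hstack(paths_list)))
  pathdata = PathData(mapping)
  pathdata.append_walks(node_seqs=paths_list, weights=frequencies)
  max_order = 3
  # Build the multi-order model up to max_order
  mon = MultiOrderModel.from_PathData(pathdata, max_order=max_order)
  # Model selection: the reported IndexError is raised here
  mon.estimate_order(
      pathdata,
      max_order=max_order
      )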

The issue arises because paths 'shrink' when they are encoded as sequences of higher-order nodes, and … does not handle this correctly: an IndexError occurs if a path shorter than the tested order is added last. The computation of the optimal order could also go wrong if the shortest path in the dataset is shorter than the tested order, so … has to account for this correctly.
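The shrinking can be illustrated with a minimal pure-Python sketch that does not use pathpyG at all. It assumes that a k-th order node is a tuple of k consecutive first-order nodes, so a walk with n nodes yields n - k + 1 k-th order nodes, and none at all when n < k. The offset-based layout in `first_nodes_flat` is a hypothetical stand-in, not pathpyG's actual implementation; it only shows how a single unchecked short walk can give a silently wrong result when it sits in the middle of the data and an IndexError when it comes last. All function names here are illustrative.

```python
def to_higher_order(walk, k):
    """Encode a walk as its k-th order nodes: sliding windows of k consecutive nodes."""
    # range() is empty when len(walk) < k, so short walks vanish entirely.
    return [tuple(walk[i:i + k]) for i in range(len(walk) - k + 1)]


def first_nodes_flat(paths, k):
    """Hypothetical flat layout: concatenate all encoded walks and read each
    walk's first k-th order node through its start offset. Assumes, wrongly,
    that no walk vanishes at order k."""
    encoded = [to_higher_order(p, k) for p in paths]
    flat = [node for enc in encoded for node in enc]
    starts, offset = [], 0
    for enc in encoded:
        starts.append(offset)
        offset += len(enc)
    # An empty walk in the middle reads its successor's first node; an empty
    # walk at the end has start == len(flat) and raises IndexError.
    return [flat[s] for s in starts]


def first_nodes_safe(paths, k):
    """Guarded variant: report None for walks that have no k-th order node."""
    result = []
    for p in paths:
        enc = to_higher_order(p, k)
        result.append(enc[0] if enc else None)
    return result


paths_list = [("d", "b", "c"), ("a", "b", "c"), ("a", "b", "e"), ("d", "b", "e"), ("a",)]

# Number of k-th order nodes per walk: the single-node walk disappears for k >= 2.
for k in (1, 2, 3):
    print(k, [len(to_higher_order(p, k)) for p in paths_list])

print(first_nodes_safe(paths_list, 2))

try:
    first_nodes_flat(paths_list, 2)  # ("a",) is last, as in the report
except IndexError as err:
    print("IndexError:", err)
```

Moving ("a",) to the front of the list makes `first_nodes_flat` return without an error, but the entry reported for ("a",) is then the first second-order node of the following walk. In this sketch, the same missing length check either crashes or silently corrupts whatever is computed downstream, depending only on where the short walk appears.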