ysig / GraKeL

A scikit-learn compatible library for graph kernels
https://ysig.github.io/GraKeL/

Fingerprint dataset not accessible using fetch_dataset function #80

Closed amanuelanteneh closed 2 years ago

amanuelanteneh commented 2 years ago

Describe the bug
When using fetch_dataset to retrieve the FINGERPRINT dataset, I receive the following error: [Errno 2] No such file or directory: './FINGERPRINT/FINGERPRINT_graph_indicator.txt'

To Reproduce
Steps to reproduce the behavior:

from grakel.datasets import fetch_dataset
dataset = fetch_dataset("FINGERPRINT", verbose=False)

Expected behavior
The expected behavior is to retrieve the dataset.

Stack Trace

FileNotFoundError                         Traceback (most recent call last)
in
      5 from sklearn.svm import SVC
      6
----> 7 dataset = fetch_dataset("FINGERPRINT", verbose=False)
      8
      9 G = dataset.data

1 frames

/usr/local/lib/python3.7/dist-packages/grakel/datasets/base.py in read_data(name, with_classes, prefer_attr_nodes, prefer_attr_edges, produce_labels_nodes, as_graphs, is_symmetric)
    263
    264     # Associate graphs nodes with indexes
--> 265     with open(indicator_path, "r") as f:
    266         for (i, line) in enumerate(f, 1):
    267             ngc[i] = int(line[:-1])

ysig commented 2 years ago

This means that either the dataset was not downloaded correctly or that there is no graph_indicator file. Can you check?

amanuelanteneh commented 2 years ago

Hi @ysig, it appears to download a .zip file named "FINGERPRINT.zip" along with a folder called "Fingerprint". I believe the graph indicators are there, but the function tries to open "FINGERPRINT_graph_indicator.txt" while the file is actually named "Fingerprint_graph_indicator.txt".

amanuelanteneh commented 2 years ago

Also, do the adjacency matrices of the graphs returned by fetch_dataset() have edge weights, or are they unweighted (only 0 and 1 as matrix elements)?

giannisnik commented 2 years ago

The datasets the fetch_dataset() function has access to contain unweighted graphs. Thus, the function just returns the set of edges of those graphs and not the adjacency matrices.

amanuelanteneh commented 2 years ago

@giannisnik However, if we set the flag as_graphs=True when calling fetch_dataset(), we can still get the adjacency matrices by calling get_adjacency_matrix() on the grakel graph objects it returns, correct?

giannisnik commented 2 years ago

Still, the get_adjacency_matrix() function constructs the adjacency matrix from the set of edges of those graphs, so the resulting matrices correspond to unweighted graphs.
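
To illustrate the point, here is a minimal sketch in plain Python of why a matrix built only from an edge set can contain nothing but 0 and 1. This is not GraKeL's implementation; the function name adjacency_from_edges and its undirected parameter are hypothetical and exist only for this example.

```python
def adjacency_from_edges(edges, undirected=True):
    """Build a 0/1 adjacency matrix (nested lists) from (u, v) pairs.

    Hypothetical helper for illustration; not part of GraKeL.
    """
    edges = list(edges)
    # Collect every node that appears in an edge and give it a row/column index.
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        # An edge set carries no weights, so an entry is either 0 or 1.
        matrix[index[u]][index[v]] = 1
        if undirected:
            matrix[index[v]][index[u]] = 1
    return matrix, nodes


matrix, nodes = adjacency_from_edges([(1, 2), (2, 3)])
```

Since the input holds only node pairs, there is no information from which a weight other than 1 could be recovered.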

amanuelanteneh commented 2 years ago

@giannisnik thank you for the clarification. Do you have a suggestion on how to get around the issue with the fingerprint dataset? I believe the issue comes from the fact that the .zip file name is FINGERPRINT in all caps but the graph indicator file name begins with Fingerprint when the fetch_dataset() function expects it to begin with FINGERPRINT.

giannisnik commented 2 years ago

Perhaps, the easiest workaround would be to unzip the file, rename all the files contained in it (replace Fingerprint with FINGERPRINT), then re-compress the folder that contains the files, and use this zip file instead of the original.
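
The rename-and-recompress steps could be scripted roughly as follows. This is a sketch using only the Python standard library; the helper names rename_prefix and zip_folder are hypothetical, and the archive layout (all files under a single top-level directory) is an assumption about what fetch_dataset expects, not something confirmed here. It only addresses the naming mismatch and does not guarantee that the files will then parse correctly.

```python
import zipfile
from pathlib import Path


def rename_prefix(folder, old="Fingerprint", new="FINGERPRINT"):
    """Rename every file in `folder` whose name starts with `old`.

    Hypothetical helper; returns the list of new paths.
    """
    folder = Path(folder)
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name.startswith(old):
            target = path.with_name(new + path.name[len(old):])
            path.rename(target)
            renamed.append(target)
    return renamed


def zip_folder(folder, zip_path, arc_dir):
    """Write the files of `folder` into `zip_path` under directory `arc_dir`.

    The single-top-level-directory layout is an assumption.
    """
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.iterdir()):
            if path.is_file():
                zf.write(path, arcname=f"{arc_dir}/{path.name}")
```

For example, rename_prefix("./Fingerprint") followed by zip_folder("./Fingerprint", "FINGERPRINT.zip", "FINGERPRINT") would produce an archive whose file names all start with FINGERPRINT.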

amanuelanteneh commented 2 years ago

@giannisnik I actually tried that and received the following error:

File "Bin-GBS-Kernel.py", line 32, in preprocessDataset
    graphData = fetch_dataset(dataset, verbose=False, as_graphs=True) # get dataset 
  File "/home/asa2rc/.local/lib/python3.8/site-packages/grakel/datasets/base.py", line 482, in fetch_dataset
    data = read_data(name,
  File "/home/asa2rc/.local/lib/python3.8/site-packages/grakel/datasets/base.py", line 336, in read_data
    Gs.append(Graph(Graphs[i], node_labels[i], edge_labels[i]))
KeyError: 4

giannisnik commented 2 years ago

You can use the following function:

import networkx as nx
import numpy as np

from grakel.utils import graph_from_networkx

def load_fingerprint(path_to_folder):
    """Read the raw Fingerprint files into networkx graphs and a label array."""
    node2graph = {}  # node id -> 1-based graph id
    Gs = []

    # One line per node: the id of the graph that node belongs to.
    # Assumes graph ids are contiguous and start at 1.
    with open(path_to_folder+"/Fingerprint_graph_indicator.txt", "r") as f:
        for c, line in enumerate(f, 1):
            node2graph[c] = int(line.strip())
            if node2graph[c] != len(Gs):
                Gs.append(nx.Graph())
            Gs[-1].add_node(c)

    # One line per edge, formatted as "u, v".
    with open(path_to_folder+"/Fingerprint_A.txt", "r") as f:
        for line in f:
            u, v = (int(x) for x in line.split(","))
            Gs[node2graph[u]-1].add_edge(u, v)

    # One line per node: comma-separated attribute values.
    with open(path_to_folder+"/Fingerprint_node_attributes.txt", "r") as f:
        for c, line in enumerate(f, 1):
            Gs[node2graph[c]-1].nodes[c]['attributes'] = np.array(line.split(','), dtype=float)

    # One line per graph: the class label.
    labels = []
    with open(path_to_folder+"/Fingerprint_graph_labels.txt", "r") as f:
        for line in f:
            labels.append(int(line.strip()))

    labels = np.array(labels, dtype=float)
    return Gs, labels

Gs, labels = load_fingerprint('./Fingerprint')
Gs_grakel = graph_from_networkx(Gs, node_labels_tag='attributes')

amanuelanteneh commented 2 years ago

@giannisnik Thanks for the code! However, I noticed that graph_from_networkx() returns a generator object. Is there a way to have it return a list of grakel Graph objects? I've tried setting as_Graph=True, but that still seems to return a generator object.

giannisnik commented 2 years ago

You can simply cast it to a list as follows: Gs_grakel = list(graph_from_networkx(Gs, node_labels_tag='attributes'))
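
As a small illustration of why the cast matters, the sketch below uses a hypothetical stand-in generator (make_graphs, not a GraKeL function) to show that a generator can be consumed only once, whereas a list can be indexed and reused.

```python
def make_graphs(n):
    """Hypothetical stand-in for a function that yields objects lazily."""
    for i in range(n):
        yield {"id": i}


gen = make_graphs(3)
first_pass = [g["id"] for g in gen]   # consumes the generator
second_pass = [g["id"] for g in gen]  # nothing left to iterate over

# Materializing once gives a list that supports len(), indexing and reuse.
graphs = list(make_graphs(3))
```

This is relevant whenever the same collection of graphs is passed to more than one call, since a second pass over an exhausted generator yields nothing.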

amanuelanteneh commented 2 years ago

@giannisnik Thanks for the explanation!