sknetwork-team / scikit-network


Graph format #445

Closed moissinac closed 3 years ago

moissinac commented 3 years ago

Description

I'm trying to load an RDF graph. I have a file in a TSV-like format with separator ' ' (in fact, an NT file, i.e. N-Triples). Each line follows the model

<http://fr.dbpedia.org/resource/Walter_Scott> <http://dbpedia.org/ontology/wikiPageWikiLink> <http://fr.dbpedia.org/resource/Antiquaire> .

or

<http://fr.dbpedia.org/resource/Walter_Scott> <http://www.w3.org/2000/01/rdf-schema#label> "Walter Scott"@fr .

What I Did

graph = skn.data.load_edge_list(file="ctxgraphPM.nt", directed=True, named=True, delimiter=" ")

But the data part of the documentation doesn't explain what the TSV file is supposed to contain for the load_edge_list method.

Could you point me to the right part of the documentation, or complete it here? Thanks in advance.

QLutz commented 3 years ago

Hello,

The documentation is indeed quite lacking in this regard. I have updated it. It will be included in the next release.

I am not familiar with the RDF format, but it seems it would be difficult to use this function for it. The subject-property-object format of a triple does not fit the node1 node2 [weight] format of an edge list. Also, weights need to be either integer or float values. One way to circumvent this particular issue would be to map each property to a distinct integer value and use those values as weights in the adjacency matrix, although this would call for extra attention when interpreting the results of most algorithms.
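To make the mapping idea concrete, here is a minimal sketch in plain Python. It is not part of scikit-network: the function name triples_to_weighted_edges and the regular expression are hypothetical, and it assumes that only triples whose object is an IRI should become edges (triples with literal objects such as "Walter Scott"@fr are skipped). The resulting (node1, node2, weight) tuples could then be written out as a three-column file matching the edge list format described above.

```python
import re

# Matches an N-Triples line whose subject, property and object are all IRIs.
# Assumption: triples with a literal object are not edges and are ignored.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$')


def triples_to_weighted_edges(lines):
    """Turn N-Triples lines into (node1, node2, weight) tuples.

    Each distinct property IRI is mapped to a distinct integer (1, 2, ...),
    which is used as the edge weight. Hypothetical helper, not a
    scikit-network function.
    """
    property_ids = {}
    edges = []
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match is None:
            continue  # literal object, comment, or blank line
        subject, prop, obj = match.groups()
        # Reuse the integer already assigned to this property, if any.
        weight = property_ids.setdefault(prop, len(property_ids) + 1)
        edges.append((subject, obj, weight))
    return edges, property_ids


sample = [
    '<http://fr.dbpedia.org/resource/Walter_Scott> '
    '<http://dbpedia.org/ontology/wikiPageWikiLink> '
    '<http://fr.dbpedia.org/resource/Antiquaire> .',
    '<http://fr.dbpedia.org/resource/Walter_Scott> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Walter Scott"@fr .',
]
edges, property_ids = triples_to_weighted_edges(sample)
```

Keep in mind that these integers are only labels: an algorithm that reads weights as link strength would treat property 2 as "stronger" than property 1, which is why the results need careful interpretation.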

If you think the RDF format can be of use to others, do not hesitate to make a pull request out of any parser you'd write, though one should note that scikit-network was not designed for knowledge graphs.

QLutz commented 3 years ago

On a closer look, one way to use the RDF format in scikit-network would be to generate one graph for each property value.
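As a rough illustration of the one-graph-per-property idea, the sketch below groups the triples by property in plain Python. The helper name edge_lists_by_property is hypothetical (it is not a scikit-network function), and it assumes that only triples with an IRI object are kept. Each resulting list of (node1, node2) pairs is an unweighted edge list that could be saved to its own file and loaded separately.

```python
import re
from collections import defaultdict

# Matches an N-Triples line whose subject, property and object are all IRIs.
# Assumption: triples with a literal object are dropped.
IRI_TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$')


def edge_lists_by_property(lines):
    """Group N-Triples lines into one edge list per property IRI.

    Returns a dict mapping each property IRI to a list of
    (subject, object) pairs. Hypothetical helper for illustration.
    """
    graphs = defaultdict(list)
    for line in lines:
        match = IRI_TRIPLE_RE.match(line.strip())
        if match is None:
            continue  # literal object, comment, or blank line
        subject, prop, obj = match.groups()
        graphs[prop].append((subject, obj))
    return dict(graphs)


# Toy triples with made-up IRIs, for illustration only.
toy_triples = [
    '<http://example.org/a> <http://example.org/linksTo> <http://example.org/b> .',
    '<http://example.org/b> <http://example.org/linksTo> <http://example.org/c> .',
    '<http://example.org/a> <http://example.org/sameAs> <http://example.org/c> .',
    '<http://example.org/a> <http://example.org/label> "A"@en .',
]
graphs = edge_lists_by_property(toy_triples)
```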

Would this be of any interest to you?

moissinac commented 3 years ago

Thank you for the reply. How should the weight be interpreted? High weight = large distance between nodes? Or high weight = strong link between nodes? Or something else?

tbonald commented 3 years ago

Yes, high weight = strong link between nodes. Thanks for the comment, this needs to be made explicit in the documentation.