netcreateorg / netcreate-2018

Please report bugs, problems, ideas in the project Issues page: https://github.com/netcreateorg/netcreate-2018/issues

Gephi Export/Import Format #190

Open benloh opened 2 years ago

benloh commented 2 years ago

Kalani provided this sample Gephi node + edge export file.

NodeEdgeGephiImport.xlsx

benloh commented 2 years ago

@kalanicraig @jdanish Thanks for sending the files. I have some questions about the file/format (keeping it here in GitHub so we have a record).

I'm trying to reconcile our existing NetCreate data format with the Gephi format and decide what, if anything, we should change, with an eye towards future expansion and flexibility.

Elsewhere you noted:

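Gephi node tables require:

  - ID: numeric only
  - Label: any

Gephi’s edge table requires:

  - Source: numeric ID from node table
  - Target: numeric ID from node table

Gephi prefers a "Type" column in edge import that is "Directed" or "Undirected", but there's a batch setting in the import process itself that supports users in choosing directed/undirected.

Documented here: https://github.com/netcreateorg/netcreate-2018/pull/179#issuecomment-1014874197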

This kind of suggests that there is a base file format that can be augmented by any number of fields. The other example csv files you sent seem to confirm this.
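A minimal sketch of the "base format plus any number of extra fields" idea, assuming a Gephi-style node table whose required columns are ID and Label (as noted in this thread). The helper names `parseSimpleCsv` and `toNode`, the sample data, and the choice to group extra columns under `attributes` are illustrative assumptions, not NetCreate's actual import code.

```javascript
// Required node columns per the Gephi notes in this thread (compared case-insensitively).
const REQUIRED_NODE_COLUMNS = ["id", "label"];

// Hypothetical minimal CSV reader: assumes no quoted fields or embedded commas.
function parseSimpleCsv(text) {
  const [headerLine, ...rows] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",").map((h) => h.trim());
  return rows.map((row) => {
    const cells = row.split(",").map((c) => c.trim());
    return Object.fromEntries(headers.map((h, i) => [h, cells[i] ?? ""]));
  });
}

// Hypothetical converter: required columns become top-level properties,
// every other column is kept as an arbitrary field under `attributes`.
function toNode(record) {
  const node = { attributes: {} };
  for (const [key, value] of Object.entries(record)) {
    const lower = key.toLowerCase();
    if (REQUIRED_NODE_COLUMNS.includes(lower)) {
      node[lower] = value;
    } else {
      node.attributes[key] = value;
    }
  }
  if (!/^\d+$/.test(node.id ?? "")) {
    throw new Error(`Node ID must be numeric, got "${node.id}"`);
  }
  if (node.label === undefined) {
    throw new Error("Node is missing a Label column");
  }
  node.id = Number(node.id);
  return node;
}

// Made-up sample data with one extra column ("Notes").
const sampleCsv = "ID,Label,Notes\n1,Alice,founder\n2,Bob,";
const nodes = parseSimpleCsv(sampleCsv).map(toNode);
console.log(nodes);
```

Keeping the extra columns in one place means the importer does not need to know them in advance; only the required columns are checked.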

I think when we started NetCreate, we were working off of a format exported from that old Google network diagram app (I can't remember the name). We ended up baking some assumptions about the data format into the tool itself based on that data format. But in reviewing the Gephi files and given your "requirements" listed above, I think we might want to revisit those assumptions.

I know in the past we had talked about adding the ability to add arbitrary data fields. And while our basic data format somewhat supports this (we use an attributes designator to group arbitrary fields), it is not currently supported by the UI.

The need to rework the data structures for exporting, importing, and template editing suggests that this might be the ideal time to at least lay the groundwork for supporting arbitrary fields.

Proposed Data Field Types

There are three types of fields:

  1. Required -- All nodes and edges must have these fields.

  2. Built-in Support -- Optional fields that require integration with the NetCreate application.

  3. Arbitrary -- Optional fields with generic NetCreate support.

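1. Required

All nodes must have:

  - ID: number
  - Label: string

All edges must have:

  - ID: number
  - Source: number
  - Target: number

2. Built-in Support

These fields are optional, but their implementation requires API support.

  - NodeType: string -- supports enumeration of types, supports color definition
  - EdgeType: string -- supports enumeration of types, supports color definition (future?)
  - Degree: number -- aka "Weight", requires NetCreate to calculate and store values

3. Arbitrary

These fields are optional and can make use of simple type validation.

  - String
  - Number
  - Date
  - Boolean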

Even if we don't fully implement this, implementing import, export, and template editing requires at least some of these modifications.
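One way to picture the three proposed categories is as a field-definition table that marks each field as required, built-in, or arbitrary. This is only a sketch: the object shape and the names `TYPE_CHECKS`, `defineNodeFields`, and `findProblems` are assumptions made for illustration, and the field names (ID, Label, NodeType, Degree) and the four arbitrary types (String, Number, Date, Boolean) are taken from the proposal in this thread.

```javascript
// Simple type checks for the four arbitrary field types named in the proposal.
const TYPE_CHECKS = {
  string: (v) => typeof v === "string",
  number: (v) => typeof v === "number" && Number.isFinite(v),
  boolean: (v) => typeof v === "boolean",
  date: (v) => v instanceof Date && !Number.isNaN(v.getTime()),
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical: build node field definitions from the fixed fields plus user-defined ones.
function defineNodeFields(arbitrary = {}) {
  const defs = {
    id: { category: "required", type: "number" },
    label: { category: "required", type: "string" },
    nodeType: { category: "builtin", type: "string" },
    degree: { category: "builtin", type: "number" },
  };
  for (const [name, type] of Object.entries(arbitrary)) {
    if (!has(TYPE_CHECKS, type)) throw new Error(`Unknown field type: ${type}`);
    if (has(defs, name)) throw new Error(`"${name}" is a reserved field name`);
    defs[name] = { category: "arbitrary", type };
  }
  return defs;
}

// Hypothetical: list what is wrong with a record; only required fields may not be absent.
function findProblems(defs, record) {
  const problems = [];
  for (const [name, def] of Object.entries(defs)) {
    const value = record[name];
    if (value === undefined) {
      if (def.category === "required") problems.push(`missing required field "${name}"`);
      continue;
    }
    if (!TYPE_CHECKS[def.type](value)) {
      problems.push(`field "${name}" should be a ${def.type}`);
    }
  }
  return problems;
}

// Made-up example: two user-defined fields, one of them filled in with the wrong type.
const defs = defineNodeFields({ birthYear: "number", verified: "boolean" });
console.log(findProblems(defs, { id: 1, label: "Alice", birthYear: "1850" }));
```

In this sketch the built-in fields are only type-checked; the enumeration, color, and degree-calculation support they need would live in application code.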

Any thoughts on this? Did I interpret the data format wrong? Does this feel like an overreach?

kalanicraig commented 2 years ago

I don’t think it’s overreach at all! The required and built-in fields outlined here all track with what I would start with for a new, very basic network, and the arbitrary fields suit what I would need as a humanist in order to tie additional non-network data to the node/edge data.

Google Fusion Tables handled network data a little differently, with a greater focus on labels and alphanumeric than on numeric-ID governed relationships between tables.

Only one issue that I can see cropping up based on these notes: some of the arbitrary data types might take several forms. Not all numbers will be integers; some will be decimals. For instance, a user might want latitude and longitude attributes in separate fields rather than in a user-entered text string (which is how I’ve done lat/long up to now). That said, I’m not sure the average user will know the difference (or want to differentiate) between integer and number, so we should figure out how to deal with display and validation with that in mind. I also imagine the user focus (or lack thereof) on the differences between integer/decimal and date/datetime would be similar and lead to similar validation issues for us. (I do feel strongly about ISO datetime formats for display and storage tho; YYYY-MM-DD for the win).
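A rough sketch of what lenient number entry and strict ISO date entry might look like, assuming values arrive as user-typed text. The function names (`parseDecimal`, `isLatitude`, `isLongitude`, `isIsoDate`) and the sample values are hypothetical; nothing here is existing NetCreate code.

```javascript
// Accepts integers and decimals alike, so the user never has to choose between them.
// Returns a number, or null when the text is not a plain decimal.
function parseDecimal(text) {
  const trimmed = text.trim();
  if (!/^-?\d+(\.\d+)?$/.test(trimmed)) return null;
  return Number(trimmed);
}

// Range checks for latitude and longitude kept in separate numeric fields.
function isLatitude(value) {
  return value !== null && value >= -90 && value <= 90;
}

function isLongitude(value) {
  return value !== null && value >= -180 && value <= 180;
}

// Strict YYYY-MM-DD check that also rejects impossible dates such as 2022-02-30.
function isIsoDate(text) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // setUTCFullYear avoids the two-digit-year remapping done by the Date constructor.
  const date = new Date(0);
  date.setUTCFullYear(year, month - 1, day);
  // Out-of-range days roll over into the next month, so compare the parts back.
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

// Made-up sample input.
console.log(isLatitude(parseDecimal("39.5")), isLongitude(parseDecimal("-86.25")));
console.log(isIsoDate("2022-01-19"), isIsoDate("01/19/2022"));
```

Here `parseDecimal` treats "12" and "12.0" the same way, which matches the concern that most users will not distinguish integer from number.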

On Jan 19, 2022, at 7:37 PM, benloh @.***> wrote:

 @kalanicraig @jdanish Thanks for sending the files. I have some questions about the file/format (keeping it here in GitHub so we have a record).

I'm trying to reconcile our existing NetCreate data format with the Gephi format and decide what, if anything, we should change, with an eye towards future expansion and flexibility.

Elsewhere you noted:

Gephi node tables require:

  - ID: numeric only
  - Label: Any

Gephi’s edge table requires:

  - Source: numeric ID from node table
  - Target: numeric ID from node table

Gephi prefers a “Type” column in edge import that is “Directed” or “Undirected” but there’s a batch setting in the import process itself that supports users in choosing directed/undirected

Documented here: #179 (comment)

This kind of suggests that there is a base file format that can be augmented by any number of fields. The other example csv files you sent seem to confirm this.

I think when we started NetCreate, we were working off of a format exported from that old Google network diagram app (I can't remember the name). We ended up baking some assumptions about the data format into the tool itself based on that data format. But in reviewing the Gephi files and given your "requirements" listed above, I think we might want to revisit those assumptions.

I know in the past we had talked about adding the ability to add arbitrary data fields. And while our basic data format somewhat supports this (we use an attributes designator to group arbitrary fields), it is not currently supported by the UI.

The need to rework the data structures for exporting, importing, and template editing suggests that this might be the ideal time to at least lay the groundwork for supporting arbitrary fields.

Proposed Data Field Types

There are three types of fields:

  1. Required -- All nodes and edges must have these fields.

  2. Built-in Support -- Optional fields that require integration with NetCreate application.

  3. Arbitrary -- Optional fields with generic NetCreate support

1. Required

All nodes must have:

  - ID: number
  - Label: string

All edges must have:

  - ID: number
  - Source: number
  - Target: number

2. Built-in Support

These fields are optional, but their implementation requires API support.

  - NodeType: string -- supports enumeration of types, supports color definition
  - EdgeType: string -- supports enumeration of types, supports color definition (future?)
  - Degree: number -- aka "Weight", requires NetCreate to calculate and store values

3. Arbitrary

These fields are optional and can make use of simple type validation.

  - String
  - Number
  - Date
  - Boolean

Even if we don't fully implement this, implementing import, export, and template editing requires at least some of these modifications.

Any thoughts on this? Did I interpret the data format wrong? Does this feel like an overreach?


benloh commented 2 years ago

We can and probably will differentiate between the data type (e.g. 'number') and validation (e.g. integer vs float vs date). JavaScript lets us treat all of those as a number type, and display and validation can then be handled separately in the Template specification.
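A minimal sketch of that separation, assuming each template entry carries a stored `type` plus a separate `validate` rule. The property names, the `VALIDATORS` and `FORMATTERS` tables, and the idea of storing dates as milliseconds since the Unix epoch are all assumptions for illustration, not the actual Template specification.

```javascript
// Hypothetical validation rules layered on top of the single stored type "number".
const VALIDATORS = {
  integer: (n) => Number.isInteger(n),
  float: (n) => Number.isFinite(n),
  // Assumption: dates are stored as milliseconds since the Unix epoch.
  date: (n) => Number.isFinite(n) && !Number.isNaN(new Date(n).getTime()),
};

// Display is handled separately from both storage and validation.
const FORMATTERS = {
  integer: (n) => String(n),
  float: (n) => String(n),
  date: (n) => new Date(n).toISOString().slice(0, 10), // YYYY-MM-DD
};

// Hypothetical template entries: same stored type, different validation rules.
const templateFields = {
  degree: { type: "number", validate: "integer" },
  latitude: { type: "number", validate: "float" },
  founded: { type: "number", validate: "date" },
};

function isValid(fieldDef, value) {
  return typeof value === fieldDef.type && VALIDATORS[fieldDef.validate](value);
}

function formatValue(fieldDef, value) {
  if (!isValid(fieldDef, value)) {
    throw new Error("value does not match the field definition");
  }
  return FORMATTERS[fieldDef.validate](value);
}

console.log(isValid(templateFields.degree, 3.5)); // false
console.log(formatValue(templateFields.founded, Date.UTC(2022, 0, 19))); // 2022-01-19
```

Because the rule lives in the template rather than in the stored value, switching a field from integer to float would not require touching existing data.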