benloh commented 2 years ago

Branch: dev-bl/import

Overview

This adds an Import Data feature to Net.Create.

This is a pretty significant feature that ended up touching ALL aspects of the app. A very thorough QA cycle will be needed.

To Test

To test, first export nodes and edges from an existing graph. (NOTE please create new exports. Do not use exported csv files created previously from the dev-bl/export branch, there were errors in the previous export routine.)

git fetch && git checkout dev-bl/import
./nc.js --dataset=yourdataset
Select "More..."
Select "Import/Export"
Click "Export Nodes"
Click "Export Edges"

Next, try modifying the labels of the exported nodes, and reimport them.

Open the exported nodes csv.
Change the label to something else.
Select "More..."
Select "Import/Export"
Click "Choose File" for nodes.
The "Import" button should appear if the files are valid.
Click "Import"

Next, create a new graph and try importing the nodes and edges.

Open the exported nodes csv file.
Change all the "ids" to "new"
ctrl-c to quit Net.Create.
./nc.js --dataset=empty
Select "More..."
Select "Import/Export"
Click "Choose File" for nodes.
The "Import" button should appear if the files are valid.
Click "Import"
Your original dataset should be recreated

Other things to test:

Try adding new nodes/edges to an existing graph. Any matching ids will be replaced, any ids makred "new" will be newly created.
Try adding ONLY nodes, or ONLY edges. You can add them in any order one at a time.
Open Net.Create remotely. You should not be able to import data.
Open Net.Create remotely with ?admin=true. You should be able to import data.
Edit your existing template, add and set allowLoggedInUserToImport to true. Open Net.Create remotely without admin priviledges. Import is disabled. Log in. Import is now enabled.

Test error checking:

Try importing nodes with bad ids
Try importing edges with bad ids
Try importing edges that link to bad node ids
Try importing nodes/edges that have a missing header field
Try importing nodes/edges that have missing fields in the rows
Try importing nodes/edges that have linefeed characters in side of text
Try import nodes/edges that have other unexpected characters -- please document any bugs that emerge.

Import Feature

Net.Create can import nodes and edges via separate .csv files.

The easiest way to set up a CSV file for import is to first export a few nodes/edges from your existing project. They key is to set up the Template first with the appropriate headers. Then you can export and modify the csv files, then reimport them.

For both nodes and edges, you should be able to:

Add new nodes/edges
Partially replace existing nodes/edges
Partially replace existing nodes/edges AND add new nodes/edges in the same file

Replacing Existing Nodes and Edges

During an import, Net.Create uses node and edge ids to match imported data to existing data.

If the id does not match an existing node or edge id, the app will output an error message listing the problem id and the row of the id. "row" refers to the line number in the csv file. Line 1 is the header. Line 2 would be the first data row.

Adding New Nodes/Edges

To add a new node or edge, you need to use an id of "new". For example, to add "Tacitus" and "Granicus", you would define an import csv file with two rows where the ID is set to "new".

import_node.csv

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
new,"Tacitus",
new,"Granicus"

You can mix "new" and replacements, e.g. this will replace existing node 1 with "Claudius", and add "Tacitus" and "Granicus".

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1, "Claudius",
new,"Tacitus",
new,"Granicus"

The "new" keyword is case-insensitive. e.g. "NEW", "New", "new", and "nEw" all work.

Defining Import Fields

The fields required by the graph for import are defined by the template. Currently any fields that are defined in the template and NOT HIDDEN will be required when you import. E.g. if you define 6 fields in the template, then the import file MUST have 6 fields with headers that match the exportLabel fields defined in the template, or the importer will complain about missing fields.

Any fields that are marked in the template as hidden will not be exported, nor will they be required for import.

While all non-hidden headers are required in the csv file, you can skip fields in the node/edge data rows. For example, when importing a node, you can just specify "id" and "label" and skip the other fields, e.g.:

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1, "Claudius",
new,"Tacitus",,"He's the man!"

It's possible we might want to relax this and just require only the main fields (e.g. id and label with nodes, source target and id with edges). This will give you more flexibility when importing. On the other hand, you can also leave the fields empty, so long as you have the right headers in the csv.

The fields required in the import/export headers are defined via the exportLabel property in templates. The exportLabel will map the headers to built-in Net.Create fields.

You can rename exportLabel fields to match your the fields required/used by your external graph application. For example, if your graph application expects type to be labeled as OPTIONS, you can set the node type exportLabel to OPTIONS. The labels are case sensitive.

Encoding / Special Characters

Since we're using .csv as the export/import data format, there are a few considerations for encoding:

Carriage returns are allowed inside of quotes.
Double Quotes need to be encoded. If you need to use double quotes, use two of them next to each other. Excel should automatically encode quotes as a double quote when exporting to a CSV file. NOTE: An extraneous double quote will probably generate a bad header error during import. Here's an example of a valid and invalid use of quotes:

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated

// valid -- note the correct use of two "" around Egyptians
7,"Alexandria","Place","Alexandria is one of two places where ""Egyptians"" introduced the plague.","","",""

// invalid -- the single quote around Egyptians will cause import to fail, usually with a bad header message
7,"Alexandria","Place","Alexandria is one of two places where "Egyptians" introduced the plague.","","",""

There are probably other exceptions that we'll need to add validation for, especially control characters.

Error Checking

The app provides two levels of validation during import. When you select a file for import, the system will:

Check the file's headers to make sure they match the expected headers as defined in the template. If they do not, an error message is displayed.
Read the data and run simple validation on ids. If it encounters an error, it will display the errors. You can then fix and/or select a new file to import.

Errors Caught:

Headers in import csv do not match headers defined in template. (Hide or remove the field in the template to make it not required for import)
Node or edge uses an invalid id
Edge refers to nonexistent source or target node ids

Errors Not Caught:

If your data values do not match up to the header values, the system will blindly import the values and you might end up with data matched to the wrong header/field.
The row number that an error is reported on can be thrown off if there are carriage returns in quoted text.

Troubleshooting

Oftentimes, bad encoding errors (e.g. mismatched double quotes, extraneous commas, extraneous carriage returns) will result in a Header error. If you see an error about mismatched headers and you know your headers are...

well defined
match the non-hidden headers in the template ...then you might try the following to troubleshoot:

Open the csv in a text editor or Excel to make sure there aren't stray characters.
Open the csv in Excel to make sure each record gets is own row. If not, then you might have some stray carriage returns (usually appearing outside of quoted text).
Try deleting everything but the header and one line of data and see if that imports. If that imports, then the culprit is somewhere within your data encoding.

Import Report

After importing data, the app will display a list of nodes/edges that have been replaced as well as a count of all the nodes/edges that were added or replaced.

Import Permissions

Roles: Admins vs Non-admins

Admins are always allowed to import data.

Non-admin users are normally not allowed to import data. This pull request adds a new option to the Templates to allow logged in users to import data. If the template setting allowLoggedInUserToImport is set to true, then logged in users will be able to import data. By default allowLoggedInUserToImport is false.

Edit States

Since importing modifies the database, during an import, editing the Template and editing individual Nodes and Edges is locked out. Conversely, if someone is editing a Node, Edge, or Template, Importing is locked out. This prevents accidental overwriting of data.

If you navigate away from the Import panel, the import is cancelled. This might be a little surprising and awkward so we might want to revisit this. But there wasn't another clear way to "Cancel" the import lockout.

Standalone Mode

Importing is also disabled in standalone mode, since you are not allowed to modify the database.

Import Backups

Every time you click the "Import" button to import data (nodes or edges), the Net.Create server will make a backup of the current database into the runtime/backups folder before executing the import. The backup file will be named the same as the open database file with a timestamp appended.

If you are running this on a server or using nc-multiplex you'll want to periodically monitor the runtime/backups folder to make sure it does not grow too large. You'll want to periodically clear out the backup files.

If you need to restore a backup, you can copy it to runtime/ and just open it directly via ./nc.js --dataset=xxx call, or rename the database file back to the original name, copy it over the original in runtime/ and open that via ./nc.js.

Admin Tools

"Force Unlock All"

Template editing, importing data, and node and edge editing are all mutually exclusive actions: If someone on the network is doing one of those activities, others are prevented from doing the same. (The one exception is that if you are editing a node or edge, others are only prevented from editing the same node or edge and editing the template or importing, but they can still edit other nodes and edges.) When the edit/import is complete the lock on editing should be released.

Every once in a while, the release message is lost and the edit lock remains in place. If this happens, an administrator can go to the More > Import / Export panel and click the "Force Unlock All" button to release the edit lock and re-enable template editing, importing, and node and edge editing.

WARNING: Use this with utmost caution! If someone is actively editing or importing, you can delete their work, or even worse, corrupt the database!

Other Changes

`revision` update bug

There was a bug where the revision field was either not updated at all, or was being updated only once per session (after a reload), instead of being updated with every database update. It now properly updates every time you edit and save a node or edge.

TOML template file update

The default toml template and schema have been updated. You'll want to review or update your existing template files.

"weight"-ready

While the UI (EdgeEditor) does not currently support it, the app logic now calculates edge line sizes by summing up edge weight values. e.g. if two nodes are connected by three edges with a weight of 1, 2, and 4, the size of the edge will be 6.

weight defaults to 1. It can support values smaller than 1.

weight is not currently settable via the EdgeEditor UI. EdgeTable does not yet display weight either. It is also not saved in the database.

Optimization

We've done a little bit of optimizing the d3 render loop -- node sizes are now calculated before the render and done more efficiently.

Data Model Refinement

This is mostly under the hood stuff, but we have now more formally separated the raw network data from the rendered d3 data. This should make it easier to do future updates.

Force Updates

You might notice graphs look very different. With the refinement of the data model, we updated the way d3 is rendering forces. Hopefully this is an improvement, but we will probably have to do some exploration to make sure all graphs look better.

Node Table ID Sorting

In debug mode, NodeTables show IDs. You can now sort by the ID.

benloh commented 2 years ago

@jdanish @kalanicraig I think import is ready for testing now. I'm still doing more QA, but it should be mostly stable.

Questions

Adding New Nodes Right now, since we do node id checking, you can only add new nodes provided you use a new unique id. We may want to make it easy to add new nodes. e.g. either use a missing id to automatically add a new node with a new id, or use some kind of id designator like new to add it?
Streamlining validation and error checking When you first select a node/edge file for import, it is initially scanned to make sure the headers are all defined. It does not review the row data in the file. The row data is only reviewed after you press "Import". If there are errors during import, these are then displayed, and you are given the option of selecting a new file. Instead of waiting for you to click "Import" to process the import errors, we can just process the errors as soon as you select the file. This requires a little more work on the UI side (maybe another 2 - 4 hours) but it might be a better user experience. We'll punt on this for now unless you think this is a high priority.
Edge weights Even though we don't support it yet in the UI, it should be relatively easy to add a weight property to edges. The system is all ready to do weight calculations (e.g. the width of the edge is based on the sum of all the edges between two nodes). It's probably another 4 - 8 hours of work to enable this. But I think it's probably a lower priority than some of your other requests.

benloh commented 2 years ago

moved

kalanicraig commented 2 years ago

Hi! So:

We should be explicit about what that ID field does, which suggests that a new line in the import should have a NEW flag in the ID field.
This seems to me to be cosmetic. Let's hold it until the end, as long as it's clear which rows are the problems in the error output so we don't have to puzzle over which data are the issue.
This is definitely a lower priority. Edge weight can be done outside of Net.Create if necessary and will be recalculated as soon as someone adds/deletes an edge, so it's more for post-data-entry display in the standalone mode than anything.

I can test some import this morning. We have several networks going this semester that will let us play with data. THANK YOU!!!

On Thu, Mar 10, 2022 at 1:23 AM benloh @.***> wrote:

To Do

Nodes are importing with bad dates

Verify nc-multiplex still works

Verify standalone mode still works

— Reply to this email directly, view it on GitHub https://github.com/netcreateorg/netcreate-2018/pull/217#issuecomment-1063706356, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACKL4NGB4ZSPOJXSFB7E7YTU7GIMXANCNFSM5PY7H5YA . You are receiving this because you were mentioned.Message ID: @.***>

jdanish commented 2 years ago

Just so we are clear on number 1: that means if I export a spreadsheet from say a survey and want to import it, I'll need to go in and add the new flag to each row to get them read in. That's fine, but introduces a step. I am just making sure we agree that is the goal? Long-term I supposed we could add an option to assume if the DB is empty, all rows get a new flag appended (but ignore for now).

Agree, let's punt on it, but can you add an issue with the improved user experience so that we have a record?

Agree on 3.

kalanicraig commented 2 years ago

Correct, the goal is to require that users put a "NEW" flag in each new row. One: it's better data-science training because we can be explicit about what the ID field does and why we want to label something "NEW" vs leave it blank. Two: it's easy enough to document.

On Thu, Mar 10, 2022 at 9:38 AM Joshua Danish @.***> wrote:

Just so we are clear on number 1: that means if I export a spreadsheet from say a survey and want to import it, I'll need to go in and add the new flag to each row to get them read in. That's fine, but introduces a step. I am just making sure we agree that is the goal? Long-term I supposed we could add an option to assume if the DB is empty, all rows get a new flag appended (but ignore for now).

Agree, let's punt on it, but can you add an issue with the improved user experience so that we have a record?

Agree on 3.

— Reply to this email directly, view it on GitHub https://github.com/netcreateorg/netcreate-2018/pull/217#issuecomment-1064132134, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACKL4NCGFU7MYRS3Z7UM5PTU7ICN3ANCNFSM5PY7H5YA . You are receiving this because you were mentioned.Message ID: @.***>

benloh commented 2 years ago

OK thanks for the input. It's easy enough to automatically insert a new record if the ID is blank and to add that as a template option. But we can punt for now.

Just so you know: there are some issues with dates exporting and importing that I'm cleaning up right now.

benloh commented 2 years ago

@kalanicraig @jdanish I just pushed some changes that adds support for the "new" id field. Now when you import a file:

Any node with a matching id will be replaced
Any node with a id of "new" will be newly created
Any node with an unrecognized id will be flagged as an error and the import will be skipped

It's necessary to reject unrecognized ids because it would otherwise screw up the database index system. We could add support for adding arbitrary ids if necessary, but to me that sounds like a recipe for chaos.

benloh commented 2 years ago

One more update: Hidden fields are no longer required for import. Also, encoded double quotes ("") are now properly decoded as a single quote (").

benloh commented 2 years ago

@jdanish Saturday update: Import now locks out Node/Edge/Template editing and vice versa.

jdanish commented 2 years ago

Tried exporting the nodes with the new code, and then immediately re-imported without even editing the file. Also tried after editing one entry. In both cases I get a blank screen and the following error. Any ideas?

benloh commented 2 years ago

If you can get me the line number associated with the in p (created by NodeSelector) or the .loki file you're using that might help.

One issue I haven't mentioned: The template file has changed significantly. So if you're working with old data, you'll probably need to review the process. e.g. if you had previously converted the json template to toml with earlier tests (using the auto-conversion when you open an existing db), you may need to trash the toml file and go through the autoconversion again. I had to keep adding shtuff to the template to make everything work.

jdanish commented 2 years ago

Loki file, template, and the csv that was exported just in case.

Archive.zip VFOI1_nodes.csv

benloh commented 2 years ago

Thanks, there's definitely something odd about the file. The embedded video doesn't help, but there's something else going on too. Still looking into it...

benloh commented 2 years ago

@jdanish Wow, I'm a little surprised that the embedded video actually seems to import. So there were two issues that I've hopefully fixed (having to do with some improper handling of headers and templates). But here's what you might have to do to get it to import:

Delete the VFOI1.template.toml file. Let the system create a new one.
Ctrl-c to quit the app.
npm... to start it again.
Edit the Current Template and make "Info" hidden so that it is not required on import.
Save the Template change
Import the VF0I1_nodes.csv file. It should come in fine now.

Sorry 'bout that. But this is a pretty good test case.

benloh commented 2 years ago

BTW, after playing with exporting and importing, I suspect we'll want to ease up on requirement to use "new" for all node imports, if only so matching edges is more manageable. But you guys should play with it and see what you think.

e.g. I can export VF0I1 nodes and edges, then reimport nodes into a blank doc, but then edges can't be re-imported because all the ids will have changed. That may or may not be a problem depending on your goals.

jdanish commented 2 years ago

OK, seems to be much better. A few notes / questions:

1) I was nervous about re-doing the template, so simply deleted the export. It seems like with the fix in place, re-exporting, and then importing works fine. So that's ideal.

2) The error text on import appears correctly, but looks like this:

3) When I tried hiding the info field, and then re-importing it, it erased the info field in the existing database. Is that the desired outcome @kalanicraig ? Or would we want to leave it there? I can't imagine I'd hide a field and import and not export that, but it surprised me and sent me scrambling for the backup.

4) Just a note that the video isn't a video, but an image linked in using the markdown. This is actually a common use-case for kids in that we simply did a google image search and dropped the link in. It seems to work so may not be an issue, but is there something we should watch for? Assuming the markdown escapes OK I assume we'll be fine there?

5) I had a moment of surprises (thought it was a bug) that when I imported the node it didn't update the "updated" field to the current day / time (because it used the data). I think that's fine, though it makes me wonder whether we want an option where it is blank and it uses the system date / time similar to an edit?

6) Testing 5 helped me notice this that the nodes table appears to be displaying the created date not the last updated under updated. Note: this only happens after import, so I wonder if created is overwriting updated?

7) It seems like subsequent imports do not re-read the file, as if it is cached / stuck. Easiest way to see this: export nodes, go to import nodes, but "accidentally" import it as an edges file. It gives the correct errors. Then click import nodes... errors stay the same. You need to kill and re-start the server to fix it.

8) Re: Ben's prior comment. I agree, we want to be able to easily import nodes and edges. I can't recall - is there a reason not to assume that importing a number that is not new will simply assign the number? (I assume that's the alternative) I imagine the problem is a case where you over-write, but that is reported and thus fine?

jdanish commented 2 years ago

This makes me wonder: maybe we should trigger a backup of the old db on import so that we have a backup just in case?

jdanish commented 2 years ago

Just noticed that the edges table lists source twice now, and the 2nd is empty.

jdanish commented 2 years ago

How do we see / test this? Not sure how to go into "debug mode":

In debug mode, NodeTables show IDs. You can now sort by the ID.

jdanish commented 2 years ago

Hey @benloh, @kalanicraig and I discussed and vote that the "new" requirement be relaxed so that nodes and edges can be imported as suggested. Our suggested modification is that if the highest id in use can be reported on the import / export tab, it'll help folks know what number range is safe to import without overwriting things. What do you think?

benloh commented 2 years ago

Sorry...every time I think I'm almost done, another variant comes up. I'm slowly chasing them down.

benloh commented 2 years ago

@jdanish Some replies:

The error text on import appears correctly, but looks like this:

This is one of the things I fixed. Your original VFOI1.template.toml file had a bad importIsLocked message definition due to my buggy import prior to March 19th. This is why I asked you delete the toml. If you need to keep it, what you need to do is remove this from the end of the original toml file:

Lines 187-190

[importIsLockedMessage]
type = "string"
description = "Warning message to display if user is trying to edit a node, edge, or template import is active."
default = "Data is currently being imported, please try again later."

When I tried hiding the info field, and then re-importing it...

Yeah, i think my current approach is probably bad. We're conflating hidden / hiding a field with determining whether or not it is required for export or import. It shouldn't erase the field from the actual database file, but it does hide it from all visible UI (e.g. node/edge editors and tables), and since during import the field is missing, the imported fields will have empty values for those fields.

I was trying to keep things simple (there are already so many damn parameters for each field), but I think I need to add a new parameter requiredForImportExport flag to determine whether or not the data is exported and required for import, separate from hidden, which is more UI oriented.

And then we probably also need to be careful when importing to make sure that we don't clobber existing values if the field is not required for export.

Just a note that the video isn't a video, but an image linked in using the markdown.

I was going off the "description". This is definitely something to watch out for, since you could end up with a multiple gigabyte image. I'm not sure what effect it will have on the db overall, and I haven't done super careful testing of our import/export parsing to make sure all encoded binary data is kosher. You may very well end up with issues due to quote or carriage return encoding.

I didn't realize that the markdown editor will do the encoding for you -- I had assumed you were using links to external sites.

imported the node it didn't update the "updated" field to the current day / time

This is a function of the loki database. If the imported field has updated data, then we simply use that data. We don't do a separate update to match the date/time of the import.

But loki is also a little weird. So for instance, I believe even if you don't import "Last Updated", loki will not record an update timestamp if it is inserting a record for the first time. So a new record will never have an updated value, nor a revision value. If you import an existing record that has a "Last Updated" value in the import data, loki will just use that imported value. Theoretically if you import an existing record without the "Last Updated", loki ought to add an updated value timestamped with the import. But I haven't tested this carefully to make sure that's the case.

If we separate the 'hidden' from 'requiredForImportExport' then maybe we can handle this situation...keeping in mind that unless we override loki's default 'updated' behavior, you will never get an 'updated' value with the initial record creation.

Testing 5 helped me notice this that the nodes table appears to be displaying the created date not the last updated under updated

This is due to the loki behavior I noted above. 'updated' isn't set with the initial field creation, or if we're importing the "Last Updated" field. That's why NodeTable falls back to use the creation date if revision < 1.

It seems like subsequent imports do not re-read the file...

Actually this should have been one of the many many UI issues I had carefully fixed. If this is still happening, can you please provide me with detailed example files and steps to reproduce? Ideally in a different issue.

Re: Ben's prior comment. I agree, we want to be able to easily import nodes and edges....

Yeah the previous behavior was that I would blindly insert new node/edges with the new ids. And yes, if it replaced an existing node you would get a message. The danger here was that the internal counter that kept track of the next available ID was not being updated. We'll just have to add that.

I think the issue here is as much social engineering/sound data processing practices as anything.

maybe we should trigger a backup of the old db on import so that we have a backup just in case?

Most definitively.

the edges table lists source twice now, and the 2nd is empty.

This is your template. You have citations field using a displayLabel of Source.

In debug mode, NodeTables show IDs. You can now sort by the ID. / How do we see / test this? Not sure how to go into "debug mode":

This is only for developers. In NodeTable.jsx line 28, change it to:

var DBG = true;

highest id in use can be reported on the import / export tab,

Sure that's worth a try.

benloh commented 2 years ago

moved

jdanish commented 2 years ago

Re the template: OK, I was able to fix that easily, thanks (number 2).

Re: number 3, OK if we can fit it in? If not I think it's fine as-is. Alternatively, if we want to go the other direction, we could just re-label it as used / not used instead of hidden. I don't think we are typically using hidden to temporarily hide something, but really just to remove it from a specific network use-case. I'm fine with either. @kalanicraig ?

Re 4: I hadn't realized either - I just searched and copied the url without looking at it. Easy enough to handle socially, though.

Re 5: I think this is fine as-is. Mostly noting it for me / Kalani / history. For now.

Re 6: This doesn't happen until an import, though. So are you saying we can set the created date but updated isn't getting set? I'm a bit confused?

Re 7: OK will try in a minute.

Re: the rest, cool. Thanks.

benloh commented 2 years ago

Re 6: This doesn't happen until an import, though. So are you saying we can set the created date but updated isn't getting set? I'm a bit confused?

I believe in this case what is happening is that the revision field was not defined in your template (probably due to earlier template conversions not accounting for a number of fields, including revision). Because revision data is missing, the NodeTable finds no revisions and falls back to using the creation field, even though LastUpdated is available.

So you are actually updating both the creation and last update field, but the table is falling back on creation because revision is missing. It's a little confusing because of all the interconnected pieces. Here's what you have to do:

Edit the exported nodes file: Add a Revision field and add 1 to each record.
Edit the Current Template a. Click on "Object Properties" next to "nodeDefs" b. Check the "revision" box to add it to the template c. Save the template
Re-import the nodes file.

This is an example of the type of semi-intelligent workarounds I've already had to build in all over the place as we did imports. I'm not gonna be able to do a workaround this round, but this is the type of thing we're going to want to improve as we play with the import workflow. I expect there are a lot more issues like this.

jdanish commented 2 years ago

Seems like creating new templates with the current code are key to my errors. Will try that and check back in a bit…

(from my iPhone)

Joshua Danish http://www.joshuadanish.com

On Mar 21, 2022, at 5:26 PM, benloh @.***> wrote:

Re 6: This doesn't happen until an import, though. So are you saying we can set the created date but updated isn't getting set? I'm a bit confused?

I believe in this case what is happening is that the revision field was not defined in your template (probably due to earlier template conversions not accounting for a number of fields, including revision). Because revision data is missing, the NodeTable finds no revisions and falls back to using the creation field, even though LastUpdated is available.

So you are actually updating both the creation and last update field, but the table is falling back on creation because revision is missing. It's a little confusing because of all the interconnected pieces. Here's what you have to do:

Edit the exported nodes file: Add a Revision field and add 1 to each record. Edit the Current Template a. Click on "Object Properties" next to "nodeDefs" b. Check the "revision" box to add it to the template c. Save the template Re-import the nodes file. This is an example of the type of semi-intelligent workarounds I've already had to build in all over the place as we did imports. I'm not gonna be able to do a workaround this round, but this is the type of thing we're going to want to improve as we play with the import workflow. I expect there are a lot more issues like this.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

benloh commented 2 years ago

@jdanish @kalanicraig I've completely rewritten the import UI and validation. Now as soon as you select a file, we do both header checking and id validation and report the results immediately. This has a number of implications:

if you are trying to import edges that reference novel node ids in a node import file, you will first have to select the node file to get it to load and validate, then the edges file. If you load the edges file first, the system will not find the novel node ids.
If you define new node ids using the "new" keyword, you will not be able to define edges that link them -- since you have no ids to define the links. I suppose in a future version we can allow linking by source/target labels, but that will add another layer of complexity to an already very complex import system.

Please give it a whirl! Hopefully it's an improvement.

jdanish commented 2 years ago

Initial reaction is "sweet!!"

Will bang on it, but looks good to me so far!!

benloh commented 2 years ago

EdgeTable sorting was broken due to NCDATA changes. Sorting on all fields should now work again.

benloh commented 2 years ago

To Do

[x] Nodes are importing with bad dates
[x] Reconcile _default.template.toml with template-schema.js -- auto-generate a template-schema.template.toml file on build? => netcreateorg/netcreate-2018#227
[x] Add help description to hidden checkbox when template editing
[x] Do not export parameters that are hidden
[x] Require explicit new ID to generate a new node or edge
[x] updated and revision are not quite working as expected for both node and edges -- disconnect between FILTEREDD3DATA and NCDATA?
[x] Allow requiring created, updated, and revision data for import and export.
[x] ~Add requiredForImportExport flag~ Moved to netcreateorg/netcreate-itest#32
[x] Backup db before import
[x] Allow importing new ids
[x] Update highest ID after import.
[x] Report highest ID on import/export tab.
[x] Node hover info is missing date info
[x] Edge table source/target shows IDs, not labels!
[x] Doc: Monitor backup/runtime folder! Esp when running on a server/nc-multiplex. Each import triggers a backup.
[x] EdgeFilter by source/target is broken because it's using id not label
[x] Verify nc-multiplex still works
[x] Verify standalone mode still works
[ ] Update wiki when merging

benloh commented 2 years ago

Edge filtering functionality is now restored. NCDATA changes resulted in filters operating on ids rather than labels.

benloh commented 2 years ago

QA

standalone mode works
nc-multiplex works
requiredForImportExport will be implemented next round -- implementing it properly gets very complex as we have to deal with pre-existing fields, field removal vs hiding, and arbitrary field import.

jdanish commented 2 years ago

Awesome. Just to be really clear, the comment about "requiredForImportExport" means that in the current version (until we get more funding), if you hide a field via template, it will not export or import. So, anything you want in the graph should be listed as visible for that process.

I think that's fine and mostly users treat them as identical for now, just want to make sure we know. Thanks!

benloh commented 2 years ago

Yeah. I had started to implement it last night, but as I sketched things out, I realized it was WAY to complicated if we want to properly handle all the cases. See netcreateorg/netcreate-itest#32

Think of "hidden" as a way to temporarily turn off fields that you might want to restore later. If you don't need a field at all, you can just not include them in the template.

benloh commented 2 years ago

Kalani wrote: I’ve now tested import/export, filtering, standalone and template changes on 4-5 different networks, including a Japanese-language network and a network with markup in the notes, and not had problems with any of them. I’d guess there are still some bugs floating around, but it’s probably time to merge into dev and also maybe to release nc-multiplex.

netcreateorg / netcreate-2018

Feature: Import Data #217

Overview

To Test

Import Feature

Replacing Existing Nodes and Edges

Adding New Nodes/Edges

Defining Import Fields

Encoding / Special Characters

Error Checking

Import Report

Import Permissions

Roles: Admins vs Non-admins

Edit States

Standalone Mode

Import Backups

Admin Tools

"Force Unlock All"

Other Changes

`revision` update bug

TOML template file update

"weight"-ready

Optimization

Data Model Refinement

Force Updates

Node Table ID Sorting

Questions

To Do

netcreateorg / netcreate-2018

Feature: Import Data #217

Overview

To Test

Import Feature

Replacing Existing Nodes and Edges

Adding New Nodes/Edges

Defining Import Fields

Encoding / Special Characters

Error Checking

Import Report

Import Permissions

Roles: Admins vs Non-admins

Edit States

Standalone Mode

Import Backups

Admin Tools

"Force Unlock All"

Other Changes

revision update bug

TOML template file update

"weight"-ready

Optimization

Data Model Refinement

Force Updates

Node Table ID Sorting

Questions

To Do

`revision` update bug