Closed benloh closed 2 years ago
@jdanish @kalanicraig I think import is ready for testing now. I'm still doing more QA, but it should be mostly stable.
Adding New Nodes
Right now, since we do node id checking, you can only add new nodes provided you use a new unique id.
We may want to make it easy to add new nodes. e.g. either use a missing id to automatically add a new node with a new id, or use some kind of id designator like new
to add it?
Streamlining validation and error checking When you first select a node/edge file for import, it is initially scanned to make sure the headers are all defined. It does not review the row data in the file. The row data is only reviewed after you press "Import". If there are errors during import, these are then displayed, and you are given the option of selecting a new file. Instead of waiting for you to click "Import" to process the import errors, we can just process the errors as soon as you select the file. This requires a little more work on the UI side (maybe another 2 - 4 hours) but it might be a better user experience. We'll punt on this for now unless you think this is a high priority.
Edge weights
Even though we don't support it yet in the UI, it should be relatively easy to add a weight
property to edges. The system is all ready to do weight calculations (e.g. the width of the edge is based on the sum of all the edges between two nodes). It's probably another 4 - 8 hours of work to enable this. But I think it's probably a lower priority than some of your other requests.
moved
Hi! So:
I can test some import this morning. We have several networks going this semester that will let us play with data. THANK YOU!!!
On Thu, Mar 10, 2022 at 1:23 AM benloh @.***> wrote:
To Do
- Nodes are importing with bad dates
- Verify nc-multiplex still works
- Verify standalone mode still works
— Reply to this email directly, view it on GitHub https://github.com/netcreateorg/netcreate-2018/pull/217#issuecomment-1063706356, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACKL4NGB4ZSPOJXSFB7E7YTU7GIMXANCNFSM5PY7H5YA . You are receiving this because you were mentioned.Message ID: @.***>
Just so we are clear on number 1: that means if I export a spreadsheet from say a survey and want to import it, I'll need to go in and add the new flag to each row to get them read in. That's fine, but introduces a step. I am just making sure we agree that is the goal? Long-term I supposed we could add an option to assume if the DB is empty, all rows get a new flag appended (but ignore for now).
Agree, let's punt on it, but can you add an issue with the improved user experience so that we have a record?
Agree on 3.
Correct, the goal is to require that users put a "NEW" flag in each new row. One: it's better data-science training because we can be explicit about what the ID field does and why we want to label something "NEW" vs leave it blank. Two: it's easy enough to document.
On Thu, Mar 10, 2022 at 9:38 AM Joshua Danish @.***> wrote:
Just so we are clear on number 1: that means if I export a spreadsheet from say a survey and want to import it, I'll need to go in and add the new flag to each row to get them read in. That's fine, but introduces a step. I am just making sure we agree that is the goal? Long-term I supposed we could add an option to assume if the DB is empty, all rows get a new flag appended (but ignore for now).
Agree, let's punt on it, but can you add an issue with the improved user experience so that we have a record?
Agree on 3.
— Reply to this email directly, view it on GitHub https://github.com/netcreateorg/netcreate-2018/pull/217#issuecomment-1064132134, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACKL4NCGFU7MYRS3Z7UM5PTU7ICN3ANCNFSM5PY7H5YA . You are receiving this because you were mentioned.Message ID: @.***>
OK thanks for the input. It's easy enough to automatically insert a new record if the ID is blank and to add that as a template option. But we can punt for now.
Just so you know: there are some issues with dates exporting and importing that I'm cleaning up right now.
@kalanicraig @jdanish I just pushed some changes that adds support for the "new" id field. Now when you import a file:
It's necessary to reject unrecognized ids because it would otherwise screw up the database index system. We could add support for adding arbitrary ids if necessary, but to me that sounds like a recipe for chaos.
One more update: Hidden fields are no longer required for import. Also, encoded double quotes ("") are now properly decoded as a single quote (").
@jdanish Saturday update: Import now locks out Node/Edge/Template editing and vice versa.
Tried exporting the nodes with the new code, and then immediately re-imported without even editing the file. Also tried after editing one entry. In both cases I get a blank screen and the following error. Any ideas?
If you can get me the line number associated with the in p (created by NodeSelector)
or the .loki file you're using that might help.
One issue I haven't mentioned: The template file has changed significantly. So if you're working with old data, you'll probably need to review the process. e.g. if you had previously converted the json template to toml with earlier tests (using the auto-conversion when you open an existing db), you may need to trash the toml file and go through the autoconversion again. I had to keep adding shtuff to the template to make everything work.
Loki file, template, and the csv that was exported just in case.
Thanks, there's definitely something odd about the file. The embedded video doesn't help, but there's something else going on too. Still looking into it...
@jdanish Wow, I'm a little surprised that the embedded video actually seems to import. So there were two issues that I've hopefully fixed (having to do with some improper handling of headers and templates). But here's what you might have to do to get it to import:
VFOI1.template.toml
file. Let the system create a new one.npm...
to start it again.VF0I1_nodes.csv
file. It should come in fine now.Sorry 'bout that. But this is a pretty good test case.
BTW, after playing with exporting and importing, I suspect we'll want to ease up on requirement to use "new" for all node imports, if only so matching edges is more manageable. But you guys should play with it and see what you think.
e.g. I can export VF0I1 nodes and edges, then reimport nodes into a blank doc, but then edges can't be re-imported because all the ids will have changed. That may or may not be a problem depending on your goals.
OK, seems to be much better. A few notes / questions:
1) I was nervous about re-doing the template, so simply deleted the export. It seems like with the fix in place, re-exporting, and then importing works fine. So that's ideal.
2) The error text on import appears correctly, but looks like this:
3) When I tried hiding the info field, and then re-importing it, it erased the info field in the existing database. Is that the desired outcome @kalanicraig ? Or would we want to leave it there? I can't imagine I'd hide a field and import and not export that, but it surprised me and sent me scrambling for the backup.
4) Just a note that the video isn't a video, but an image linked in using the markdown. This is actually a common use-case for kids in that we simply did a google image search and dropped the link in. It seems to work so may not be an issue, but is there something we should watch for? Assuming the markdown escapes OK I assume we'll be fine there?
5) I had a moment of surprises (thought it was a bug) that when I imported the node it didn't update the "updated" field to the current day / time (because it used the data). I think that's fine, though it makes me wonder whether we want an option where it is blank and it uses the system date / time similar to an edit?
6) Testing 5 helped me notice this that the nodes table appears to be displaying the created date not the last updated under updated. Note: this only happens after import, so I wonder if created is overwriting updated?
7) It seems like subsequent imports do not re-read the file, as if it is cached / stuck. Easiest way to see this: export nodes, go to import nodes, but "accidentally" import it as an edges file. It gives the correct errors. Then click import nodes... errors stay the same. You need to kill and re-start the server to fix it.
8) Re: Ben's prior comment. I agree, we want to be able to easily import nodes and edges. I can't recall - is there a reason not to assume that importing a number that is not new will simply assign the number? (I assume that's the alternative) I imagine the problem is a case where you over-write, but that is reported and thus fine?
This makes me wonder: maybe we should trigger a backup of the old db on import so that we have a backup just in case?
Just noticed that the edges table lists source twice now, and the 2nd is empty.
How do we see / test this? Not sure how to go into "debug mode":
In debug mode, NodeTables show IDs. You can now sort by the ID.
Hey @benloh, @kalanicraig and I discussed and vote that the "new" requirement be relaxed so that nodes and edges can be imported as suggested. Our suggested modification is that if the highest id in use can be reported on the import / export tab, it'll help folks know what number range is safe to import without overwriting things. What do you think?
Sorry...every time I think I'm almost done, another variant comes up. I'm slowly chasing them down.
@jdanish Some replies:
- The error text on import appears correctly, but looks like this:
This is one of the things I fixed. Your original VFOI1.template.toml
file had a bad importIsLocked
message definition due to my buggy import prior to March 19th. This is why I asked you delete the toml. If you need to keep it, what you need to do is remove this from the end of the original toml file:
Lines 187-190
[importIsLockedMessage]
type = "string"
description = "Warning message to display if user is trying to edit a node, edge, or template import is active."
default = "Data is currently being imported, please try again later."
- When I tried hiding the info field, and then re-importing it...
Yeah, i think my current approach is probably bad. We're conflating hidden
/ hiding a field with determining whether or not it is required for export or import. It shouldn't erase the field from the actual database file, but it does hide it from all visible UI (e.g. node/edge editors and tables), and since during import the field is missing, the imported fields will have empty values for those fields.
I was trying to keep things simple (there are already so many damn parameters for each field), but I think I need to add a new parameter requiredForImportExport
flag to determine whether or not the data is exported and required for import, separate from hidden
, which is more UI oriented.
And then we probably also need to be careful when importing to make sure that we don't clobber existing values if the field is not required for export.
- Just a note that the video isn't a video, but an image linked in using the markdown.
I was going off the "description". This is definitely something to watch out for, since you could end up with a multiple gigabyte image. I'm not sure what effect it will have on the db overall, and I haven't done super careful testing of our import/export parsing to make sure all encoded binary data is kosher. You may very well end up with issues due to quote or carriage return encoding.
I didn't realize that the markdown editor will do the encoding for you -- I had assumed you were using links to external sites.
- imported the node it didn't update the "updated" field to the current day / time
This is a function of the loki database. If the imported field has updated
data, then we simply use that data. We don't do a separate update to match the date/time of the import.
But loki is also a little weird. So for instance, I believe even if you don't import "Last Updated", loki will not record an update timestamp if it is inserting a record for the first time. So a new record will never have an updated value, nor a revision value. If you import an existing record that has a "Last Updated" value in the import data, loki will just use that imported value. Theoretically if you import an existing record without the "Last Updated", loki ought to add an updated value timestamped with the import. But I haven't tested this carefully to make sure that's the case.
If we separate the 'hidden' from 'requiredForImportExport' then maybe we can handle this situation...keeping in mind that unless we override loki's default 'updated' behavior, you will never get an 'updated' value with the initial record creation.
- Testing 5 helped me notice this that the nodes table appears to be displaying the created date not the last updated under updated
This is due to the loki behavior I noted above. 'updated' isn't set with the initial field creation, or if we're importing the "Last Updated" field. That's why NodeTable falls back to use the creation date if revision < 1.
- It seems like subsequent imports do not re-read the file...
Actually this should have been one of the many many UI issues I had carefully fixed. If this is still happening, can you please provide me with detailed example files and steps to reproduce? Ideally in a different issue.
- Re: Ben's prior comment. I agree, we want to be able to easily import nodes and edges....
Yeah the previous behavior was that I would blindly insert new node/edges with the new ids. And yes, if it replaced an existing node you would get a message. The danger here was that the internal counter that kept track of the next available ID was not being updated. We'll just have to add that.
I think the issue here is as much social engineering/sound data processing practices as anything.
maybe we should trigger a backup of the old db on import so that we have a backup just in case?
Most definitively.
the edges table lists source twice now, and the 2nd is empty.
This is your template. You have citations
field using a displayLabel of Source
.
In debug mode, NodeTables show IDs. You can now sort by the ID. / How do we see / test this? Not sure how to go into "debug mode":
This is only for developers. In NodeTable.jsx
line 28, change it to:
var DBG = true;
highest id in use can be reported on the import / export tab,
Sure that's worth a try.
moved
Re the template: OK, I was able to fix that easily, thanks (number 2).
Re: number 3, OK if we can fit it in? If not I think it's fine as-is. Alternatively, if we want to go the other direction, we could just re-label it as used / not used instead of hidden. I don't think we are typically using hidden to temporarily hide something, but really just to remove it from a specific network use-case. I'm fine with either. @kalanicraig ?
Re 4: I hadn't realized either - I just searched and copied the url without looking at it. Easy enough to handle socially, though.
Re 5: I think this is fine as-is. Mostly noting it for me / Kalani / history. For now.
Re 6: This doesn't happen until an import, though. So are you saying we can set the created date but updated isn't getting set? I'm a bit confused?
Re 7: OK will try in a minute.
Re: the rest, cool. Thanks.
Re 6: This doesn't happen until an import, though. So are you saying we can set the created date but updated isn't getting set? I'm a bit confused?
I believe in this case what is happening is that the revision
field was not defined in your template (probably due to earlier template conversions not accounting for a number of fields, including revision
). Because revision data is missing, the NodeTable finds no revisions and falls back to using the creation field, even though LastUpdated is available.
So you are actually updating both the creation and last update field, but the table is falling back on creation because revision is missing. It's a little confusing because of all the interconnected pieces. Here's what you have to do:
Revision
field and add 1
to each record.This is an example of the type of semi-intelligent workarounds I've already had to build in all over the place as we did imports. I'm not gonna be able to do a workaround this round, but this is the type of thing we're going to want to improve as we play with the import workflow. I expect there are a lot more issues like this.
Seems like creating new templates with the current code are key to my errors. Will try that and check back in a bit…
(from my iPhone)
Joshua Danish http://www.joshuadanish.com
On Mar 21, 2022, at 5:26 PM, benloh @.***> wrote:
Re 6: This doesn't happen until an import, though. So are you saying we can set the created date but updated isn't getting set? I'm a bit confused?
I believe in this case what is happening is that the revision field was not defined in your template (probably due to earlier template conversions not accounting for a number of fields, including revision). Because revision data is missing, the NodeTable finds no revisions and falls back to using the creation field, even though LastUpdated is available.
So you are actually updating both the creation and last update field, but the table is falling back on creation because revision is missing. It's a little confusing because of all the interconnected pieces. Here's what you have to do:
Edit the exported nodes file: Add a Revision field and add 1 to each record. Edit the Current Template a. Click on "Object Properties" next to "nodeDefs" b. Check the "revision" box to add it to the template c. Save the template Re-import the nodes file. This is an example of the type of semi-intelligent workarounds I've already had to build in all over the place as we did imports. I'm not gonna be able to do a workaround this round, but this is the type of thing we're going to want to improve as we play with the import workflow. I expect there are a lot more issues like this.
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.
@jdanish @kalanicraig I've completely rewritten the import UI and validation. Now as soon as you select a file, we do both header checking and id validation and report the results immediately. This has a number of implications:
Please give it a whirl! Hopefully it's an improvement.
Initial reaction is "sweet!!"
Will bang on it, but looks good to me so far!!
EdgeTable sorting was broken due to NCDATA changes. Sorting on all fields should now work again.
_default.template.toml
with template-schema.js
-- auto-generate a template-schema.template.toml
file on build? => netcreateorg/netcreate-2018#227 hidden
checkbox when template editingnew
ID to generate a new node or edgeupdated
and revision
are not quite working as expected for both node and edges -- disconnect between FILTEREDD3DATA and NCDATA?created
, updated
, and revision
data for import and export.requiredForImportExport
flag~ Moved to netcreateorg/netcreate-itest#32Edge filtering functionality is now restored. NCDATA changes resulted in filters operating on ids rather than labels.
QA
requiredForImportExport
will be implemented next round -- implementing it properly gets very complex as we have to deal with pre-existing fields, field removal vs hiding, and arbitrary field import.Awesome. Just to be really clear, the comment about "requiredForImportExport" means that in the current version (until we get more funding), if you hide a field via template, it will not export or import. So, anything you want in the graph should be listed as visible for that process.
I think that's fine and mostly users treat them as identical for now, just want to make sure we know. Thanks!
Yeah. I had started to implement it last night, but as I sketched things out, I realized it was WAY to complicated if we want to properly handle all the cases. See netcreateorg/netcreate-itest#32
Think of "hidden" as a way to temporarily turn off fields that you might want to restore later. If you don't need a field at all, you can just not include them in the template.
Kalani wrote: I’ve now tested import/export, filtering, standalone and template changes on 4-5 different networks, including a Japanese-language network and a network with markup in the notes, and not had problems with any of them. I’d guess there are still some bugs floating around, but it’s probably time to merge into dev and also maybe to release nc-multiplex.
Branch:
dev-bl/import
Overview
This adds an Import Data feature to Net.Create.
This is a pretty significant feature that ended up touching ALL aspects of the app. A very thorough QA cycle will be needed.
To Test
To test, first export nodes and edges from an existing graph. (NOTE please create new exports. Do not use exported csv files created previously from the
dev-bl/export
branch, there were errors in the previous export routine.)git fetch && git checkout dev-bl/import
./nc.js --dataset=yourdataset
Next, try modifying the labels of the exported nodes, and reimport them.
Next, create a new graph and try importing the nodes and edges.
"new"
ctrl-c
to quit Net.Create../nc.js --dataset=empty
Other things to test:
?admin=true
. You should be able to import data.allowLoggedInUserToImport
to true. Open Net.Create remotely without admin priviledges. Import is disabled. Log in. Import is now enabled.Test error checking:
Import Feature
Net.Create can import nodes and edges via separate
.csv
files.The easiest way to set up a CSV file for import is to first export a few nodes/edges from your existing project. They key is to set up the Template first with the appropriate headers. Then you can export and modify the csv files, then reimport them.
For both nodes and edges, you should be able to:
Replacing Existing Nodes and Edges
During an import, Net.Create uses node and edge ids to match imported data to existing data.
If the id does not match an existing node or edge id, the app will output an error message listing the problem id and the row of the id. "row" refers to the line number in the csv file. Line 1 is the header. Line 2 would be the first data row.
Adding New Nodes/Edges
To add a new node or edge, you need to use an id of "new". For example, to add "Tacitus" and "Granicus", you would define an import csv file with two rows where the ID is set to "new".
import_node.csv
You can mix "new" and replacements, e.g. this will replace existing node 1 with "Claudius", and add "Tacitus" and "Granicus".
The "new" keyword is case-insensitive. e.g. "NEW", "New", "new", and "nEw" all work.
Defining Import Fields
The fields required by the graph for import are defined by the template. Currently any fields that are defined in the template and NOT HIDDEN will be required when you import. E.g. if you define 6 fields in the template, then the import file MUST have 6 fields with headers that match the
exportLabel
fields defined in the template, or the importer will complain about missing fields.Any fields that are marked in the template as hidden will not be exported, nor will they be required for import.
While all non-hidden headers are required in the csv file, you can skip fields in the node/edge data rows. For example, when importing a node, you can just specify "id" and "label" and skip the other fields, e.g.:
It's possible we might want to relax this and just require only the main fields (e.g. id and label with nodes, source target and id with edges). This will give you more flexibility when importing. On the other hand, you can also leave the fields empty, so long as you have the right headers in the csv.
The fields required in the import/export headers are defined via the
exportLabel
property in templates. TheexportLabel
will map the headers to built-in Net.Create fields.You can rename
exportLabel
fields to match your the fields required/used by your external graph application. For example, if your graph application expectstype
to be labeled asOPTIONS
, you can set the nodetype
exportLabel toOPTIONS
. The labels are case sensitive.Encoding / Special Characters
Since we're using
.csv
as the export/import data format, there are a few considerations for encoding:There are probably other exceptions that we'll need to add validation for, especially control characters.
Error Checking
The app provides two levels of validation during import. When you select a file for import, the system will:
Errors Caught:
id
Errors Not Caught:
row
number that an error is reported on can be thrown off if there are carriage returns in quoted text.Troubleshooting
Oftentimes, bad encoding errors (e.g. mismatched double quotes, extraneous commas, extraneous carriage returns) will result in a Header error. If you see an error about mismatched headers and you know your headers are...
Import Report
After importing data, the app will display a list of nodes/edges that have been replaced as well as a count of all the nodes/edges that were added or replaced.
Import Permissions
Roles: Admins vs Non-admins
Admins are always allowed to import data.
Non-admin users are normally not allowed to import data. This pull request adds a new option to the Templates to allow logged in users to import data. If the template setting
allowLoggedInUserToImport
is set totrue
, then logged in users will be able to import data. By defaultallowLoggedInUserToImport
isfalse
.Edit States
Since importing modifies the database, during an import, editing the Template and editing individual Nodes and Edges is locked out. Conversely, if someone is editing a Node, Edge, or Template, Importing is locked out. This prevents accidental overwriting of data.
If you navigate away from the Import panel, the import is cancelled. This might be a little surprising and awkward so we might want to revisit this. But there wasn't another clear way to "Cancel" the import lockout.
Standalone Mode
Importing is also disabled in standalone mode, since you are not allowed to modify the database.
Import Backups
Every time you click the "Import" button to import data (nodes or edges), the Net.Create server will make a backup of the current database into the
runtime/backups
folder before executing the import. The backup file will be named the same as the open database file with a timestamp appended.If you are running this on a server or using
nc-multiplex
you'll want to periodically monitor theruntime/backups
folder to make sure it does not grow too large. You'll want to periodically clear out the backup files.If you need to restore a backup, you can copy it to
runtime/
and just open it directly via./nc.js --dataset=xxx
call, or rename the database file back to the original name, copy it over the original inruntime/
and open that via./nc.js
.Admin Tools
"Force Unlock All"
Template editing, importing data, and node and edge editing are all mutually exclusive actions: If someone on the network is doing one of those activities, others are prevented from doing the same. (The one exception is that if you are editing a node or edge, others are only prevented from editing the same node or edge and editing the template or importing, but they can still edit other nodes and edges.) When the edit/import is complete the lock on editing should be released.
Every once in a while, the release message is lost and the edit lock remains in place. If this happens, an administrator can go to the More > Import / Export panel and click the "Force Unlock All" button to release the edit lock and re-enable template editing, importing, and node and edge editing.
WARNING: Use this with utmost caution! If someone is actively editing or importing, you can delete their work, or even worse, corrupt the database!
Other Changes
revision
update bugThere was a bug where the
revision
field was either not updated at all, or was being updated only once per session (after a reload), instead of being updated with every database update. It now properly updates every time you edit and save a node or edge.TOML template file update
The default toml template and schema have been updated. You'll want to review or update your existing template files.
"weight"-ready
While the UI (EdgeEditor) does not currently support it, the app logic now calculates edge line sizes by summing up edge
weight
values. e.g. if two nodes are connected by three edges with a weight of 1, 2, and 4, the size of the edge will be 6.weight
defaults to 1. It can support values smaller than 1.weight
is not currently settable via the EdgeEditor UI. EdgeTable does not yet displayweight
either. It is also not saved in the database.Optimization
We've done a little bit of optimizing the d3 render loop -- node sizes are now calculated before the render and done more efficiently.
Data Model Refinement
This is mostly under the hood stuff, but we have now more formally separated the raw network data from the rendered d3 data. This should make it easier to do future updates.
Force Updates
You might notice graphs look very different. With the refinement of the data model, we updated the way d3 is rendering forces. Hopefully this is an improvement, but we will probably have to do some exploration to make sure all graphs look better.
Node Table ID Sorting
In debug mode, NodeTables show IDs. You can now sort by the ID.