Feature Request: Database Import and Export from Electron App

theRAPTLab / meme

DEPRECATED as of 2024-05-21. Use `theRAPTLab/meme-2023` instead. MEME development framework, using Electron, Webpack, and Express to create an "appliance-style" app server to end users on a LAN. Migrated from GitLab June 2023.

MIT License

0 stars 2 forks source link

Feature Request: Database Import and Export from Electron App #23

Closed benloh closed 1 year ago

benloh commented 4 years ago

The Rutgers team needs to be able to load arbitrary database files from a running Electron app.

There are a few primary use models:

They need to be able to run an Electron app that can load the interim database files from the periodic saves they had made during their study.
They need load older database files for analysis. This potentially includes printing the data, which will be new functionality added to the Electron app.
The other implication for research is that the data is loaded read-only. You really don't want to modify the data.
For future studies, they need to be able to export the database files from the running Electron app. The current functionality with daily backups whenever the app runs helps, but researcher probably also need to be able to request a backup at any point in time.

What makes this complicated?

This needs to be run from the Electron app, not directly from node/npm. So accessing the file system is potentially an issue.
The periodic "saves" are actually zips of the Electron app as a whole. So someone either needs to manually extract the database files from these archives (the likely solution), or we'd need to write a script that can open an Electron app and copy the database files out of them.
The version of the app that Rutgers ran during their study was not the 1.0 final release, but probably the 1.0 beta release, built some time around 11/7/2019. We should review any issues that might arise from trying to load data from a pre-1.0 version.
There is definitely a data incompatibility between version 1.0 and version 1.1. See !83 and !81. So we will likely need to figure out how to handle loading old data and gracefully dealing with errors.

benloh commented 4 years ago

In GitLab by @daveseah on Apr 20, 2020, 11:21

FOLLOWUP:

FOR EXTRACTION:

Note to self: Rutgers bases their backups around copying the ENTIRE APP, so the ideal solution treats these apps as the archive files that have data to be extracted using ANOTHER app (presumably, another Electron app)
Or, Ben suggests implementing drag and drop on the current app to take advantage of the existing code / simplicity gained from reducing the need for multiple apps. THEORY: The browser/Chromium should be able to emit anything generated by the Electron app for download like any website. We did some quick tests in console-main.js and console.js to see if we could insert links, but couldn't figure out the location of assets right away.

FOR IMPORT:

Drag a DATABASE file (loki? json? something?) onto the electron app, and it copies it to to the internal runtime area in a "read only" mode and initializes the app read-only, disabling saves to the active database.

benloh commented 4 years ago

In GitLab by @daveseah on Apr 30, 2020, 11:06

Just to be sure I understand the INTENT and MOTIVATION:

This is for use during review of past data as part of their research analysis. They want to load past dataset and browse them. The intent is NOT for actual trials?
The application must be in a "Read only" or Review Mode as to not disrupt the database. An additional implication is that the current dataset is untouched by this mode; researchers may want to review past data between live sessions. Currently there are backups of the current dataset.
A slightly different capability is to export the current dataset as a snapshot that can be re-imported later through this mechanism. Possibly as both a restore and review mode.

Additionally:

This is an opportunity to add a versioning scheme to the database format and the Electron app that is displayed on the MEME screen.
This might be a good chance to look more into code-signing for Catalina issues too.

APPROACH

research how drag-and-drop works on Electron apps (v3 docs)
research how Electron apps access the file system (may require code-signing updates) (use app methods, fs, and dialog)
research a means of versioning the MEME system such that (1) the Electron app or the Node server can access it and (2) the database format has its own separate version number that is manually incremented.

Thoughts on Versioning

There's a package called semantic-versioning which is probably overkill for us. Other packages will read package.json and insert the version from there, but there is a danger of including your entire package.json file in your release (with obvious security implications).

The approach I'm thinking is to use some kind of hook to run the meme utility to pull the branch and commit information as well as the build date and write it into a module file. We already pull this information when running the server; by writing a file, we automatically have something that can be committed.

We want to distinguish between development builds and master releases somehow. The Electron app should display a warning prominently when running a non-master release.

On npm run dev or npm run electron before webpack runs:

collect current version number from package.json
collect branch and commit hash from git
collect current date/time
collect database version from database module
write to version.js somewhere
start webpack and continue as normal

Important: The database version is manually updated in the database module.

Thoughts on Database Migration

The database loader will check for some kind of version metadata. Each database version will have to include a module that migrates the old structure to the new one, one version skip at a time.

Database versioning should be maintained as sequential integers.

When the server initializes, it checks its internal stored database version (written in version.js) against the database loaded metadata. If it encounters a change, it then loops through every version increment from the starting to ending points by loading a particular module written expressly to handle it. This is done by passing the entire database object to each dynamically loaded module until the database version is current.

benloh commented 4 years ago

Related issue re versioning: #27

Database Migration

A few more thoughts:

We can assume migration goes one-way: You can update the db file to the modern code-base, but not downgrade it.
Django (or maybe it's python) has an interesting migration module. I'm not sure how they do it, but occasionally you are asked to run a migration script if it detects things are out of date.

benloh commented 4 years ago

In GitLab by @jdanish on Apr 30, 2020, 12:15

Great questions! Mostly, we would love to have our cake and eat it to when it comes to the benefits of using a file-based systems instead of a centralized database :)

Some of the current use cases:

1) after data collection, it’d be nice to be able to backup that day’s data easily without needing to backup the entire app 2) it’d be nice to be able to look at the data from a given day without having to download the entire app or manage multiple apps 3) It’d be nice for folks who are not technical be able to do both of the above quickly and easily 4) it’d be nice to be able to pass data amongst each other easily a) this could be for analysis such as I send today's database to NJ for them to look at b) this could be for preparing a day. For example, in our last run, we created a group that was the research team and used that to demo ideas. So I pre-added a comment to the model for the research team to show the students how to read / add a comment. Or added a new entity to be able to demo what we had in mind. This is also the kind of thing that might be done offline, and by different people. That is, maybe I do it at home and need to add it to the lab server, or Morgan does it and needs to send it to me. 5) Something gets screwed up and we want to roll back a version

I know much of this is handled via the snapshots, but it’s easier for the non-tech-savvy members of the team if we could use separate files. And also conserves space.

Now to your prompts below:

Dave.Sri Seah https://gitlab.com/dasri commented:

Just to be sure I understand the INTENT and MOTIVATION:

This is for use during review of past data as part of their research analysis. They want to load past dataset and browse them. The intent is NOT for actual trials? As noted above, it is both since we are sometimes modifying data in-between sessions and then want to be able to talk about it in the trial. The application must be in a "Read only" or Review Mode as to not disrupt the database. An additional implication is that the current dataset is untouched by this mode; researchers may want to review past data between live sessions. Currently there are backups of the current dataset. I think we could handle this by never working with the original. That might be far easier than creating a “read only” mode and enabling it. A slightly different capability is to export the current dataset as a snapshot that can be re-imported later through this mechanism. Possibly as both a restore and review mode.

Seems like we are going this way?

Additionally:

This is an opportunity to add a versioning scheme to the database format and the Electron app that is displayed on the MEME screen. Cool. This might be a good chance to look more into code-signing for Catalina issues too. Cool, though since this would be handled primarily by the research team for now, it is fine if it uses the Chrome plugin which already handles some of this kind of activity, no? Honestly, until we can easily add and remove resources this way as well, I am not sure it matters if we can sign the app. It’d be nice to sign it and be able to send it around, but that is not a burning need. Though a caveat there is that many of these same requirements appear in GEM-STEP and in a different way in Net.Create, so if solutions help all of those, that’d be awesome. And I’d be OK with them eating up budget in those accordingly if that feels legit to you. (and pending approval of the appropriate leads)

benloh commented 4 years ago

In GitLab by @jdanish on Apr 30, 2020, 12:16

Yes, I assume we can assume migration goes one way. in fact, if we can easily import old data, the only reason we’d ever run the old app again is if we need screenshots of the version that something was generated in for an article. So I’d keep old builds around for that, but that’s rare.

benloh commented 4 years ago

In GitLab by @daveseah on Apr 30, 2020, 12:44

@jdanish when you mentioned the Chrome Plugin, this made me consider that "Database" might refer to not just the .loki database, but the ENTIRE DATASET including all assets, files, links, etc as a snapshot?

benloh commented 4 years ago

In GitLab by @jdanish on Apr 30, 2020, 12:48

@daveseah Well... good question. It probably should include all because if we rename a resource link, things will no-longer make sense, but then again we don't want to overwrite that stuff unless we have a backup! Certainly, it would make life easier if we could edit those things, send them to someone, and then load them. But if that is a much larger project, then assuming they aren't changing works short-term. Clearly we need a follow-up grant! :)

benloh commented 4 years ago

In GitLab by @jdanish on Apr 30, 2020, 13:18

@daveseah thinking about this some more, I think the ideal be that if the DLC folder were external (and in theory could be scattered all over), then the "dump" would include a link to the resource links externally. So if I want to send my build to you I'd need to send my data file + a zip of the DLC folder. If I don't, you can read everything and see what students did, but wouldn't see the actual resources, or the screen captures. But that'd work for 95% of the things we'd do and then that 5% of the time we'd send / backup the DLC folder. I think that makes more sense long-term then bundling everything into some proprietary package of stuff, though a zip of it all wouldn't be that proprietary/big. But we change the DLC far less often than the models, so separating the two makes sense. And with more than one classroom, but sharing a DLC, it really would be easier to export each classroom data and have one DLC folder to copy. Just thinking aloud ....

benloh commented 4 years ago

Joshua adds

So far, we’ve been mostly talking about exporting / importing “the whole thing” which consists of: 1) teachers / classrooms / settings, 2) models, 3) resources with one brief reference to the idea that maybe the resources should be separate.

I think that for our current SEEDS plans, this works fine. At most, we anticipate having 2 teachers at a given location, maybe 3. So grouping it all together is fine and if anything convenient. The one case where we might ant to share a “portion” of that data cross-site is when we create a sample model to show the other location or in another classroom. For example, in IN we made an extra group for the IU researchers and updated that model whenever we demonstrated new features, sometimes adding things in-between sessions so that we could show how we handled it. For example, adding a new entity to then show how evidence might link to it. With so few classrooms, these likely diverge, and if they don’t it is easy to re-create the model by hand in the meme interface quite quickly, especially because we can print it out and copy the text if needed. In the long-term future, that might change where we’d want to be able to share a model made at IU individually with Rutgers and vice versa. So mostly I’d say wait on worrying about it, but in case it helps in thinking about architecture I figured I’d mention it.

On Net.Create, we sort-of have this option already since we can simply copy the loki and template file over to a new install. Long-term, we’ll want to be able to send a loki and template easily to a front-end user and have them import it, but it seems that is easier on the data side. The challenge there is more tied to running multiple visualizations / groups at once.

On GEM-STEP, I think we will absolutely need / want to be able to choose to either export “all the stuff” for archival purposes and initial setup, or export a single model for sharing across sites or classrooms, possibly importing it under a new name (we might, after all, have multiple copies of something called “fish model” and want to differentiate them when uploading them). Like any programming style environment, being able to share files / code seems like it likely has real value there. Of course, if push comes to shove we’d likely prioritize that below getting the scripting and other things working well, etc. But I think we’ll need it eventually.

benloh commented 4 years ago

In GitLab by @daveseah on May 15, 2020, 11:48

MAY 15 2020 REVIEW WITH BEN

Scoping down again, the immediate need is:

Click a button to export an "archive file" as a snapshot of the current running database
Drag a saved "archive file" back onto Electron to restore a "temporary read-only mode" so researchers can analyze the dataset.

This intent is to "save snapshots" and "restore snapshots" for a particular server (MEME.app) instance, WITHOUT portability between server instances (see #31 for that).

From earlier message:

research how drag-and-drop works on Electron apps (v3 docs)
research how Electron apps access the file system (may require code-signing updates) (use app methods, fs, and dialog)
research a means of versioning the MEME system such that (1) the Electron app or the Node server can access it and (2) the database format has its own separate version number that is manually incremented. (note: this is database versioning only, which is not the same as the visible appl version issue in #27)

benloh commented 4 years ago

In GitLab by @jdanish on May 15, 2020, 12:01

@daveseah @benloh a few quick clarifications:

It doesn't have to be drag and drop. Clicking a button or similar is fine.
It doesn't need to be read only, and in fact might be bad if it is because we might want to basically switch to an old version if something goes wonky? Or if we can still do that by opening the package I guess that's fine? (Ideal might be a choice - do you want to replace the database with this new one, or just view).
Obviously we'd need to be really careful, though the snapshots you are creating protect us, right?
Re:portability - not sure what you mean. We might want to send our data to Rutgers so they can read it / use it. But they wouldn't need to merge it with theirs (though the model export import would help there).

Joshua

benloh commented 4 years ago

In GitLab by @daveseah on May 15, 2020, 12:30

RE: versioning the database.

Q. Accessing a DB Version Number?
A. I think we use an integer and manually increment it when we change the database code. These changes happen relatively infrequently compared to repo changes, so we need something that's easy to "diff" as a monotonically increasing integer (e.g. 1 -> 2, not 1 -> 3 or 1 -> 1a). ALSO, we need to write this version number into the database as a new key or db property (not sure what Loki supports).

Q. What database formats already exist prior to this work?
A. There are three already:

0 had broken comments because Rutgers used the wrong app (very out of date pre-release).
1 the one currently in the system (pre-March 2020) So if we don't see a db version key in the database, we will have to do "inspect-based version guessing", or maybe write a "database integrity" filter and optimistically label it "version 1" afterwards, reporting any errors. Need to research/review the specific issue with the version 0 database to write the inspection-based version method.
2 is the version that Ben added "why", "outcomes" db changes

Q. How to migrate DB versions?
A. We can define database migration functions that know what changed between increment versions and do any appropriate processing. For example:

const updateFuncs = {
"1-2": (db)=>{ ... stuff that updates, returning modified db or error},
"2-3": (db)=>{ ... more stuff that updates, returning modified db or error),
}
function CheckUpdate(version) {
  if (version < CURRENT_VERSION) {
    // loop through each updateFunc until all updated (a "migration")
  }
  // write the updated database file if successful
  // complain bitterly if not
}

benloh commented 4 years ago

In GitLab by @daveseah on May 15, 2020, 12:52

mentioned in commit 78a4a9d52337c2d38ee107ae937b9ce93f61374f

benloh commented 4 years ago

In GitLab by @daveseah on May 15, 2020, 12:57

@jdanish

Regarding (1) - we'll focused on using the easiest quickest method

Regarding (2) - snapshots as implementing here will have to be for a specific MEME.app instance. We're avoiding doing the entire kitchen sink now just to be sure we can get the basics working.

Regarding (3) - "maybe"...this is a QA issue to see where any additional failure points are. I suspect there are expectations and use cases that will emerge once this is out in the wild.

Regarding (4) - another "maybe"...we started another issue for this so we don't pile on too many tasks in a single issue.

benloh commented 4 years ago

In GitLab by @jdanish on May 15, 2020, 14:08

Gotcha. OK!

benloh commented 4 years ago

Marking this completed for now. Re-open if necessary.

benloh commented 4 years ago

closed