pharmaR / riskassessment

Collaborative Deployment: https://app.pharmar.org/riskassessment/
Risk Assessment Demo App: https://rinpharma.shinyapps.io/riskassessment
Documentation: https://pharmar.github.io/riskassessment/

Update database.sqlite Function #504

Open Jeff-Thompson12 opened 1 year ago

Jeff-Thompson12 commented 1 year ago

Create a helper function (or functions) to update users' databases with current improvements to the application. This function should take into account the state of the application upon major release (i.e., master). If a user has been running an implementation of the application from master, they should be able to update their database with this function whenever a new push to master occurs.
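One hypothetical shape for such a helper is a sequential migration runner: each breaking release registers one step, and the helper replays whatever steps a user's database is missing. Everything below (the function name, the example DDL steps, the use of SQLite's `user_version` pragma as the version stamp) is an illustrative assumption, not existing riskassessment code.

```r
library(DBI)
library(RSQLite)

# One migration step per breaking release, indexed by the schema version
# it produces (example DDL only)
migrations <- list(
  `1` = function(con) DBI::dbExecute(con,
    "ALTER TABLE comments ADD COLUMN package_id TEXT"),
  `2` = function(con) DBI::dbExecute(con,
    "CREATE TABLE roles (role_id INTEGER PRIMARY KEY, role_name TEXT)")
)

# Hypothetical helper: replay only the steps newer than the db's stamp
update_database <- function(path) {
  con <- DBI::dbConnect(RSQLite::SQLite(), path)
  on.exit(DBI::dbDisconnect(con))
  current <- DBI::dbGetQuery(con, "PRAGMA user_version")[[1]]
  target  <- length(migrations)
  if (current < target) {
    for (v in seq(current + 1, target)) {
      migrations[[as.character(v)]](con)
      DBI::dbExecute(con, sprintf("PRAGMA user_version = %d", v))
    }
  }
  invisible(target)
}
```

With steps chained like this, each breaking release only adds one new step, and a user jumping from a much older release simply replays every step they are missing, in order.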

AARON-CLARK commented 1 year ago

I've been thinking about this off and on for a while. What's tricky is that the proposed function would have to interface with any version of the app that hits master. For example, let's say there are 10 versions of the app, and 5 have hit master with "breaking changes" to the database. The function would need to know the differences between each version, and which versions have changed the database in a "breaking" manner, requiring an update:

Let's say versions 1, 3, 5, 6, & 10 had the breaking changes. The function would have to know how to update the database going from each of those versions to any later one.

That's x(x-1)/2 possible upgrade paths, where x = the number of versions with breaking db changes. Not to mention, if someone is on version 7, there would need to be logic saying that users on versions 6-9 should follow the same process as 6 -> 10. And should it be backward compatible? What if I'm on version 10, but need to go back to version 9? Should we help users revert to the previous db schema?

That's a lot of complexity. What did you have in mind?

Jeff-Thompson12 commented 1 year ago

My thoughts:

  1. I don't think we need to make the function backwards compatible. I think its purpose should only be to bring past versions up to date. This isn't "pick which deployment you want"; it's "integrate new features".
  2. We don't need to go back to the creation of the application. I personally think we can just work off of the current master deployment. (Although if we want to go back one more to hedge our bets, we can do that too.)
  3. I think we build a repository of database configurations corresponding to deployments of master and the function has to bring each of them up to the current deployment. This can then be built into our tests so individual PRs that change the database will need to resolve the issues before being merged.

The addition of customizations we have on dev right now does make this complex, but I don't think we should pass this complexity off to the user. I think this function is important in encouraging users to use our application.
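Point 3 could, hypothetically, look like the sketch below: keep one SQLite fixture per past master release, migrate a copy of each, and compare the result against the current schema. The fixture layout, the `migrates_cleanly()` helper, and the assumed `update_database()` migration function are all illustrative, not existing code.

```r
library(DBI)
library(RSQLite)

# Migrate a copy of a legacy fixture database, then check whether the
# resulting set of tables matches the current schema
migrates_cleanly <- function(fixture, migrate, expected_tables) {
  tmp <- tempfile(fileext = ".sqlite")
  file.copy(fixture, tmp)
  migrate(tmp)
  con <- DBI::dbConnect(RSQLite::SQLite(), tmp)
  on.exit(DBI::dbDisconnect(con))
  setequal(DBI::dbListTables(con), expected_tables)
}

# In tests/testthat/, hypothetically, so PRs that change the database
# must resolve migration breakage before merging:
# test_that("every legacy database reaches the current schema", {
#   for (f in list.files("fixtures", pattern = "\\.sqlite$", full.names = TRUE))
#     expect_true(migrates_cleanly(f, update_database, current_tables))
# })
```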

AARON-CLARK commented 1 year ago

There's no doubt that this function is needed (💯%). I was just curious whether you had some secret plan up your sleeve that hadn't occurred to me yet. But anyway:

  1. I would think being able to go back to a previous riskassessment version is a common use case we shouldn't rule out just yet. But maybe we don't work on it until someone asks for it?

  2. Though my example displayed this, I wasn't advocating starting back at app inception. However, starting at our next release and looking forward a couple of release cycles, the lay of the land may start to resemble the example above. Thus, we shouldn't always assume users are on the previous app version; they may need to jump forward from a much older release. So, starting at the next release, our database will need a version identifier added to compare against the version of the app that's running.

  3. Great plan!
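For the version identifier in point 2, one option worth noting is that SQLite already reserves an integer slot in the file header, readable and writable via the `user_version` pragma, so no new table is strictly required. A hedged sketch (the version numbers are made up):

```r
library(DBI)
library(RSQLite)

con <- DBI::dbConnect(RSQLite::SQLite(), "database.sqlite")

# Stamp the schema version whenever the database is created or migrated
DBI::dbExecute(con, "PRAGMA user_version = 3")

# On app startup, compare the stamp to the version this release expects
db_version  <- DBI::dbGetQuery(con, "PRAGMA user_version")[[1]]
app_version <- 3L  # hypothetical: schema version this app release ships with
if (db_version < app_version) {
  message("Database schema is behind; run the update function first.")
}

DBI::dbDisconnect(con)
```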

Jeff-Thompson12 commented 1 year ago

@AARON-CLARK I'm less convinced this is a good idea (at least the idea of migrating the old database in its entirety). A handful of planned enhancements would create non-backwards-compatible features.

I do think that we could create a function that would do the following:

Am I missing anything?

AARON-CLARK commented 1 year ago

Hmmm, would the proposed function maintain the old assessments, or would each pkg get re-assessed? Re-assessment is the main outcome we have to try to avoid.

Jeff-Thompson12 commented 1 year ago

The proposed function would re-assess the packages. The problem is that the application does not store riskmetric references or assessments (although PR #537 would start storing the actual assessments). We have no way to reproduce past events in the released version.

Jeff-Thompson12 commented 1 year ago

@AARON-CLARK Another thing not being discussed here is that we added user roles, which is actually an update to the credentials database.

Eduardodudu commented 1 year ago

Hello!

I've been looking at the comments, and maybe the dm package can be helpful in setting up the DB versioning updates.

I feel this could be helpful because dm:

  1. Checks the schema of the connected DB
  2. Helps populate the data for another schema of the DB

Eduardodudu commented 1 year ago

I created a small snippet to illustrate the idea:

The idea is to have a function for each package DB schema release and add it as a dm object. The user can then run a script that checks the schema version recorded in their database (maybe a new entity?) against the app's expected version; the script then runs each step needed to bring their database up to the new schema.

A scenario where the data is already there and just needs normalizing seems to be handled by this approach, but where an entity's data is new, the user will need to be allowed to edit/enter that data.

Code snippet

```r
library(dm)
library(dplyr)  # for %>%, mutate(), tibble()

path <- "database.sqlite"

con <- DBI::dbConnect(RSQLite::SQLite(), path)
dm_data <- dm_from_con(con)

# Set schema --------------------------------------------------------------

# Re-declare the keys: unlike PostgreSQL, SQLite gives dm no key metadata
# to learn from, so primary and foreign keys must be added by hand
# (foreign keys point from the child column to the parent key)
dm_schema <- dm_data %>%
  dm_add_pk(package, id) %>%
  dm_add_pk(package_metrics, id) %>%
  dm_add_pk(metric, id) %>%
  dm_add_pk(decision_categories, id) %>%
  dm_add_pk(comments, id) %>%
  dm_add_fk(community_usage_metrics, id, package, name) %>%
  dm_add_fk(package_metrics, package_id, package, id) %>%
  dm_add_fk(package, decision_id, decision_categories, id) %>%
  dm_add_fk(package_metrics, metric_id, metric, id)

dm_draw(dm_schema, view_type = "all")

# In general OK
dm_examine_cardinalities(dm_schema)
dm_examine_constraints(dm_schema)

# First release update -----------------------------------------------------

# First release - add a relation between comments and the package reference
updated_schema <- dm_schema %>%
  dm_zoom_to(comments) %>%
  mutate(package_id = NA) %>%
  dm_update_zoomed() %>%
  dm_add_fk(comments, package_id, package, id)

dm_draw(updated_schema, view_type = "all")

# Second release update -----------------------------------------------------

# Second release - add users and roles and connect them to comments

# New tables to add
users <- tibble(
  user_id = NA,
  user_name = NA
)

roles <- tibble(
  role_id = NA,
  role_name = NA
)

roles_users <- tibble(
  id = NA,
  user_id = NA,
  user_role = NA
)

# Deconstruct the legacy schema
tables_schema <- dm_get_tables(updated_schema, keyed = TRUE)

# Rebuild the schema: legacy tables plus the new ones
last_schema <- new_dm(
    c(tables_schema, list(users = users, roles = roles, roles_users = roles_users))
  ) %>%
  dm_add_pk(users, user_id) %>%
  dm_add_pk(roles, role_id) %>%
  dm_add_pk(roles_users, id) %>%
  dm_add_fk(roles_users, user_id, users) %>%
  dm_add_fk(roles_users, user_role, roles) %>%
  # normalize comments
  dm_zoom_to(comments) %>%
  mutate(user_id = NA) %>%
  dm_update_zoomed() %>%
  dm_add_fk(comments, user_id, roles_users)

dm_draw(last_schema, view_type = "all")

# The hard part: migrating the data into the new schema
```
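For that last data step, one hedged possibility is backfilling the new `comments.user_id` column with plain SQL once the `users` table is populated. The `user_name` join column below is a hypothetical illustration, not necessarily the app's actual column name.

```r
library(DBI)
library(RSQLite)

con <- DBI::dbConnect(RSQLite::SQLite(), "database.sqlite")

# Backfill the new foreign-key column by matching on a (hypothetical)
# shared name column between the legacy comments and the new users table
DBI::dbExecute(con, "
  UPDATE comments
     SET user_id = (SELECT u.user_id
                      FROM users u
                     WHERE u.user_name = comments.user_name)
")

DBI::dbDisconnect(con)
```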

Images reference

[Image: set schema]

[Image: First release - adding a relation between comments and package reference]

[Image: Second release - add users and roles and connect them to comments]