security-force-monitor / sfm-cms

Platform for sharing complex information about security forces. Powers WhoWasInCommand.com
https://whowasincommand.com

Designing a record deletion process #420

Closed evz closed 5 years ago

evz commented 6 years ago

I was struck by a little Friday afternoon existentialism which I thought I'd immortalize in an issue.

As I'm working out the finer points of how we might edit the relationship between two organizations, I'm noticing that the wireframes Tom put together seem to hint at the possibility of reversing the relationship between two organizations (so from parent to child and vice versa). We don't really attach sources to the relationship, per se, but we do attach sources to the parent entity and the child entity within that relationship. Should those always be the same? It would seem to logically follow that the evidence that leads you to believe that one thing is the parent of another thing could be used as evidence of the reverse also being true. I guess what I'm mostly struggling with in the UI is a way to ultimately end up with sources being assigned to both parent and child separately and it suddenly occurred to me that I might not need to.

Taking that a step farther, what does it mean to "reverse" a relationship? If the above is not true (that the sources in one direction confirm the other direction), should we also reverse the sources?

Actually, when it comes right down to it, this throws the whole concept of "editing" on this site into question. If you're changing a fact and adding a source, what happens to the other sources that maybe contradict the change you just made?

I mean, what is truth?

tonysecurityforcemonitor commented 6 years ago

@evz I'm always happy when our project pushes the existential questions ;) - I'm glad you raised this because I can see how thorny it could become. I'm going to try and change these double constrictor knots into Gordian knots... and cut them (end of knot related analogies).

Actually, when it comes right down to it, this throws the whole concept of "editing" on this site into question. If you're changing a fact and adding a source, what happens to the other sources that maybe contradict the change you just made?

Good question. Those other sources should either 1) also be removed during the editing process or 2) may not be implicated in the change so they would stay. In general our edits will be focused on a particular field for an entity (i.e. 'org:AOO') rather than across multiple entities. In rare cases where we need to edit across multiple entities, we still generally will be editing the same field across multiple entries. This process will be greatly aided by the ability to see everything a source references in the database.

As I'm working out the finer points of how we might edit the relationship between two organizations, I'm noticing that the wireframes Tom put together seem to hint at the possibility of reversing the relationship between two organizations (so from parent to child and vice versa). We don't really attach sources to the relationship, per se, but we do attach sources to the parent entity and the child entity within that relationship. Should those always be the same? It would seem to logically follow that the evidence that leads you to believe that one thing is the parent of another thing could be used as evidence of the reverse also being true. I guess what I'm mostly struggling with in the UI is a way to ultimately end up with sources being assigned to both parent and child separately and it suddenly occurred to me that I might not need to.

Taking that a step farther, what does it mean to "reverse" a relationship? If the above is not true (that the sources in one direction confirm the other direction), should we also reverse the sources?

Ah, this is really helpful to have spelled out, since it operates a bit differently on the spreadsheet, where we are only explicitly sourcing the parent entity (and implicitly sourcing the child entity). Yes, the sources for parent and child entity should always be the same (side note: it's a bit of a change for me to think that way, but it makes sense methodologically). Not sure if that makes the UI issue easier... I hope so!

evz commented 6 years ago

@tonysecurityforcemonitor Yes, knowing that logic prevails when defining a relationship between parent and child makes things quite a bit easier. Thanks for indulging my Friday afternoon navel gazing!

tlongers commented 6 years ago

Okay, so this philosophical thread happens when I'm on leave - not fair! Only I'm allowed to navel gaze on a Friday okay? :)

Anyhow, I'm going to re-open this briefly because of #451 - we don't ever need to reverse the relationship between orgs. What we do need to be able to do is add a child to a parent, and add a parent to a child from within their respective contexts. As Tony says, it's the same field being edited, using the same source, just activated through two different paths.

But this issue also touches on the matter of deleting data, which takes us baaaaackk iiiinnnn tiimmmmeee to one of our earliest issues (#36). I'd welcome a discussion about what options there are for deleting specific data whilst keeping integrity. For example, if a specific source is shown to be spurious:

I'm not sure what kind of workflow would easily do all of this. Thoughts @evz @tonysecurityforcemonitor ?

tlongers commented 6 years ago

I remember that I did a note on data integrity systems (shared earlier with the dev team) to assist with this very discussion.

tlongers commented 5 years ago

A simple and effective way to remove records is needed. Giving this priority as part of Iteration 1.

jeancochrane commented 5 years ago

From reading these materials, it seems like our needs are:

  1. In the immediate term, we need a way to delete sources and access points so that we can do basic cleanup in the data entry system (it's possible that this may extend to other entity types as well, but I only see #576 and #577 in the backlog for now)
  2. In the long term, we need an end-to-end system for handling the lifecycle of all types of entities, including deletion (see https://github.com/security-force-monitor/sfm-cms/wiki/Note:-Data-integrity-systems-for-WhoWasInCommand.com)

One way to think about merging these two streams would be to identify the portions of 2) above that can be adapted for the purposes of 1). Then, as development continues, we can refactor our solution for 1) to align it with the full system as designed in 2).

To me, starting with the "hard delete" functionality seems to make the most sense for tackling 1), since hard deletes require fewer conditional views than soft deletes. I imagine a workflow for deleting sources and access points to look something like this:

  1. If a source or access point provides no evidence for other records, it can be hard deleted immediately.
  2. If a source or access point does provide evidence for other records, instruct the user to remove it from the records it references first before allowing it to be hard deleted.
    • This raises the question: What to do about records for which the source or access point in question is the only piece of supporting evidence? In these cases, we have two options for how we can proceed:
      1. Allow the user to remove the source or access point anyway, and add a "citation needed" flag to the record (n.b. this may or may not be feasible given the way that complex_fields is written -- it's been long enough that I can't remember off the top of my head).
      2. Require that the user replace the source or access point with another source or access point. This is the simpler solution, but potentially more cumbersome for SFM staff.
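The workflow above could be sketched roughly as follows, taking option 2 (a sole source must be replaced rather than simply removed). This is a minimal illustration only: `Record`, `remove_source`, `hard_delete` and the plain-string sources are hypothetical stand-ins, not the actual sfm-cms models or the complex_fields API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical sourced value, e.g. one field on an organization."""
    label: str
    sources: list = field(default_factory=list)


def remove_source(record, source, replacement=None):
    """Detach `source` from `record`; sole evidence must be replaced."""
    if source not in record.sources:
        return
    if record.sources == [source] and replacement is None:
        raise ValueError(f"{record.label!r} needs a replacement source")
    record.sources = [s for s in record.sources if s != source]
    if replacement is not None and replacement not in record.sources:
        record.sources.append(replacement)


def hard_delete(source, records, library):
    """Delete `source` from `library` only when no record still cites it."""
    blockers = [r.label for r in records if source in r.sources]
    if blockers:
        raise ValueError(f"still cited by: {', '.join(blockers)}")
    library.remove(source)


library = ["Source A", "Source B"]
records = [
    Record("org name", ["Source A", "Source B"]),
    Record("org site", ["Source A"]),
]
remove_source(records[0], "Source A")                          # other evidence remains
remove_source(records[1], "Source A", replacement="Source B")  # sole evidence: replace
hard_delete("Source A", records, library)
print(library)  # ['Source B']
```

Calling `hard_delete` before the two `remove_source` calls would raise, which corresponds to step 2: the user is told to detach the source from the records it references first.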

Once we start implementing the full unpublish/trash/hard-delete workflow outlined in https://github.com/security-force-monitor/sfm-cms/wiki/Note:-Data-integrity-systems-for-WhoWasInCommand.com, it would be a smaller step to refactor the UX above to accommodate the new workflow.

How does this plan sound? Am I missing something important? Still getting back into the swing of the data model here, so let me know if I've left something out.

tonysecurityforcemonitor commented 5 years ago

Hi @jeancochrane - jumping in to comment:

Require that the user replace the source or access point with another source or access point. This is the simpler solution, but potentially more cumbersome for SFM staff.

This sounds like the right approach to me - as long as it is possible to change the related field as well.

For example: the source "Army Parade Day" is evidence that the 1 Battalion is based in the city of Abuja and gives us a date_last_cited for that site of 1 October 2016. Later we realize we got it wrong and have to remove that source entirely from the entry. Generally, this would mean one of two things: 1) the site of Abuja is also removed, along with the related date fields, because "Army Parade Day" was the only source for that site; or 2) other sources evidence a site of Abuja for 1 Battalion and we have to update the date_last_cited based on one of them. In this case let's say the source "New Commander Gives Speech" gives us a date_last_cited of 12 March 2015. So we remove "Army Parade Day", replace it with "New Commander Gives Speech" and update the date_last_cited from 1 October 2016 to 12 March 2015.
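To make the two outcomes concrete, here is a small sketch of the Abuja example. It assumes that date_last_cited is simply the latest date among the sources that remain, which is an assumption for illustration and not a statement of how sfm-cms computes it; `drop_source` and the dictionary layout are hypothetical.

```python
from datetime import date


def drop_source(citations, title):
    """Remove one source and recompute date_last_cited.

    `citations` maps a source title to the date on which that source
    cites the site. Returns (remaining, date_last_cited); the date is
    None when no source is left, meaning the site itself should go.
    """
    remaining = {t: d for t, d in citations.items() if t != title}
    # Assumption: date_last_cited is the latest date among remaining sources.
    last_cited = max(remaining.values()) if remaining else None
    return remaining, last_cited


abuja = {
    "Army Parade Day": date(2016, 10, 1),
    "New Commander Gives Speech": date(2015, 3, 12),
}
remaining, last_cited = drop_source(abuja, "Army Parade Day")
print(last_cited)  # 2015-03-12
```

If "Army Parade Day" had been the only entry, `last_cited` would come back as `None`, which is case 1: the site and its date fields are removed together.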

Does that workflow make sense?

tonysecurityforcemonitor commented 5 years ago

Relatedly, how I imagine it working: when deleting a source, the system flags all the fields that source touches. That would allow the user to see where the changes are simple (just removing it from a site for an org where other sources give date first and last cited) and where they are more complex, as in the example above.
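One way to picture that flagging step is a report that splits the fields a source touches into simple and complex cases. The sketch below is illustrative only: `deletion_report`, the `(entity, field)` keys and the field names "name" and "commander" are invented for the example and do not reflect the real data model.

```python
def deletion_report(field_sources, title):
    """Group the fields that cite `title` by how hard the cleanup is.

    `field_sources` maps an (entity, field) pair to the list of source
    titles supporting that field.
    """
    simple, needs_review = [], []
    for field_key, titles in field_sources.items():
        if title not in titles:
            continue  # this field is not touched by the source
        others = [t for t in titles if t != title]
        # Other evidence remains: simple removal. Sole evidence: review.
        (simple if others else needs_review).append(field_key)
    return {"simple": simple, "needs_review": needs_review}


field_sources = {
    ("1 Battalion", "name"): ["Army Parade Day", "New Commander Gives Speech"],
    ("1 Battalion", "site"): ["Army Parade Day"],
    ("1 Battalion", "commander"): ["New Commander Gives Speech"],
}
report = deletion_report(field_sources, "Army Parade Day")
print(report["simple"])        # [('1 Battalion', 'name')]
print(report["needs_review"])  # [('1 Battalion', 'site')]
```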

jeancochrane commented 5 years ago

Does that workflow make sense?

Yeah, that makes sense to me! So it sounds like supporting the hard deletion of sources/access points does necessarily imply being able to delete other entity types as well (in order to e.g. remove Abuja as a site since "Army Parade Day" was unreliable). I'll do a little more thinking and see if I can come up with some recommended next steps for that functionality.

Relatedly how I imagine it working is when deleting a source it flags all the fields that source touches...

Do you think the interface as described in https://github.com/security-force-monitor/sfm-cms/issues/392 will be enough to identify these fields, or do we need a specific "deletion" view with more information?

tonysecurityforcemonitor commented 5 years ago

Yeah, that makes sense to me! So it sounds like supporting the hard deletion of sources/access points does necessarily imply being able to delete other entity types as well (in order to e.g. remove Abuja as a site since "Army Parade Day" was unreliable). I'll do a little more thinking and see if I can come up with some recommended next steps for that functionality.

Sounds good!

Do you think the interface as described in #392 will be enough to identify these fields, or do we need a specific "deletion" view with more information?

I think that should work - since you really just need to see the fields it touches. That would include the related date fields as well, right? If so, that should work.

jeancochrane commented 5 years ago

Thinking about this some more today. I'm on the fence about whether it's a better idea to implement hard deletes or soft deletes first. I definitely think we should start with one, and only move on to the other as part of an "improvement" to the system.

Soft deletes are definitely more complicated than hard deletes, but we expect them to make up the bulk of delete operations in the application. At a bare minimum, soft deletes will involve:

In contrast, hard deletes involve far fewer custom views/logic than soft deletes, since we can leverage the fact that records will be fully removed from the (latest version of) the database and search index. Still, we expect hard deletes to be less common than soft deletes, and so it's a little counterintuitive to start with them. At a bare minimum, hard deletes will involve:

Based on these rough specs, I actually think it makes the most sense to start by implementing hard deletes for all top-level entities. (Relationship entities like Compositions and MembershipPersons already have hard deletes implemented, which is another point in favor of starting with hard deletes.) Once we have hard deletes in place and working well, we can easily build on top of it to begin implementing a full "trash" lifecycle for all entities.

Curious to hear what others think! Particularly if I've missed any essential operations in the rough specs above, or if my suggested starting point won't capture all our needs for v1 of the data capture system.

tonysecurityforcemonitor commented 5 years ago

@jeancochrane - thanks for laying that all out - I agree going the hard delete option first makes the most sense.

When users try to delete a top-level entity, show them a list of all relationships that need to either be deleted or adjusted, and prevent deletion until user has emptied out the list

This will be an incredibly useful feature and prevent a lot of accidents in my mind.

I'm very intrigued by the workflow and potential utility of the soft delete functionality you laid out, but that does also strike me as an improvement rather than an immediate need.

jeancochrane commented 5 years ago

Great, thank you @tonysecurityforcemonitor! We'll get started on hard deletes now and return to the soft delete workflow during our improvements iteration.

tlongers commented 5 years ago

@jeancochrane The hard delete functionality works as described across all the permutations I tested. It removed deleted records from the database, search index and downloads.

Some notes:

jeancochrane commented 5 years ago

The warning text could be improved to reflect exactly what the user needs to do. Currently, it is "Deletion is disabled until all related entities are deleted". Changing it to "You can't delete this entity until you have removed its connections to the below entities" would make it more accurate.

I like that, sounds much clearer. I'll go ahead and make this change and let you know when it's up.

A deleted record can be recovered successfully through ~/admin, but the restored record will not reappear in the search index. I guess there is no mechanism for this to happen as yet.

Interesting! I hadn't even realized you could recover it through the Django admin interface. Is that using the version history? If it's alright with you, I think it'd be easiest to push off the search index integration to the enhancement phase where we flesh out the full lifecycle of records, and just assume that recovering deleted records isn't supported yet. (In an emergency, we can bulk recreate the search index using the make_search_index management command anyway.)

tlongers commented 5 years ago

Interesting! I hadn't even realized you could recover it through the Django admin interface. Is that using the version history? If it's alright with you, I think it'd be easiest to push off the search index integration to the enhancement phase where we flesh out the full lifecycle of records, and just assume that recovering deleted records isn't supported yet. (In an emergency, we can bulk recreate the search index using the make_search_index management command anyway.)

Good question. I'm not sure what underpins this recovery function - it's part of the tools available to the administrator through the GUI, e.g. ~/en/admin/organization/organizationname/recover/. It doesn't show a history, only the list of records that were removed. This seems to be a useful function to keep around as a last resort, and we don't need to augment it in any way. As you say, we can rebuild the search index anyhow if we need to.

jeancochrane commented 5 years ago

Cool, that makes sense to me 👍 I made the text adjustments to the deletion warning, so I'm dropping this back in the QA hopper.

tlongers commented 5 years ago

Checked that it appears on deletion screens for all entities. Looks good to me.