scitran / core

RESTful API
https://scitran.github.io
MIT License
18 stars 18 forks source link

Delete functionality - phase 1 (tagging) #1030

Closed ambrussimon closed 6 years ago

ambrussimon commented 6 years ago

Resolves #1012

Minor TODOs left, but bulk is ready for 1st pass/quick review for discussion.

TODO/TBD:

Review Checklist

codecov-io commented 6 years ago

Codecov Report

Merging #1030 into master will increase coverage by 0.4%. The diff coverage is 93.97%.

@@           Coverage Diff            @@
##           master   #1030     +/-   ##
========================================
+ Coverage    90.7%   91.1%   +0.4%     
========================================
  Files          50      50             
  Lines        6864    6892     +28     
========================================
+ Hits         6226    6279     +53     
+ Misses        638     613     -25
nagem commented 6 years ago

TBD containers getting the delete tag don't apply it on their files - no explicit need though.

I don't see a problem until we move them to their own collection. Search will properly remove them.

nagem commented 6 years ago

in hindsight, would we rather want {'deleted': {'$ne': True}}

I think we would remove the key if we "un-delete" something, but curious if anyone else has an opinion on the matter.

ambrussimon commented 6 years ago

@nagem Thanks for all the help and the input. Implemented and tested all features requested for the tagging phase (including permission handling and analysis ref checks). Sticking with {'exists': ...} because it's not just binary, I'm actually setting it to a datetime.

Also freshly rebased.

ambrussimon commented 6 years ago

@nagem @ryansanford Rebased, patched, ready for review.

nagem commented 6 years ago

Changes look good, holding on merge for Ryan's review of "restore" script.

ambrussimon commented 6 years ago

@ryansanford added undelete.py and manually (and very limitedly) tested that container undel/recursion/file undel work.

nagem commented 6 years ago

I'm going to add a new access log type (delete_container) and add it where appropriate here.

ryansanford commented 6 years ago

I suggest undelete.py recursively undelete parent containers if they are deleted. Currently it does not, resulting in the restored container referencing parent containers that are not accessible by the API.

Scenario...

  1. User deletes project, which recursively deletes all nested sessions and acquisitions.
  2. User needs a couple specific sessions or acquisitions restored.
ryansanford commented 6 years ago

Another scenario that may trump the one I just posted above, or make the approach to it more nuanced...

  1. A week ago user deletes a single acquisition within a project.
  2. Yesterday a user deletes that project by mistake, and wants that project restored.

I think it's reasonable for a user to expect that restoring a project is scoped to the single delete event in no.2, and would not end up restoring the acquisition deleted in no.1.

Undoubtably, it's a powerful capability to make it easy to indiscriminately undelete everything under a project. I'm questioning if that fits the administrative use case we were initially targeting.

@tcbtcb @lmperry I think your perspectives are important here.

ryansanford commented 6 years ago

Trying to undelete a file that exists, but not currently deleted is resulting in an exception. Instead, exit code should be zero with no error message, since the state of the system matches the intended state of running undelete.py.

2017-12-21 15:42:45 scitran.undelete              undelete.py    46:INFO Removing "deleted" tag from file acquisitions/5a39375e7053b30020c9e2f7/4374_1_1b.nii.gz...
Traceback (most recent call last):
  File "bin/undelete.py", line 59, in <module>
    main()
  File "bin/undelete.py", line 49, in main
    del f['deleted']
KeyError: 'deleted'
ryansanford commented 6 years ago

Scitran Collections, nor their attached files can be undeleted with the current implementation.

ryansanford commented 6 years ago

I'm done with my review.

PSA: Search indexing implementations should be updated as appropriate before this change sees production.

nagem commented 6 years ago

@ryansanford if we end up needing to support that scenario (files deleted in the past don't get restored), I have a few ideas we could use (timestamp set once for all containers, delete timestamp not overwritten).

ryansanford commented 6 years ago

@ambrussimon I spoke with Thad, and he does prefer the option to make the undelete act on only the containers that were part of the delete in question. A blindly recursive undelete that I initially suggested is not desired.

ambrussimon commented 6 years ago

Phew, I addressed everything discussed, @ryansanford.

nagem commented 6 years ago

The additional logging changes/tests are added.

nagem commented 6 years ago

I wanted to mention something Ambrus and I discussed offline:

Although collections can be "recovered" by the script, only the collection and it's attachments (files, notes, etc) will be recovered. Sessions and acquisitions will need to be re-added manually. 1) Files, info and notes are the only unique data in a collection, so recovering these items via the collection would be the only way to recover them in case of an unintentional deletion. 2) There are technically restraints in how we reference which acquisitions/sessions belong to a collection that would make it complicated to preserve this in a deleted state. Since no unique data is lost in this kind of deletion, we deemed it unnecessary to support this kind of recovery action.

nagem commented 6 years ago

2.) When deleting files within any container, the access log should not have the file name url encoded so it can be easily used to source the undelete command.

I'll tackle this now by adding the (unencoded) filename to the access log.

ambrussimon commented 6 years ago

@ryansanford Good catch. cont_name pluralization at fault. Added test revealed that analysis deletion (propagation) and undeletion scope was off - fixed. (+as always, fresh rebase)

nagem commented 6 years ago

@ryansanford Additional file context on access log added.