Closed karlcz closed 7 years ago
I think this needs more in-depth use case and roadmap discussion before anything can be planned.
There are serious decisions to make about the intersection of this feature with fine-grained security, if this log content is going to be exposed to anybody but server admins.
Also, if there is any expectation to ask questions about the history of specific records, you quickly stray into temporal DB territory with all the implications for scaling and the meaning of history that spans across model versions.
Let's discuss. We need some mechanism to understand how people are using the editing features within a data model.
Carl
Dr. Carl Kesselman Dean’s Professor, Epstein Department of Industrial and Systems Engineering Fellow, Information Sciences Institute Viterbi School of Engineering
Professor, Preventive Medicine Keck School of Medicine
University of Southern California 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292-6695 Phone: +1 (310) 448-9338 Email: carl@isi.edu Web: http://www.isi.edu/~carl
(Emailed reply to Karl Czajkowski's comment of Friday, May 19, 2017 at 6:16 PM on informatics-isi-edu/ermrest issue #146, "Improve logging for entity updates".)
Now that fine-grained access control is "done", can we revisit this issue and the best approach to move forward?
Further discussion has raised the need for something more like a temporal/history access interface, which is quite different from audit/logging.
If a history mechanism would allow structured access to previous versions of records, etc, is there still a need to add detailed logging results from bulk operations? It seems to me that one could instead query into the history system if one needs that level of detail, so we don't necessarily want or need to replicate that level of detail into two stores?
NOTE: while this branch is in development, multiple internal refactoring/restructuring tasks will be done. No backward compatibility between commits will be attempted. Only an upgrade from the previous master all the way to the final branch state will be supported. Hence, any pilot user must be prepared to drop and reload entire catalogs after any incremental commit in the branch.
This branch now has read-only access to previous versions of the catalog for both schema and data retrieval. In general, any existing retrieval API like:
GET /ermrest/catalog/N/...
now has a complementary history API like:
GET /ermrest/catalog/N@revision/...
where revision is a URL-encoded timestamp identifying a transaction that created the referenced catalog state. For convenience, imprecise revision timestamps will be matched to the most recent revision which occurred at or before the requested timestamp.
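As a rough illustration of the revision matching described above, the sketch below builds a history URL and resolves an imprecise timestamp to the most recent revision at or before it. The ISO 8601 timestamp format, the helper names, and the in-memory list of revisions are assumptions made for illustration only; the actual encoding and server-side lookup may differ.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from urllib.parse import quote


def history_url(catalog_id, revision, path):
    """Build a /ermrest/catalog/N@revision/... URL (timestamp format is assumed)."""
    encoded = quote(revision.isoformat(), safe="")
    return f"/ermrest/catalog/{catalog_id}@{encoded}/{path}"


def resolve_revision(known_revisions, requested):
    """Return the latest revision at or before `requested`, or None if none exists.

    `known_revisions` must be sorted in ascending order.
    """
    idx = bisect_right(known_revisions, requested)
    return known_revisions[idx - 1] if idx > 0 else None


# Hypothetical revision timestamps for one catalog.
revisions = [
    datetime(2017, 8, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2017, 8, 15, 9, 30, tzinfo=timezone.utc),
    datetime(2017, 9, 1, 22, 16, tzinfo=timezone.utc),
]

# An imprecise request that falls between two revisions.
requested = datetime(2017, 8, 20, tzinfo=timezone.utc)
matched = resolve_revision(revisions, requested)
print(history_url(1, matched, "schema"))
```

Here the request for August 20 resolves to the August 15 revision, since that is the most recent catalog state that existed at the requested time.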
Perhaps at this point we can try working out a use case: providing an RDA-compliant identifier for a data collection.
Carl
(Emailed reply to Karl Czajkowski's comment of Sep 1, 2017 at 3:16 PM.)
For the history amendment feature, which I think is a blocking feature for a production MVP, we need to think about the authorization model.
I think it will need a new access mode like amend, which can appear in rights summaries and is distinct from the normal update, delete, and insert rights which apply to the latest evolving catalog.
I also think we can limit ourselves to these amendment features:
NULL
I think we should disallow model changes in history amendment. If someone has a need to redact sensitive info that was encoded into the model structure itself, I think they have no recourse but to ETL sanitized content into a new catalog and destroy the old one.
For an initial MVP, I'd like to grant this amend right only to owners. But, do we need fine-grained amendment rights, i.e. you can tamper with history in one schema or table but not others? Or should we just grant it to the overall catalog owners who have holistic responsibilities for the content?
@carlkesselman @robes
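To make the proposal concrete, here is a minimal sketch of a rights summary in which amend is reported separately from the live-catalog modes and is granted only to catalog owners. The Catalog class, the ACL layout, and the rights_summary function are hypothetical and do not reflect the actual ERMrest implementation.

```python
from dataclasses import dataclass, field

# Modes that apply to the latest evolving catalog.
LIVE_MODES = ("insert", "update", "delete")


@dataclass
class Catalog:
    owners: set = field(default_factory=set)
    # Hypothetical per-mode ACLs for the live catalog: mode -> set of users.
    acls: dict = field(default_factory=dict)


def rights_summary(catalog, user):
    """Summarize a user's rights, keeping amend distinct from the live modes."""
    is_owner = user in catalog.owners
    rights = {
        mode: is_owner or user in catalog.acls.get(mode, set())
        for mode in LIVE_MODES
    }
    # MVP assumption: only catalog owners may amend history.
    rights["amend"] = is_owner
    return rights


catalog = Catalog(owners={"alice"}, acls={"update": {"bob"}})
print(rights_summary(catalog, "alice"))
print(rights_summary(catalog, "bob"))
```

In this sketch, a user with update rights on the live catalog gains no ability to amend history. Finer-grained amendment rights per schema or table would need a richer structure than a single owner check.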
The history amendment APIs are prototyped now but the test suite isn't covering them yet.
There are now basic test cases for the history APIs. I think this feature is ready for more testing and review, but the corresponding PR will be broadened with some other closely related changes before it is ready to merge...
This issue was originally about logging of entity updates. It has since expanded scope to history capture for several related use cases. It is an umbrella task that will include quite a few related development activities...
Use Cases to Consider
Challenges
The main challenge for history capture in ERMrest revolves around its generic/introspective nature. We cannot assume that the model (i.e. the SQL DDL) is constant throughout the history. Our clients and our protocol depend on an understanding of the model that governs the data being queried or exchanged. Thus, we will have to capture a history of model changes as well as data changes.
Closely related to the model is fine-grained authorization policy and model annotations. We also need to be able to capture these and serve them up with the historical model and data, applying the appropriate history-relevant policy when deciding access rights on historical data. However, it is possible that policies for a project change, including retroactive changes in access rights to past data. This could be due to legal/human issues or simply to correct a technical flaw in a previously deployed policy. Because policies are tightly coupled to the model, it is not sufficient to just "use the latest policy". Rather, there needs to be a way to amend the policy that will be applied to a historical model, as an orthogonal problem to amending the latest policy that will be applied to the latest model!
Similarly, data redaction or history pruning involves amending historical data. The goal in both cases is to purge data from storage so that it is no longer retrievable and no longer resident in storage resources. Thus, such amendments are destructive and would obviously not support further "undo" or history-of-history tracking. An out-of-band recovery scheme would be needed to reverse such destructive changes, just as it is required now to reverse all changes in the absence of history capture. New web APIs probably need to be defined to allow amendment of historical content.
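The destructive nature of such an amendment can be sketched as follows. This is only an illustration under stated assumptions: history is modeled as an in-memory list of (timestamp, row) snapshots with integer timestamps, and redact_column is a hypothetical helper, not an ERMrest API.

```python
def redact_column(history, column, start, end):
    """Destructively set `column` to None in every snapshot with start <= t < end.

    `history` is a list of (timestamp, row_dict) pairs. Rows are mutated in
    place, so the previous values are not retained anywhere and cannot be
    restored from this structure. Returns the number of snapshots changed.
    """
    changed = 0
    for timestamp, row in history:
        if start <= timestamp < end and row.get(column) is not None:
            row[column] = None
            changed += 1
    return changed


# Hypothetical history of one record across four revisions.
history = [
    (10, {"id": 1, "name": "sample A", "notes": "sensitive"}),
    (20, {"id": 1, "name": "sample A", "notes": "still sensitive"}),
    (30, {"id": 1, "name": "sample A", "notes": None}),
    (40, {"id": 1, "name": "sample A", "notes": "public"}),
]

count = redact_column(history, "notes", start=10, end=40)
print(count, history[0][1]["notes"], history[3][1]["notes"])
```

Because the snapshots are overwritten rather than versioned, there is no history of the amendment itself. Reversing it would require an out-of-band backup, as noted above.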
Technical Features and Tasks
Selective OID-relable mechanism to replace individual tables or columns
API extension for entity history access, i.e. polymorphic, longitudinal representations
API extension for audit access, i.e. event-stream representations
The API extensions with strikethrough above are considered out of scope for an MVP release and should be added as later enhancements.