Closed by nagem 6 years ago
Merging #1047 into master will not change coverage. The diff coverage is 100%.
```
@@          Coverage Diff          @@
##           master   #1047   +/- ##
======================================
  Coverage    90.7%   90.7%
======================================
  Files          50      50
  Lines        6864    6864
======================================
  Hits         6226    6226
  Misses        638     638
```
Question: if the database upgrade fixes all missing keys, why also change the field access to tolerate them missing? Shouldn't we only need one or the other?
I want to understand why "code should not be written expecting these keys to exist" if going forward they always will...
It's true that we should logically only need one or the other. Unfortunately we don't specify anywhere that the keys should exist, and I am not 100% confident that all active users are only adding data through API endpoints. Assuming that keys important to the structure of the data are present (a file's `name` and `size`, a session's `project` foreign key) is acceptable, but peripheral fields that are often null, like `tags` and `info`, are harder to expect to be present.
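To illustrate the distinction, here is a minimal sketch of the two access styles; the document shape and field values are assumptions for illustration, not actual data:

```python
# Hypothetical file document; "tags" and "info" are absent, as in old data.
file_doc = {"name": "scan.nii.gz", "size": 1024}

# Structural keys can reasonably be accessed directly:
name = file_doc["name"]

# Peripheral, often-null keys are safer to access tolerantly:
tags = file_doc.get("tags", [])
info = file_doc.get("info", {})
```

Direct access (`file_doc["tags"]`) on the same document would raise a `KeyError`, which is exactly the failure mode under discussion.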
Why add a db update at all then? Although I'm not confident enough in this update being a "final fix" to allow a potential 500 key-access error in auto-generated situations (rule eval) to continue existing, I am confident it will fix the existing old "bad" data, and that the potential for new bad data is very small. We can reevaluate which parts of the data model should always be present at a later time, but for now I don't believe the documentation has been clear enough to support direct key access on commonly null fields.
To add additional information: files are the only objects in the system that set nulls for their allowable fields. I think this practice could be useful for other objects, but it should be something we standardize rather than having to remember that you can (in clean data sets) directly access file keys but not others.
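A sketch of what standardizing null defaults might look like; the key names, defaults, and helper are assumptions for illustration, not existing project code:

```python
import copy

# Hypothetical defaults for allowable-but-often-absent file fields.
FILE_DEFAULTS = {"tags": [], "info": {}}

def with_defaults(doc, defaults=FILE_DEFAULTS):
    """Return a copy of doc with absent allowable keys set to null defaults.

    Mutable defaults are deep-copied so documents never share containers.
    """
    fixed = dict(doc)
    for key, value in defaults.items():
        if key not in fixed:
            fixed[key] = copy.deepcopy(value)
    return fixed

old = {"name": "scan.nii.gz"}
fixed = with_defaults(old)
```

With something like this in one place, every object type gets the same guarantee that files currently get from `upload.py`.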
Makes sense. Would you support changing our key access back to assuming these fields exist once this is more documented - say, with the coming Swagger docs?
If the root cause of any future missing keys is bad code or bad (direct) access, then it would seem like a good idea to keep surfacing the problems it causes.
We also have a collection of "integrity checks", and one could easily be written to test whether the problem is still around after this update. If the failure were more along the lines of "you can't view a file that is malformed", I would be more willing to allow strict key access and use the failures as an alert, but not with an opaque error in job creation/rule evaluation that a user wouldn't have good insight into.*
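An integrity check along those lines could be very small; the key list and the way documents are fetched here are assumptions, not project code:

```python
# Hypothetical set of keys every file document should carry after the update.
REQUIRED_FILE_KEYS = ("tags", "info")

def find_malformed(file_docs, required=REQUIRED_FILE_KEYS):
    """Yield (index, missing_keys) for documents missing expected keys."""
    for i, doc in enumerate(file_docs):
        missing = [k for k in required if k not in doc]
        if missing:
            yield i, missing

docs = [{"name": "a.nii", "tags": [], "info": {}}, {"name": "b.nii"}]
problems = list(find_malformed(docs))
```

Run periodically, a check like this would tell us whether new "bad" data is still being created after the upgrade.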
*Edit: Or might not ever see, in the case of rule eval for engine uploads
It sounds like the real problem is not so much that we fail, but rather that we fail silently. Given that operational problem, it makes sense to be overly conservative here.
We should solve the silent part, and that'll free us up for better choices in the future.
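One way to solve the silent part without reintroducing 500s is tolerant access that still records the malformed data; this helper and its message are hypothetical, using only the standard-library logger:

```python
import logging

log = logging.getLogger("integrity")

def get_or_warn(doc, key, default=None):
    """Tolerant key access that also surfaces that the data was malformed."""
    if key not in doc:
        log.warning("document missing expected key %r", key)
    return doc.get(key, default)

tags = get_or_warn({"name": "a.nii"}, "tags", [])
```

The caller still gets a usable default, but the warning gives operators the visibility that a silent `.get()` would hide.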
Fixes #1042
Updates files that might have missing keys* as well, but code should not be written expecting these keys to exist.
* A null version of these keys is always added to new files via `upload.py`. Files from old versions of scitran, or files that were entered into the system manually, might be missing these keys.

Breaking Changes
None
Review Checklist